Learning Visual Semantics: Models, Massive Computation, and Innovative Applications
Part II: Visual Features and Representations
Liangliang Cao, IBM Watson Research Center

Source: mp7.watson.ibm.com/LearningVisualSemantics/slides/CaoFeatures.pdf

Page 1

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center

Page 2

Evolution of Visual Features

• Low level features and histogram

• SIFT and bag-of-words models

• Sparse coding

• Super vector and Fisher vector

• Deep CNN

Page 3

Evolution of Visual Features

• Low level features and histogram

• SIFT and bag-of-words models

• Sparse coding

• Super vector and Fisher vector

• Deep CNN

Fewer parameters → more parameters

Page 4

Evolution of Visual Features

• Low level features and spatial histogram

• SIFT and bag-of-words models

• Sparse coding

• Super vector and Fisher vector

• Deep CNN

Three fundamental techniques have been used extensively:
1. histogram
2. spatial gridding
3. filters

Page 5

Low Level Features and Spatial Pyramid

Page 6

Raw Pixels as Features

Concatenating raw pixels as a 1-D vector

Pictures courtesy of the Face Research Lab, Antonio Torralba, and Sam Roweis

Application 1: Face recognition

Application 2: Handwritten digits

Tiny Image [Torralba et al. 2007]: resize an image to a 32x32 color thumbnail, which corresponds to a 3072-dimensional vector
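As a minimal sketch (not from the slides), the tiny-image idea takes only a few lines of NumPy; the random array here stands in for a real resized thumbnail:

```python
import numpy as np

# Stand-in for a 32x32 RGB thumbnail (a real pipeline would resize an image).
thumb = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Concatenate raw pixels into a single 1-D feature vector.
feature = thumb.astype(np.float32).ravel()
print(feature.shape)  # (3072,)
```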

Page 7

From Pixels to Histograms

The color histogram [Swain and Ballard '91] was proposed to model the distribution of colors in an image.


We can extend the color histogram to:
• Edge histograms
• Shape context histograms
• Local binary patterns (LBP)
• Histograms of gradients

(Figure: different images with similar color histogram features.)

Unlike raw-pixel vectors, histograms are not sensitive to:
• misalignment
• scale transforms
• global rotation
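A color histogram and its insensitivity to misalignment can be illustrated with a short NumPy sketch (the image here is random data, and the 8-bins-per-channel layout is just an illustrative choice):

```python
import numpy as np

def color_histogram(img, bins=8):
    """Concatenate per-channel (r, g, b) histograms, L1-normalized."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(np.float64)
    return h / h.sum()

img = np.random.randint(0, 256, size=(64, 64, 3))
h1 = color_histogram(img)

# Shifting all pixel positions leaves the histogram unchanged:
h2 = color_histogram(np.roll(img, 7, axis=1))
print(np.allclose(h1, h2))  # True
```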

Page 8

From Histogram to Spatialized Histogram

Problem with histograms: no spatial information!

Example thanks to Erik Learned-Miller

The same histogram!

Ojala et al, PAMI’02

Histograms of spatial cells → spatial pyramid matching [Lazebnik et al., CVPR'06]

Page 9

IBM IMARS Spatial Gridding

First place in the 1st and 2nd ImageCLEF Medical Image Classification tasks

Task: Determine which modality a medical image belongs to.

- Images from PubMed articles

- 31 categories (x-ray, CT, MRI, ultrasound, etc.)

Page 10

IBM IMARS Spatial Gridding

First place in the 1st and 2nd ImageCLEF Medical Image Classification tasks

http://www.imageclef.org/2012/medical

Page 11

Image Filters

• In addition to histograms, another group of features can be represented as "filters". For example:

1. Haar-like filters (Viola-Jones face detection)

2. Gabor filters (simple cells in the visual cortex can be modeled by Gabor functions); widely used in fingerprint, iris, OCR, texture, and face recognition
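For illustration, a Gabor filter is simply a Gaussian-windowed sinusoid; a small orientation bank can be built directly in NumPy (the parameter values here are arbitrary, not tuned):

```python
import numpy as np

def gabor_kernel(size=21, sigma=4.0, theta=0.0, lam=10.0, psi=0.0):
    """Real part of a Gabor filter: a Gaussian envelope times a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half+1, -half:half+1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam + psi))

# A small filter bank over 4 orientations:
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
print(len(bank), bank[0].shape)  # 4 (21, 21)
```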

Page 12

SIFT Feature and Bag-of-Words Model

• Raw pixels
• Histogram features
  – Color histogram
  – Edge histogram
• Frequency analysis
• Image filters
• Texture features
  – LBP
• Scene features
  – GIST
• Shape descriptors
• Edge detection
• Corner detection

1999: SIFT features and beyond

Detectors:
• DoG
• Hessian detector
• Harris-Laplace
• FAST
• ORB
• …

Descriptors:
• SIFT
• HOG
• SURF
• DAISY
• BRIEF
• …

Classical features

Page 13

David G. Lowe

- Distinctive image features from scale-invariant keypoints, IJCV 2004

- Object recognition from local scale-invariant features, ICCV 1999

Scale-Invariant Feature Transform (SIFT)

SIFT descriptor: histograms of gradient orientation, concatenated over spatial cells

- Histograms are more robust to position than raw pixels

- Edge gradients are more distinctive than color for local patches

David Lowe's excellent performance tuning:

• Good parameters: 4 orientations, 4 x 4 grid

• Soft assignment to spatial bins

• Gaussian weighting over spatial location

• Reduce the influence of large gradient magnitudes: thresholding + normalization
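The descriptor recipe above can be sketched as a simplified SIFT-like computation (an illustrative toy, not Lowe's implementation; the grid size, bin count, and 0.2 clipping threshold are parameters):

```python
import numpy as np

def grad_orientation_hist(patch, grid=4, nbins=8):
    """Simplified SIFT-like descriptor: per-cell histograms of gradient
    orientation, weighted by magnitude, then clipped and L2-normalized."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    H, W = patch.shape
    desc = []
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i*H//grid, (i+1)*H//grid),
                  slice(j*W//grid, (j+1)*W//grid))
            h, _ = np.histogram(ori[sl], bins=nbins, range=(0, 2*np.pi),
                                weights=mag[sl])
            desc.append(h)
    d = np.concatenate(desc)
    d /= np.linalg.norm(d) + 1e-12
    d = np.minimum(d, 0.2)                   # suppress large gradient magnitudes
    return d / (np.linalg.norm(d) + 1e-12)   # renormalize

patch = np.random.rand(16, 16)
print(grad_orientation_hist(patch).shape)  # (128,) = 4 x 4 cells x 8 bins
```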

Page 14

David G. Lowe

- Distinctive image features from scale-invariant keypoints, IJCV 2004

- Object recognition from local scale-invariant features, ICCV 1999

SIFT detector: detect maxima and minima of the difference-of-Gaussians in scale space

Post-processing: keep corner points but reject low-contrast and edge points

Scale-Invariant Feature Transform (SIFT)

• In general object recognition, we may combine multiple detectors (e.g., Harris, Hessian) or use dense sampling for good performance.

• Following SIFT, many works, including SURF, BRIEF, ORB, and BRISK, have been proposed for faster local feature extraction.

Page 15

Histogram of Local Features

And Bag-of-Words Models

Page 16

Histogram of Local Features

(Figure: histogram over codewords; y-axis = frequency. Feature dim = #codewords.)
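A bag-of-words histogram is just hard assignment plus counting; a sketch assuming a codebook has already been learned (e.g., by k-means; the sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(50, 128))      # 50 codewords (e.g., from k-means)
descriptors = rng.normal(size=(300, 128))  # local features from one image

# Hard-assign each descriptor to its nearest codeword, then count:
d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
assign = d2.argmin(axis=1)
bow = np.bincount(assign, minlength=len(codebook)).astype(float)
bow /= bow.sum()                           # normalized histogram, dim = #codewords
print(bow.shape)  # (50,)
```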

Page 17

Histogram of Local Features + Spatial Gridding

dim = #codewords x #grids


Page 18

Bag of Words Models

Page 19

Bag-of-Words Representation

Computer vision: an object as a bag of 'visual words'

Text and NLP: a document as a bag of words

Slide credit: Fei-Fei Li

Page 20

Topic Models for Bag-of-Words Representation

• Unsupervised classification: Sivic et al., ICCV 2005

• Supervised classification: Fei-Fei et al., CVPR 2005

• Classification + segmentation: Cao and Fei-Fei, ICCV 2007

Page 21

Pros and Cons of Bag-of-Words Models

Bag-of-words models are good at:
- Modeling prior knowledge
- Providing intuitive interpretations

But these models suffer from:
- Loss of spatial information
- Loss of information when quantizing "visual words"

Images differ from texts! We need better coding approaches.

Page 22

Sparse Coding

Page 23

Sparse Coding

• A naïve histogram uses vector quantization (VQ) as a hard assignment, while sparse coding provides a soft assignment.

• Sparse coding: approximate the l0 norm (sparse solution) with an l1 penalty: min_c ||x - Dc||^2 + lambda * ||c||_1

• SC works better with max pooling (while traditional VQ uses average pooling)

• References: [M. Ranzato et al., CVPR'07], [J. Yang et al., CVPR'09], [J. Wang et al., CVPR'10], [Y. Boureau et al., CVPR'10]
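A minimal sparse-coding sketch using ISTA, one standard solver for the l1-penalized objective (the dictionary, signal, and parameter values here are illustrative):

```python
import numpy as np

def sparse_code(x, D, lam=0.2, n_iter=200):
    """ISTA: minimize 0.5 * ||x - D c||^2 + lam * ||c||_1 (l1 relaxes l0)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ c - x)              # gradient of the quadratic term
        z = c - g / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return c

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)             # unit-norm codewords
x = D[:, 3] * 1.0 + D[:, 100] * 0.5        # signal built from two atoms
c = sparse_code(x, D)
print(np.count_nonzero(np.abs(c) > 1e-6))  # only a few active codes
```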

Page 24

Sparse Coding + Spatial Pyramid

Yang et al, Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009

Sparse coding + spatial pyramid + linear SVM

Page 25

Efficient Approach

Locality-constrained linear coding (LLC):

1. find k nearest neighbors to the query

2. compute sparse coding with the k neighbors

Significantly faster than naïve SC, e.g., O(1000a) -> O(5a), and the top-k search can be further sped up.

For further speedup, we can use LS regression to replace SC. [J. Wang et al., CVPR'10]

Matlab implementation: http://www.ifp.illinois.edu/~jyang29/LLC.htm
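The two-step recipe above can be sketched as follows (a simplified reading of the approximated LLC algorithm; `beta` is an illustrative regularizer, and the codebook is random rather than learned):

```python
import numpy as np

def llc_code(x, D, k=5, beta=1e-4):
    """Approximate LLC: solve a small least-squares problem over the k
    nearest codewords only, with the codes constrained to sum to one."""
    d2 = ((D - x) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:k]               # step 1: k nearest codewords
    B = D[idx] - x                         # shift bases to the query point
    C = B @ B.T + beta * np.eye(k)         # local covariance, regularized
    w = np.linalg.solve(C, np.ones(k))     # step 2: small LS solve
    w /= w.sum()                           # sum-to-one constraint
    code = np.zeros(len(D))
    code[idx] = w
    return code

rng = np.random.default_rng(0)
D = rng.normal(size=(1024, 128))           # codebook
x = rng.normal(size=128)                   # one local descriptor
c = llc_code(x, D)
print(np.count_nonzero(c))  # 5
```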

Page 26

Sparse Codes Are Not Necessarily Sparse

Hard quantization assigns each descriptor to a single codeword: the sparsest possible solution. Sparse coding is less sparse than that, and after pooling the image-level representation is not sparse at all.

Is the success of SC really due to sparsity?

Page 27

Fisher Vector and Super Vector

Page 28

Information Loss

• Coding with information loss: both VQ and sparse coding summarize each descriptor's relation to a codeword with a scalar code.

• Lossless coding: keep a function of the descriptor per codeword instead.

• The significant difference: for SC or VQ, each code is a scalar; for lossless coding, it is a function.

Page 29

Lossless Coding as Mixture of Experts

• Let's look at each codeword as a "local expert":

(Figure: Expert 1, Expert 2, Expert 3, combined by a gating function, e.g., GMM, sparse GMM, harmonic k-means, etc.)

Page 30

Pooling Towards Image-Level Representation

(Figure: descriptors pooled within each mixture component: Component 1, 2, 3.)

Pooling: sum the per-component statistics, then normalize and concatenate.

Both the Fisher vector and the super vector can be written in this form (with different subtraction, normalization, and scaling factors).

Related references:
• Fisher vector [Perronnin et al., ECCV'10]
• Super vector [X. Zhou, K. Yu, T. Zhang et al., ECCV'10]
• HG [X. Zhou et al., ECCV'09]
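As an illustration of per-component pooling, here is a VLAD-style simplification (residuals accumulated per component, power- and L2-normalized; a relative of, not identical to, the Fisher/super vector):

```python
import numpy as np

def residual_pool(X, centers):
    """VLAD-style pooling: accumulate residuals X - center per component,
    then power-normalize, concatenate, and L2-normalize. Dim = C x d."""
    assign = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    C, d = centers.shape
    V = np.zeros((C, d))
    for c in range(C):
        if np.any(assign == c):
            V[c] = (X[assign == c] - centers[c]).sum(axis=0)
    V = np.sign(V) * np.sqrt(np.abs(V))    # power normalization
    v = V.ravel()                          # concatenate components
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))             # local descriptors
centers = rng.normal(size=(8, 16))         # mixture components / codewords
v = residual_pool(X, centers)
print(v.shape)  # (128,) = 8 components x 16 dims
```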

Page 31

Pooling Towards Image-Level Representation

(Pooling per component, then normalize and concatenate, as on the previous slide.)

Big model: the dimension becomes C (#components) x d (#feature dims).

For example, if C = 1000 and d = 128, the final dimension is 128K: 100+ times longer than that from SC or VQ!

Page 32

Very Long Vector as Feature Representation

We can generate a very long image feature vector as discussed above.

The strong feature we used for ImageNet LSVRC 2010:

– Dense sampling: LBP + HOG, feature dim = 100 (after PCA)

– GMM with 1024 components

– 4 spatial grids (1 + 3x1)

– Dimension of the image feature: 100 x 1024 x 4 = 0.41M

(Figure: LBP and HOG features combined via GMM pooling.)

Page 33

How do we train such big models?

Page 34

For Small Datasets: Use Kernel Trick!

Kernel trick:

• 10K images => kernel matrix: 10K x 10K ~ 100M entries

• Computational complexity depends on the size of the kernel matrix rather than on the feature dimension

We tried nonlinear kernels for face verification and obtained good performance (results on the LFW dataset).

Learning Locally-Adaptive Decision Functions for Person Verification, CVPR'13 (with Z. Li, S. Chang, F. Liang, T. Huang, and J. Smith)
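The kernel-trick point can be seen directly: for N images, the kernel matrix is N x N regardless of how long the feature vectors are (an RBF kernel is used purely for illustration):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """K[i, j] = exp(-gamma * ||x_i - y_j||^2). For N images the kernel
    matrix is N x N no matter what the feature dimension d is."""
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0))  # clamp tiny negatives

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4096))           # 100 images, high-dim features
K = rbf_kernel(X, X)
print(K.shape)  # (100, 100)
```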

Page 35

For Large Dataset: Use Stochastic Gradient Descent

• Suppose we are working on the ImageNet data using 0.4M-dimensional feature vectors.

• Total training data: 1.2M images x 0.4M dims ~ 0.5T real values!

– Too big to load into memory

– Too many samples to use kernel tricks

• Solution: Stochastic Gradient Descent (SGD)

– Idea: estimate the gradient from a single randomly picked sample

– Compared with full gradient descent, each SGD update is cheap: w <- w - eta_t * grad_loss(w; x_i, y_i)

Page 36

SGD Can Be Very Simple To Implement

(Figure: a 10-line binary SVM solver by Shai Shalev-Shwartz; note the decreasing learning rate.)
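The original code figure is not reproduced here; a Python sketch in the spirit of the Pegasos algorithm (hyperparameters and toy data are illustrative) shows how short such a solver can be:

```python
import numpy as np

def pegasos_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Pegasos-style SGD for a binary linear SVM (labels in {-1, +1}).
    One random sample per step; learning rate 1/(lam*t) decreases over time."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(X))
        eta = 1.0 / (lam * t)              # decreasing learning rate
        if y[i] * (w @ X[i]) < 1:          # hinge loss is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1 - eta * lam) * w
    return w

# Linearly separable toy data:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
w = pegasos_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
print(acc > 0.9)
```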

Page 37

Deep CNN and Related Tech

Page 38

Deep CNN: A Bigger Model

Motivated by the studies of [Krizhevsky et al., NIPS'12] and [Y. LeCun et al., PIEEE'98], the deep convolutional neural network (CNN) has become the newest winner of the ImageNet competition. The most popular CNN has:

– 5 convolutional layers to learn filters

– 2 fully connected layers

– 60 million parameters

– Stochastic gradient descent (again)

Why can we train such a big model now (and not in the 1990s)?

– The rise of big datasets (ImageNet)

– The blessing of GPU computing

Page 39

Deep Learning Demo

http://smith-gpu.pok.ibm.com:8080/

Page 40

Learning Representation From Big Data

Computer vision researchers have seen a big performance jump on large-scale datasets like ImageNet.

Even earlier, researchers in speech/acoustics saw similar success in LVCSR and related tasks.

In another field, text/NLP researchers are also moving quickly to large-scale learning. For example, the IBM Watson system used thousands of sub-systems to beat the human players in the Jeopardy! game.

www.ibm.com/watsonjobs

Watson is hiring!

In particular, we are looking for winter interns to work on vision + NLP problems. Contact [email protected]

Page 41

Conclusion

Page 42

Conclusion

The mutual evolution of big data and big models:

Bigger models: Histogram → Sparse coding (10K parameters) → Super vector / Fisher vector (0.4M parameters) → Deep CNN (60M parameters)

Bigger datasets: Small (e.g., Caltech101, 8K images) → Medium (e.g., PASCAL, 10+K) → Large (e.g., ImageNet, 1.2M)

Motivating questions:

- How do we develop scalable solutions for big data?

- How do we deal with situations with limited labeled data?

Please see the following talks for the answers!