Shifting from Naming to Describing: Semantic Attribute Models Rogerio Feris, June 2014


Learning Visual Semantics: Models, Massive Computation, and Innovative Applications CVPR 2014

Recap

Training Data

Low-Level Feature Extraction

Feature Coding and Pooling

Large-Scale Semantic Modeling

Slide credit: Rogerio Feris


What if no training samples are available for the target class?

Is this a practical setting?

Slide credit: Rogerio Feris


Motivation

ImageNet has 30 mushroom synsets, each with ≈1000 images.

Slide credit: Christoph Lampert


Motivation

In nature, there are ≈14,000 mushroom species.

Slide adapted from Christoph Lampert

Image: http://www.evogeneao.com/

Zero-data: Many fine-grained visual categorization tasks may have classes with few or no training examples at all.


Motivation

Slide credit: Rogerio Feris

Suspect Search in Surveillance Videos

[Feris et al, IBM]

Zero-data: often no example images from suspects are available, only textual descriptions.


Motivation

Slide credit: Rogerio Feris

Prediction of concrete nouns from neural imaging data (mind reading) [Mark Palatucci et al, NIPS 2009]

Noun Prediction

Zero-Data: many nouns without corresponding neural image examples (costly label acquisition)


Motivation

Slide credit: Rogerio Feris

Similar problems in other fields:

Zero-Data: Infeasible to acquire training samples for each word (need sub-word modeling like phonemes)

Large Vocabulary Speech Recognition

Zero-Data: Newly released apps without any user ratings (also known as the “cold-start problem”) [Schein et al, SIGIR 2002]

Recommendation Systems


Semantic Attribute Models: Zero-Shot Learning for Visual Recognition

[Lampert et al, CVPR 2009] [Farhadi et al, CVPR 2009] [Palatucci et al, NIPS 2009]


Attribute-based Classification

Slide adapted from Christoph Lampert

Attributes:

Semantic/nameable properties that are shared across classes

Intuitive mid-level feature representation


Attribute-based Classification

[Figure: two panels, “Standard multi-class classification” and “Attribute-based classification”, each marked “Unseen categories”; intermediate layer labeled “Semantic Attribute Classifiers”]

[Lampert et al, CVPR 2009]

Slide credit: Rogerio Feris

Semantic Output Code Classifier (SOCC)

[Palatucci et al, NIPS 2009]

Similar to Error-Correcting Output Codes (ECOC) [Dietterich & Bakiri, 1995], but semantic codes are used instead
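The idea of classifying through semantic codes can be sketched in a few lines. This is a minimal illustration in the spirit of attribute-based classification, not the exact model from the cited papers: the attribute names, the class signatures, and the predicted probabilities are all invented for the example, and attributes are assumed to be independent.

```python
import math

# Hypothetical class-attribute table (values invented for illustration).
# No training images of these classes are needed, only their signatures.
ATTRIBUTES = ["striped", "black", "white", "lives_in_water"]
CLASS_SIGNATURES = {
    "zebra":      [1, 1, 1, 0],
    "polar_bear": [0, 0, 1, 0],
    "dolphin":    [0, 0, 0, 1],
}


def classify_zero_shot(attribute_probs, signatures=CLASS_SIGNATURES):
    """Return the class whose binary signature is most likely under the
    predicted per-attribute probabilities (attributes treated as independent)."""
    eps = 1e-9
    best_class, best_score = None, -math.inf
    for name, signature in signatures.items():
        score = 0.0
        for p, a in zip(attribute_probs, signature):
            p = min(max(p, eps), 1 - eps)  # avoid log(0)
            score += math.log(p if a == 1 else 1 - p)
        if score > best_score:
            best_class, best_score = name, score
    return best_class


# Outputs of hypothetical attribute classifiers for one test image.
print(classify_zero_shot([0.9, 0.8, 0.7, 0.1]))
```

Adding a new class only requires adding one row to the signature table, which is what makes the zero-data setting workable.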


Image-Attributes Prediction

Slide credit: Rogerio Feris

For each attribute, collect a set of positive and negative samples and train a classifier (e.g., an SVM or a neural network)

Positive (Stripe) Negative (Non-Stripe)

Binary Attribute Model

Example: “Stripe” Attribute

Attributes transcend class boundaries: the “stripe” attribute can be learned from images of zebras, clothing, …
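A binary attribute model can be sketched as follows. The slide mentions SVMs or neural networks; the sketch swaps in a single-layer perceptron, the simplest such network, to stay dependency-free. The two-dimensional features (`stripe_energy`, `brightness`) and all sample values are hypothetical.

```python
def train_perceptron(samples, labels, epochs=100):
    """Learn w, b so that the sign of w.x + b matches the labels
    (+1 = attribute present, -1 = attribute absent)."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * (sum(wk * xk for wk, xk in zip(w, x)) + b) <= 0:
                # Misclassified sample: move the boundary toward it.
                w = [wk + y * xk for wk, xk in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return w, b


def has_attribute(model, x):
    w, b = model
    return sum(wk * xk for wk, xk in zip(w, x)) + b > 0


# Hypothetical features [stripe_energy, brightness] from different classes:
# a zebra and a striped shirt (positive), a horse and a plain shirt (negative).
samples = [[0.9, 0.3], [0.8, 0.7], [0.1, 0.4], [0.2, 0.8]]
labels = [1, 1, -1, -1]
model = train_perceptron(samples, labels)
print([has_attribute(model, x) for x in samples])
```

The positives come from different object classes on purpose: the classifier is trained on the attribute, not on the category.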


Image-Attributes Prediction

[Parikh and Grauman, ICCV 2011]

Issue with Binary Attribute Models

Smiling Not smiling ???

Natural Not natural ???


Image-Attributes Prediction

Relative Attributes: replace the binary model by a ranking function

Max-margin learning-to-rank formulation of Joachims 2002

[Equations: ranking constraints over image pairs (i, j)]

[Parikh and Grauman, ICCV 2011]
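A ranking function of this kind can be sketched as below. It is a simplified stand-in, not the max-margin solver of the cited work: it uses perceptron-style updates on pairwise feature differences, has no regularization term, and handles only ordered pairs. The feature vectors and image names are hypothetical.

```python
def train_ranker(ordered_pairs, dim, epochs=200, lr=0.1, margin=1.0):
    """Learn w so that w.x_i exceeds w.x_j by a margin for every pair
    (x_i, x_j) in which image i shows more of the attribute than image j."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_i, x_j in ordered_pairs:
            diff = [a - b for a, b in zip(x_i, x_j)]
            if sum(wk * dk for wk, dk in zip(w, diff)) < margin:
                # Constraint violated: push w along the pair difference.
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def rank_score(w, x):
    """Real-valued attribute strength; higher means more of the attribute."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Hypothetical 2-D features for three face images, ordered by "smiling".
big_smile, slight_smile, neutral = [0.9, 0.2], [0.5, 0.3], [0.1, 0.1]
pairs = [(big_smile, slight_smile), (slight_smile, neutral)]
w = train_ranker(pairs, dim=2)
print([round(rank_score(w, x), 2) for x in (big_smile, slight_smile, neutral)])
```

Unlike a binary classifier, the learned score can order the ambiguous middle cases instead of forcing them into “smiling” or “not smiling”.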


Attribute-Class Associations

Manual Specification of Class-Attribute Associations


Attribute-Class Associations

Associations may be extracted automatically from other sources

[Rohrbach et al, CVPR 2010]


Attributes as “classes”

[Rohrbach et al, CVPR 2010] [Felix Yu et al, CVPR 2013] [Mensink et al, CVPR 2014]

Attribute-based Direct similarity

“giant pandas are similar to grizzly and polar bears”


Generalization: Label Embedding

[Akata et al, CVPR 2013]

Check talk by Florent Perronnin on “Output embedding for large-scale visual recognition” (LSVR CVPR 2014 tutorial)


Generalization: Label Embedding

Frome et al. "DeViSE: A Deep Visual-Semantic Embedding Model", NIPS 2013

Label Embedding Framework

Automatic Discovery of word associations

Label Image

Real-Value word vector representation

Skip-gram model: Semantically related words are mapped to similar vector representations

Deep Learning


Generalization: Label Embedding

Language Model Source Code: https://code.google.com/p/word2vec/

Zero-Shot Learning / Semantically close mistakes

Label Embedding Framework

Automatic Discovery of word associations [Frome et al, NIPS 2013]
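The label-embedding idea can be illustrated with a toy nearest-neighbor lookup. This is only a sketch under strong assumptions: the three-dimensional “word vectors” are invented (real skip-gram vectors have hundreds of dimensions), and the image embedding is assumed to come from a visual model already trained to project images into the word-vector space.

```python
import math

# Toy word vectors, invented for illustration.
LABEL_VECTORS = {
    "tiger": [0.9, 0.1, 0.0],
    "lion":  [0.8, 0.2, 0.1],  # assumed to have no training images
    "truck": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_labels(image_embedding, label_vectors=LABEL_VECTORS, k=2):
    """Rank labels by cosine similarity to the image embedding."""
    ranked = sorted(
        label_vectors,
        key=lambda name: cosine(image_embedding, label_vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical output of the visual model for one test image.
print(nearest_labels([0.75, 0.25, 0.1]))
```

Because related words sit close together in the embedding space, a label without any training images can still be retrieved, and the runner-up labels tend to be semantically close rather than arbitrary.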


In addition to zero-shot classification, semantic attribute models have been shown to be useful for many other tasks


Slide credit: Rogerio Feris

Other Uses of Semantic Attributes

Check the CVPR 2013 tutorial on Attributes: https://filebox.ece.vt.edu/~parikh/attributes/


Attribute-based Search

Application: Smart Surveillance [Feris et al, IBM - WACV 2009, CVPR 2011, ICMR 2014]


Slide credit: Rogerio Feris

Attribute-based People Search

http://www.today.com/video/today/51630165/#51630165


Slide credit: Rogerio Feris

Attribute-based People Search

People Search in Surveillance Videos

Traditional Approaches: Face Recognition (“Naming”)

Face recognition is very challenging under lighting changes, pose variation, and low-resolution imagery (typical conditions in surveillance scenarios).

Attribute-based People Search (“Describing”)

Rather than relying on face recognition only, we provide a complementary people search framework based on fine-grained semantic attributes.

Query Example:

“Show me all people with a beard and sunglasses, wearing a white hat and a patterned blue shirt, from all metro cameras in the downtown area, from 2pm to 4pm last Saturday.”
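A query like this one can be read as a filter over stored detections followed by a ranking step. The sketch below is not the system described in the talk: the `Detection` record, its field names, the camera IDs, the confidence threshold, and the mean-confidence ranking are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Detection:
    """One detected person; every field name here is hypothetical."""
    camera_id: str
    timestamp: datetime
    attribute_scores: dict  # attribute name -> classifier confidence in [0, 1]


def search(detections, required_attributes, cameras, start, end, threshold=0.5):
    """Return detections matching every requested attribute, best match first."""
    if not required_attributes:
        raise ValueError("at least one attribute is required")
    hits = []
    for det in detections:
        if det.camera_id not in cameras or not (start <= det.timestamp <= end):
            continue
        scores = [det.attribute_scores.get(name, 0.0) for name in required_attributes]
        if min(scores) >= threshold:
            # Rank by mean confidence (a simple stand-in for a learned ranker).
            hits.append((sum(scores) / len(scores), det))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [det for _, det in hits]


detections = [
    Detection("metro-01", datetime(2014, 6, 21, 14, 30),
              {"beard": 0.9, "sunglasses": 0.8, "white_hat": 0.7}),
    Detection("metro-02", datetime(2014, 6, 21, 15, 10),
              {"beard": 0.95, "sunglasses": 0.9, "white_hat": 0.9}),
    Detection("metro-01", datetime(2014, 6, 21, 18, 0),
              {"beard": 0.9, "sunglasses": 0.9, "white_hat": 0.9}),
    Detection("metro-02", datetime(2014, 6, 21, 14, 45),
              {"beard": 0.2, "sunglasses": 0.9, "white_hat": 0.8}),
]
results = search(
    detections,
    ["beard", "sunglasses", "white_hat"],
    cameras={"metro-01", "metro-02"},
    start=datetime(2014, 6, 21, 14, 0),
    end=datetime(2014, 6, 21, 16, 0),
)
print([(d.camera_id, d.timestamp.isoformat()) for d in results])
```

Keeping the timestamp and camera ID alongside the attribute scores lets the time and location parts of the query be answered by plain filtering, so only the attribute part depends on the classifiers.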


Slide credit: Rogerio Feris

Attribute-based People Search

Suspect Description Form


Slide credit: Rogerio Feris

Attribute-based People Search

System Architecture


Slide credit: Rogerio Feris

[Siddiquie et al, CVPR 2011]

Facial Attributes: bald, hair, color of hair, hat, color of hat, sunglasses, eyeglasses, absence of glasses, beard, mustache, absence of facial hair, skin tone (dark, medium, light), gender, …

Torso Attributes: clothing color, patterned, solid, …

Timestamp, Camera ID

Attribute-based People Search


Slide credit: Rogerio Feris

Attribute-based People Search

Attribute Ranking [Siddiquie, Feris and Davis, CVPR 2011]

“Learning to rank”: confidence of individual attributes as features

Pairwise attribute modeling


Slide credit: Rogerio Feris

Structured Learning Formulation

Improved performance over other ranking methods (RankSVM, RankBoost, DORM, TagProp) on three standard datasets (LFW, FaceTracer, PASCAL)

See [Siddiquie, Feris and Davis, CVPR 2011]


Slide credit: Rogerio Feris

Attribute-based People Search

Top-1 Ranking Results [Feris et al, ICMR 2014]


Slide credit: Rogerio Feris

Boston Bombing Event

“Show me all images of people matching the suspect description from time X to time Y from all cameras in area Z.”

Ability to spot a person with, e.g., a white hat in a crowded scene

Suspect #1 found in 4 images in top 8 results

Suspect #2 found in 3 images in top page

1071 detected faces from 50 high-res Boston images (all from Flickr)


Slide credit: Rogerio Feris

Extension to Vehicle Search

“Show me all blue trucks larger than 7ft length traveling at high speed northbound last Saturday, from 2pm to 5pm.”

[Feris et al, IEEE Trans on Multimedia, 2012]


Attribute-based Search

Application: Product Search [Kovashka et al, CVPR 2012, ICCV 2013] [Yu & Grauman, CVPR 2014]


Whittle Search

Slide credit: Kristen Grauman


Whittle Search

Check the Whittle Search demo at: http://godel.ece.vt.edu/whittle/


Resources


http://rogerioferis.com/VisualRecognitionAndSearch2014/Resources.html

Resources


Resources

Galaxy Morphological Attributes

Data available at: http://data.galaxyzoo.org/

Slide credit: Rogerio Feris

304,122 galaxy images

58,719,719 annotations

83,943 volunteers

11 tasks / 38 answers (fine morphological attributes)


Resources

http://www.snapshotserengeti.org/

Slide credit: Rogerio Feris

5 terabytes of annotated data

Data will be made publicly available soon!


Parts and Attributes Workshop

http://rogerioferis.com/PartsAndAttributes/ (ECCV 2010)

http://pub.ist.ac.at/~chl/PnA2012/ (ECCV 2012)

https://filebox.ece.vt.edu/~parikh/PnA2014/ (ECCV 2014)

Check the Call for Extended Abstracts (Posters)

Submission deadline: June 30th, 2014