Learning Deep Features for Scene Recognition using Places Database
Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva (NIPS 2014)
Bora Çelikkale
INTRODUCTION
Human Visual Recognition
Samples the world several times per second
~millions of images within a year
INTRODUCTION
Primate Brain
Hierarchical organization in layers of increasing processing complexity
Inspired CNNs
PROBLEM & MOTIVATION
Object classification has achieved astonishing performance with large databases (ImageNet)
Iconic object images do not capture the richness and diversity of visual information in scenes
CONTRIBUTIONS
Scene-centric database 60x larger than SUN
Comparison metrics for scene datasets: density and diversity
SCENE DATASETS
Scene15 (Lazebnik et al. 2006)
15 categories
~3,000 images
MIT Indoor67 (Quattoni & Torralba 2009)
67 categories of indoor places
15,620 images
SUN (Xiao et al. 2010)
397 (well-sampled) categories
130,519 images
Places (Zhou et al. 2014)
476 categories
7,076,580 images
PLACES DATASET
Same categories from SUN
696 popular adjectives in English
Google Images
Bing Images
Flickr
>40M images downloaded
PLACES DATASET
PCA-based duplicate removal, within Places and across SUN
Places & SUN contain different images
Allows combining Places & SUN
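The slides do not spell out the duplicate-removal procedure; a minimal sketch of the idea, assuming per-image feature vectors (the function name, `threshold`, and `n_components` are all hypothetical):

```python
import numpy as np

def remove_duplicates(X, threshold=1e-3, n_components=8):
    """Project each image's feature vector onto the top PCA
    components, then drop any image whose projection lies within
    `threshold` of an already-kept image."""
    Xc = X - X.mean(axis=0)                       # center features
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                  # PCA projection
    keep = []
    for i, z in enumerate(Z):
        if all(np.linalg.norm(z - Z[j]) > threshold for j in keep):
            keep.append(i)
    return keep

# toy "features": rows 0 and 1 are exact duplicates
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
kept = remove_duplicates(features)  # [0, 2]
```

Comparing in the low-dimensional PCA space keeps the pairwise checks cheap even with tens of millions of images.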
PLACES DATASET
Annotations (with AMT)
Questions (e.g., "Is this a living room?")
Two-round setup:
1. Default answer is NO
2. Default answer is YES
Images shown per round: 750, plus 60 control images from SUN
Keep only workers with >90% accuracy on the controls
COMPARISON METRICS
Relative Density
COMPARISON METRICS
Relative Density
Denser dataset: images have more similar nearest neighbors
NN of a1
NN of b1
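The paper estimates relative density from human similarity judgments; as a purely numeric intuition, density can be proxied in feature space by how close each image sits to its nearest neighbor (the helper `mean_nn_distance` and the toy vectors are illustrative, not the paper's metric):

```python
import numpy as np

def mean_nn_distance(X):
    """Mean distance from each image to its nearest neighbor in the
    same set; a smaller value indicates a denser dataset."""
    X = np.asarray(X, dtype=float)
    # pairwise Euclidean distances via broadcasting
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)   # ignore each image's self-distance
    return d.min(axis=1).mean()

tight = mean_nn_distance([[0.0], [0.1], [0.2]])  # dense set
loose = mean_nn_distance([[0.0], [1.0], [2.0]])  # sparse set
# tight < loose: the tighter set is denser
```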
COMPARISON METRICS
Relative Diversity
Simpson index: the probability that two randomly sampled individuals belong to the same species
NN of a1, NN of b1
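The Simpson index underlying the diversity metric is simple to compute from class labels; a short sketch (variable names are illustrative):

```python
from collections import Counter

def simpson_index(labels):
    """Probability that two individuals sampled at random (with
    replacement) belong to the same species."""
    n = len(labels)
    return sum((c / n) ** 2 for c in Counter(labels).values())

# a uniform population is more diverse, hence a lower index
uniform = simpson_index(["a", "b", "c", "d"])  # 0.25
skewed = simpson_index(["a", "a", "a", "b"])   # 0.625
```

Diversity is then commonly reported as 1 minus the index, so the uniform population above scores higher.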
EXPERIMENTS
Density & Diversity Comparison (AMT)
Relative diversity vs. relative density for each category and dataset
Show 12 pairs of images
Workers select the most similar pair
Diversity: pairs are chosen at random from each database
Density: the 5th nearest neighbor (to avoid near duplicates) is chosen as the pair, using GIST features
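Picking the 5th nearest neighbor under GIST-like descriptors can be sketched as follows (the helper `kth_nearest` and the toy vectors are illustrative):

```python
import numpy as np

def kth_nearest(gist, i, k=5):
    """Index of the k-th nearest neighbor of image i; k=5 skips the
    closest ranks, which may be near-duplicates."""
    d = np.linalg.norm(gist - gist[i], axis=1)
    order = np.argsort(d)       # order[0] is i itself (distance 0)
    return int(order[k])

gist = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [10.0]])
pair = kth_nearest(gist, 0)     # image 5 is the 5th nearest to image 0
```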
EXPERIMENTS
Cross Dataset Generalization
Training and testing across different datasets
ImageNet-CNN and linear SVM
EXPERIMENTS
Comparison with Hand-designed Features
EXPERIMENTS
Training CNN for Scene Recognition
2.5M images from 205 categories, trained with the AlexNet architecture
PLACES-CNNs
Hybrid-AlexNet
Places + ImageNet: 3.5M images, 1,183 categories
Accuracy = 0.5230 on the validation set
Places205-GoogLeNet (on 205 categories)
Accuracy: top1 = 0.5567, top5 = 0.8541 on validation set
Places205-VGG16 (on 205 categories)
Accuracy: top1 = 0.5890, top5 = 0.8770 on validation set
PLACES2 DATASET
400+ unique scene categories
>10M images
AlexNet top1 accuracy: 43.0%
VGG16 top1 accuracy: 47.6%
DEMO
http://places.csail.mit.edu/demo.html
http://places2.csail.mit.edu/demo.html
THANK YOU