Learning Deep Features for Scene Recognition using Places Database · 2017-10-27


Learning Deep Features for Scene Recognition using Places Database

Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva · NIPS 2014

Bora Çelikkale

INTRODUCTION

Human Visual Recognition

Samples the world several times per second

~Millions of images within a year

INTRODUCTION

Primate Brain

Hierarchical organization in layers of increasing processing complexity

Inspired CNNs

PROBLEM & MOTIVATION

Object classification has achieved astonishing performance with large databases (ImageNet)

Iconic object-centric images do not capture the richness and diversity of visual information in scenes

CONTRIBUTIONS

Scene-centric database 60x larger than SUN

Comparison metrics for scene datasets: density, diversity

SCENE DATASETS

Scene15 (Lazebnik et al. 2006)

15 categories

~3000 imgs

MIT Indoor67 (Quattoni & Torralba 2009)

67 categories of indoor places

15,620 imgs

SUN (Xiao et al. 2010)

397 (well-sampled) categories

130,519 imgs

Places (Zhou et al. 2014)

476 categories

7,076,580 imgs

PLACES DATASET

Same scene categories as SUN

Queries combined with 696 popular English adjectives

Google Images

Bing Images

Flickr

>40M imgs are downloaded


PLACES DATASET

PCA-based duplicate removal within Places and across SUN

Places & SUN therefore contain different images

Allows combining Places & SUN
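The slide only says "PCA-based" duplicate removal, so the exact procedure is not specified here. A minimal sketch of one plausible approach, projecting image features onto the top principal components and dropping images whose codes nearly coincide, could look like this (function names, feature arrays, and the threshold are all illustrative):

```python
import numpy as np

def pca_codes(X, n_components=8):
    """Project feature vectors onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # top principal directions via SVD of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def dedup(X, thresh=1e-6):
    """Greedily keep images whose PCA code is not within `thresh`
    of any already-kept image; returns the kept indices."""
    codes = pca_codes(X)
    kept = []
    for i, c in enumerate(codes):
        if all(np.linalg.norm(c - codes[j]) > thresh for j in kept):
            kept.append(i)
    return kept
```

The same comparison run across two datasets (e.g. Places codes against SUN codes) would flag cross-dataset duplicates.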

PLACES DATASET

Annotations (with AMT)

Questions (eg: is this a living room?)

Two-round setup:

1. Default answer is NO

2. Default answer is YES

Imgs shown per round: 750, plus 60 control imgs from SUN

Keep only workers with >90% accuracy on the control imgs
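The control-image quality filter can be sketched as follows (a toy sketch; the data layout, with per-worker answer dicts keyed by control-image id, is hypothetical):

```python
def control_accuracy(answers, ground_truth):
    """answers / ground_truth: dicts mapping control image id -> 'yes'/'no'."""
    hits = sum(answers[i] == ground_truth[i] for i in ground_truth)
    return hits / len(ground_truth)

def keep_worker(answers, ground_truth, cutoff=0.9):
    """Accept a worker's HITs only if control accuracy exceeds the cutoff."""
    return control_accuracy(answers, ground_truth) > cutoff
```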

COMPARISON METRICS

Relative Density

Dataset A is denser than B if images in A have more similar nearest neighbors

[Figure: nearest neighbor of a1 in dataset A vs. nearest neighbor of b1 in dataset B]

COMPARISON METRICS

Relative Diversity

Simpson index: the probability that two randomly chosen individuals belong to the same species

[Figure: image pairs from dataset A vs. dataset B]
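Both metrics can be estimated by Monte Carlo sampling of feature distances: relative density compares nearest-neighbor distances between the two datasets, while relative diversity compares random-pair distances in the Simpson-index style. A sketch under those definitions (the feature arrays are placeholders for image descriptors):

```python
import numpy as np

def relative_density(A, B, n_pairs=1000, rng=None):
    """Estimate the probability that a random image from A is closer to its
    nearest neighbor than a random image from B is to its own nearest
    neighbor.  A, B: (n, d) feature arrays."""
    rng = np.random.default_rng(rng)

    def nn_dist(X, i):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the image itself
        return d.min()

    wins = 0
    for _ in range(n_pairs):
        a = rng.integers(len(A))
        b = rng.integers(len(B))
        wins += nn_dist(A, a) < nn_dist(B, b)
    return wins / n_pairs

def relative_diversity(A, B, n_pairs=1000, rng=None):
    """Estimate 1 - p(d(a1, a2) < d(b1, b2)) over RANDOM pairs from each
    dataset: high when random images in A are far apart."""
    rng = np.random.default_rng(rng)
    wins = 0
    for _ in range(n_pairs):
        a1, a2 = rng.choice(len(A), size=2, replace=False)
        b1, b2 = rng.choice(len(B), size=2, replace=False)
        wins += np.linalg.norm(A[a1] - A[a2]) < np.linalg.norm(B[b1] - B[b2])
    return 1 - wins / n_pairs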

EXPERIMENTS

Density & Diversity Comparison (AMT)

1 Relative diversity vs. relative density per each category and dataset

Show 12 pairs of images

Workers select the most similar pair

Diversity: pairs are chosen random for each db

Density: 5th NN (avoid near duplicates) is chosen as pair with GIST

EXPERIMENTS

Cross Dataset Generalization

2 Training and testing across different datasets

ImageNet-CNN and linear SVM

EXPERIMENTS

Comparison with Hand-designed Features

3

EXPERIMENTS

Training CNN for Scene Recognition

2,5M imgs from 205 categories, on AlexNet 4

PLACES-CNNs

Hybrid-AlexNet

Places + ImageNet 3.5M imgs, 1183 categoriesAccuracy = 0.5230 on validation set

Places205-GoogLeNet (on 205 categories)

Accuracy: top1 = 0.5567, top5 = 0.8541 on validation set

Places205-VGG16 (on 205 categories)

Accuracy: top1 = 0.5890, top5 = 0.8770 on validation set

PLACES2 DATASET

400+ unique scene categories

>10M images

AlexNet top1 accuracy: 43.0%

VGG16 top1 accuracy: 47.6%

DEMO

http://places.csail.mit.edu/demo.html

http://places2.csail.mit.edu/demo.html

THANK YOU

Recommended