The Essence of the Scene: Gist

Trayambaka Karra KT and Garold Fuks



The “Gist” of a scene

If this is a street this must be a pedestrian


Physiological Evidence

• People are excellent at identifying pictures (Standing, L., Q. J. Exp. Psychol., 1973)

• Gist: the abstract meaning of a scene
– Obtained within 150 ms (Biederman, 1981; Thorpe, S. et al., 1996)
– Obtained without attention (Oliva & Schyns, 1997; Wolfe, J.M., 1998)
– Possibly derived via statistics of low-level structures (e.g. Swain & Ballard, 1991)

• Change blindness (seconds) (Simons, D.J. & Levin, D.T., Trends Cogn. Sci., 1997)


What is the “gist”?

• Inventory of the objects (2-3 objects in 150 msec; Luck & Vogel, Nature 390, 1997)

• Relation between objects (layout) (J. Wolfe, Curr. Bio. 8, 1998)

• Presence of other objects

• “Visual stuff” – impression of low-level features


How does the “Gist” work?

Statistical Properties

Object Properties

R.A. Rensink, lecture notes


Outline

• Context Modeling
– Previous Models
– Scene based Context Model

• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification

• Joint Local and Global Features Applications
– Object Detection and Localization

• Summary


Probabilistic Framework

$$P(O / v) = \frac{P(v / O)\, P(O)}{P(v)}$$

MAP Estimator

v – image measurements
O – object property: Category (o), Location (x), Scale (σ)
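As a rough illustration of the MAP estimator, the sketch below applies Bayes' rule, P(O / v) = P(v / O) P(O) / P(v), to a few candidate object categories and picks the most probable one. The category names and all probability values are made-up numbers for illustration, not values from the slides.

```python
def posterior(likelihood, prior):
    """Return P(O / v) for every candidate O using Bayes' rule."""
    # Unnormalised posterior: P(v / O) * P(O)
    joint = {o: likelihood[o] * prior[o] for o in prior}
    evidence = sum(joint.values())  # P(v), the normalising constant
    return {o: p / evidence for o, p in joint.items()}


def map_estimate(likelihood, prior):
    """Return the object property with the highest posterior probability."""
    post = posterior(likelihood, prior)
    return max(post, key=post.get)


# Hypothetical prior P(O) and likelihood P(v / O) for one image measurement v
prior = {"car": 0.2, "person": 0.3, "background": 0.5}
likelihood = {"car": 0.10, "person": 0.40, "background": 0.05}

post = posterior(likelihood, prior)
best = map_estimate(likelihood, prior)
```

Here the likelihood of the measurement under "person" outweighs the larger prior of "background", so the MAP estimate is "person".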


Object-Centered Object Detection

B. Moghaddam, A. Pentland, IEEE PAMI-19, 1997

$$P(O / v) \approx P(O / v_L) = \frac{P(v_L / O)\, P(O)}{P(v_L)}$$

• The only image features relevant to object detection are those belonging to the object and not the background


The “Gist” of a scene

Local features can be ambiguous

Context can provide a prior


Scene Based Context Model

$$P(O / v) = P(O / v_L, v_C) = \frac{P(v_L / O, v_C)}{P(v_L / v_C)}\, P(O / v_C)$$

Background provides a likelihood of finding an object

Prob(Car/image) = low
Prob(Person/image) = high


Context Modeling

Previous Context Models (Fu, Hammond and Swain, 1994; Haralick, 1983; Song et al., 2000)

• Rule Based Context Model
• Object Based Context Model

Scene centered context representation (Oliva and Torralba, 2001, 2002)


Structural Description

[Figure: structural description graph linking objects O1–O4 through the relations Above, Right-of, Left-of and Touch]

Rule Based Context Model


Rule Based Context Model

Fu, Hammond and Swain, 1994


Object Based Context Model

$$P(O_1, \ldots, O_N, v_1, \ldots, v_N) \approx \prod_{i=1}^{N} P(v_i / O_i)\; P(O_1, \ldots, O_N)$$

• Context is incorporated only through the prior probability of object combinations in the world

R. Haralick, IEEE PAMI-5, 1983


Scene Based Context Model

What are the features representing the scene?

• Statistics of local low level features
• Color histograms
• Oriented band pass filters


Context Features - Vc

$$v(x, k) = \sum_{x'} I(x')\, g_k(x - x')$$

[Figure: filters g_1(x), g_2(x), …, g_K(x) and the corresponding responses v(x,1), v(x,2), …, v(x,K)]


Context Features - Vc

[Figure: two example scenes, "Car, no people" and "People, no car"]

Gabor filter

$$g_k(x) = g_0\, e^{-\|x\|^2 / \sigma_k^2}\, e^{2\pi i \langle f_k, x \rangle}$$
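To make the filter bank concrete, here is a minimal pure-Python sketch. It assumes the complex Gabor form g_k(x) = g_0 exp(-||x||^2 / sigma_k^2) exp(2 pi i <f_k, x>) and the response v(x, k) = sum over x' of I(x') g_k(x - x'). The toy image, the filter width and the two frequencies are hypothetical values chosen only to show that a filter tuned to the image's orientation responds far more strongly than one tuned to the orthogonal orientation.

```python
import cmath
import math


def gabor(dx, dy, sigma, fx, fy, g0=1.0):
    """Complex Gabor value at offset (dx, dy): Gaussian envelope times complex carrier."""
    envelope = math.exp(-(dx * dx + dy * dy) / sigma ** 2)
    carrier = cmath.exp(2j * math.pi * (fx * dx + fy * dy))
    return g0 * envelope * carrier


def filter_response(image, x, y, sigma, fx, fy):
    """v(x, k): sum over all pixels x' of I(x') * g_k(x - x'), evaluated at pixel (x, y)."""
    total = 0j
    for yp, row in enumerate(image):
        for xp, value in enumerate(row):
            total += value * gabor(x - xp, y - yp, sigma, fx, fy)
    return total


# Toy 8x8 image: vertical stripes with a period of 4 pixels (frequency 0.25 along x)
image = [[math.cos(2 * math.pi * 0.25 * xp) for xp in range(8)] for _ in range(8)]

# Magnitude of the response for a filter matched to the stripes and for one rotated by 90 degrees
matched = abs(filter_response(image, 4, 4, sigma=2.0, fx=0.25, fy=0.0))
mismatched = abs(filter_response(image, 4, 4, sigma=2.0, fx=0.0, fy=0.25))
```

A real system would use an FFT-based or separable convolution over the whole image; the direct double loop is kept here only for readability.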


Context Features - Vc

PCA

$$v(x, k) \approx \sum_{n=1}^{D} a_n\, \psi_n(x, k)$$

[Figure: principal components ψ_1(x,k), ψ_2(x,k), ψ_3(x,k)]


Context Features - Summary

I(x) → Bank of Filters → $\{v(x, k)\}_{k=1}^{K}$ → Dimension Reduction (PCA) → $a_n$

$$a_n = \sum_{x} \sum_{k} v(x, k)\, \psi_n(x, k)$$

Context features: $v_C = \{a_n\}_{n=1,\ldots,D}$


Probability from Features

$$P(O / v) = P(O / v_L, v_C) = \frac{P(v_L / O, v_C)}{P(v_L / v_C)}\, P(O / v_C)$$

How do we obtain the context-based probability priors P(O/vc) on object properties?

• GMM - Gaussian Mixture Model
• Logistic regression
• Parzen window


Probability from Features GMM

P(Object Property/Context)

$$P(O / v_C) = \frac{P(v_C / O)\, P(O)}{P(v_C)}$$

$$P(v_C) = P(v_C / O)\, P(O) + P(v_C / \neg O)\, P(\neg O)$$

Need to study two probabilities:
• P(vc/O) – likelihood of the features given the presence of an object
• P(vc/¬O) – likelihood of the features given the absence of an object

Gaussian Mixture Model:

$$P(v_C / O) = \sum_{i=1}^{M} w_i\, G(v_C; \mu_i, \Sigma_i)$$

The unknown parameters $\{w_i, \mu_i, \Sigma_i\}_{i=1}^{M}$ are learnt by the EM algorithm
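The following sketch shows how the two mixture likelihoods combine into the context-based prior P(O / vc). It makes two simplifying assumptions: the context feature is a single scalar rather than a vector, and the mixture parameters are hand-picked hypothetical values standing in for what EM would return.

```python
import math


def gaussian(v, mean, var):
    """One-dimensional Gaussian density G(v; mean, var)."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(v, components):
    """Mixture likelihood: sum_i w_i * G(v; mean_i, var_i)."""
    return sum(w * gaussian(v, mean, var) for w, mean, var in components)


def posterior_object(v_c, present, absent, prior_o):
    """P(O / v_C) from P(v_C / O), P(v_C / not O) and the prior P(O)."""
    numerator = mixture_pdf(v_c, present) * prior_o
    evidence = numerator + mixture_pdf(v_c, absent) * (1.0 - prior_o)
    return numerator / evidence


# Hypothetical (w_i, mean_i, var_i) triples, as if estimated by EM
present = [(0.6, 2.0, 0.5), (0.4, 3.5, 1.0)]  # object present
absent = [(1.0, -1.0, 1.0)]                   # object absent

p_high = posterior_object(2.5, present, absent, prior_o=0.3)
p_low = posterior_object(-1.0, present, absent, prior_o=0.3)
```

A context value typical of scenes containing the object gives a posterior close to 1, while a value typical of scenes without it gives a posterior close to 0.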


Probability from Features Logistic Regression

$$F(v_C) = a_0 + \sum_{i=1}^{D} a_i\, v_C(i)$$

$$\text{Logit} = \log \frac{P(O / v_C)}{P(\neg O / v_C)} = F(v_C)$$

$$p(O / v_C) = \frac{1}{1 + e^{-F(v_C)}}$$


Probability from Features Logistic Regression

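Example

O = having back problems
vc = age

$$\log \frac{P(O / v_C)}{P(\neg O / v_C)} = a_0 + a_1\,(\text{age} - 20)$$

• $a_0$ - the log odds for a 20-year-old person
• $a_1$ - the log odds ratio when comparing two persons who differ by 1 year in age

Training Stage: estimate $a_0$ and $a_1$

Working Stage:

$$p(O / \text{age}) = \frac{1}{1 + e^{-\left(a_0 + a_1 (\text{age} - 20)\right)}}$$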
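The working stage of the back-problems example can be sketched in a few lines. The coefficient values below are hypothetical; in practice they would come out of the training stage.

```python
import math


def log_odds(age, a0, a1):
    """Logit of having back problems: a0 + a1 * (age - 20)."""
    return a0 + a1 * (age - 20)


def p_back_problems(age, a0, a1):
    """Working stage: turn the log odds into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(age, a0, a1)))


# Hypothetical coefficients, standing in for values learnt in the training stage
a0, a1 = -2.0, 0.05

p_20 = p_back_problems(20, a0, a1)
p_60 = p_back_problems(60, a0, a1)
```

With these numbers the log odds at age 20 equal a0, each extra year adds a1 to the log odds, and the probability rises with age because a1 is positive.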


Probability from Features Parzen Window

$$P(v_C / O) = \sum_{j} K(v_C - v_j)$$

Radial Gaussian Kernel

$$K(v) = k\, e^{-\|v\|^2 / \sigma^2}$$
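A minimal sketch of the Parzen window estimate with a radial Gaussian kernel K(v) = k exp(-||v||^2 / sigma^2) follows. Two choices here are assumptions rather than something stated on the slide: the constant k is set so that the kernel integrates to one, and the kernel sum is divided by the number of stored samples so that the result is a proper density. The training vectors and sigma are hypothetical.

```python
import math


def radial_kernel(v, sigma):
    """K(v) = k * exp(-||v||^2 / sigma^2), with k chosen so K integrates to 1."""
    d = len(v)
    k = (math.pi * sigma ** 2) ** (-d / 2.0)
    return k * math.exp(-sum(c * c for c in v) / sigma ** 2)


def parzen_likelihood(v_c, samples, sigma):
    """Estimate P(v_C / O) as the average kernel response over training vectors v_j."""
    total = 0.0
    for v_j in samples:
        diff = [a - b for a, b in zip(v_c, v_j)]
        total += radial_kernel(diff, sigma)
    return total / len(samples)


# Hypothetical 2-D context vectors from training images that contain the object
samples = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1]]

near = parzen_likelihood([0.05, 0.0], samples, sigma=0.5)
far = parzen_likelihood([3.0, 3.0], samples, sigma=0.5)
```

A query close to the stored vectors gets a high likelihood and a distant one gets a value near zero; sigma controls how quickly the estimate falls off.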


What did we have so far…

• Context Modeling

• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification


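Place Identification

Goal: Recognize specific locations

$$P(\text{Place}_j / v_C) = \frac{P(v_C / \text{Place}_j)\, P(\text{Place}_j)}{\sum_{j'} P(v_C / \text{Place}_{j'})\, P(\text{Place}_{j'})}$$

$$P(v_C / \text{Place}_j) = \sum_{i} K(v_C - v_i)$$

where the kernel sum runs over the training vectors $v_i$ recorded at place j.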


Place Identification

A.Torralba, K.Murphy, W. Freeman, M. Rubin ICCV 2003


Place Identification

Decide only when $\max_j P(\text{Place}_j / v_C)$ exceeds a threshold

Precision vs. Recall rate:

A.Torralba, P. Sinha, MIT AIM 2001-015
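The "decide only when" rule can be sketched as a posterior over places with a rejection option. The place names, likelihood values and the threshold of 0.8 are all hypothetical; raising the threshold trades recall for precision.

```python
def place_posterior(likelihoods, priors):
    """P(Place_j / v_C) from the likelihoods P(v_C / Place_j) and priors P(Place_j)."""
    joint = {place: likelihoods[place] * priors[place] for place in priors}
    total = sum(joint.values())
    return {place: value / total for place, value in joint.items()}


def identify_place(likelihoods, priors, threshold):
    """Return (place, confidence), or (None, confidence) when the best posterior is too low."""
    post = place_posterior(likelihoods, priors)
    best = max(post, key=post.get)
    if post[best] > threshold:
        return best, post[best]
    return None, post[best]


# Hypothetical uniform prior over three known places
priors = {"office": 1 / 3, "corridor": 1 / 3, "kitchen": 1 / 3}

confident = identify_place({"office": 0.90, "corridor": 0.05, "kitchen": 0.05}, priors, 0.8)
ambiguous = identify_place({"office": 0.40, "corridor": 0.35, "kitchen": 0.25}, priors, 0.8)
```

The first frame is labelled "office"; the second is left undecided because no place reaches the threshold.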


Object Priming

• How do we detect objects in an image?

– Search the whole image for the object model.

– What if I am searching in images where the object doesn’t exist at all?

• Obviously, I am wasting “my precious” computational resources. (Gollum)

• Can we do better and if so, how?

– Use the “great eye”, the contextual features of the image (vC), to predict the probability of finding our object of interest, o, in the image, i.e. P(o / vC).


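Object Priming …..

• What to do?
– Use my experience to learn $P(o / v_C)$ from a database of images, with

$$P(o / v_C) = \frac{P(v_C / o)\, P(o)}{P(v_C)}$$

$$P(v_C) = P(v_C / o)\, P(o) + P(v_C / \neg o)\, P(\neg o)$$

• How to do it?
– Learn the PDF $P(v_C / o)$ by a mixture of Gaussians

$$P(v_C / o) = \sum_{i=1}^{M} w_i\, G(v_C; v_i, V_i)$$

– Also, learn the PDF $P(v_C / \neg o)$

$$P(v_C / \neg o) = \sum_{i=1}^{M} w_i\, G(v_C; v_i, V_i)$$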


Control of Focus of Attention

• How do biological visual systems deal with the analysis of complex real-world scenes?

– By focusing attention on image regions that require detailed analysis.


Modeling the Control of Focus of Attention

How to decide which regions are “more” important than others?

• Local-type methods
1. Low level saliency maps – regions that have different properties than their neighborhood are considered salient.
2. Object centered methods.

• Global-type methods
1. Contextual control of focus of attention


Contextual Control of Focus of Attention

• Contextual control is both

– Task driven (looking for a particular object o) and

– Context driven (given global context information: vC)

• No use of object models (i.e. ignores object centered features)


Contextual Control of Focus of Attention…

• Focus on spatial regions that have high probability of containing the target object o given context information (vC)

• For each location x, let's calculate the probability of presence of the object o given the context vC.

• Evaluate the PDF based on the past experience of the system.

$$P(\text{Location} / \text{Object}, \text{Context})$$


Contextual Control of Focus of Attention…

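$$P(\text{Location} / \text{Object}, \text{Context}), \quad \text{i.e. } P(x / o, v_C) = \;?$$

$$P(x / o, v_C) = \frac{P(x, v_C / o)}{P(v_C / o)} = \frac{\sum_{i=1}^{M} w_i\, G(x; x_i, X_i)\, G(v_C; v_i, V_i)}{\sum_{i=1}^{M} w_i\, G(v_C; v_i, V_i)}$$

Learning Stage: Use the Swiss Army Knife, the EM algorithm, to estimate the parameters $\{w_i, x_i, X_i, v_i, V_i\}_{i=1}^{M}$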


Scale Selection

• Scale selection is
– a fundamental problem in computer vision.
– a key bottleneck for object-centered object detection algorithms.

• Can we estimate scale in a pre-processing stage?
– Yes, using saliency measures of low-level operators across spatial scales.

• Other methods? Of course, …..


Context-Driven Scale Selection

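$$P(\text{Scale} / \text{Location}, \text{Object}, \text{Context}) \approx P(\text{Scale} / \text{Object}, \text{Context}), \quad \text{i.e. } P(\sigma / o, v_C) = \;?$$

$$P(\sigma / o, v_C) = \frac{P(\sigma, v_C / o)}{P(v_C / o)} = \frac{\sum_{i=1}^{M} w_i\, G(\sigma; \sigma_i, S_i)\, G(v_C; v_i, V_i)}{\sum_{i=1}^{M} w_i\, G(v_C; v_i, V_i)}$$

Preferred Scale:

$$\bar{\sigma} = \int \sigma\, P(\sigma / o, v_C)\, d\sigma = \frac{\sum_{i=1}^{M} w_i\, \sigma_i\, G(v_C; v_i, V_i)}{\sum_{i=1}^{M} w_i\, G(v_C; v_i, V_i)}$$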
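The preferred scale is a weighted average of the per-cluster scales, with each cluster weighted by how well its context Gaussian explains the observed context features. The sketch below assumes a scalar context feature and uses hypothetical mixture parameters in place of EM estimates; the scale variances S_i are omitted because they drop out of the mean.

```python
import math


def gaussian(v, mean, var):
    """One-dimensional Gaussian density G(v; mean, var)."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def preferred_scale(v_c, components):
    """Mean scale: sum_i w_i sigma_i G(v_C; v_i, V_i) / sum_i w_i G(v_C; v_i, V_i).

    components is a list of (w_i, sigma_i, v_i, V_i) tuples.
    """
    weights = [w * gaussian(v_c, v_i, V_i) for w, _, v_i, V_i in components]
    weighted_scales = sum(r * sigma_i for r, (_, sigma_i, _, _) in zip(weights, components))
    return weighted_scales / sum(weights)


# Hypothetical clusters: (weight, typical object scale in pixels, context mean, context variance)
components = [
    (0.5, 10.0, 0.0, 1.0),  # "far away" scenes, small objects
    (0.5, 40.0, 5.0, 1.0),  # "close-up" scenes, large objects
]

scale_far = preferred_scale(0.0, components)
scale_mid = preferred_scale(2.5, components)
scale_close = preferred_scale(5.0, components)
```

Context features near the first cluster predict a scale of about 10 pixels, features near the second predict about 40, and a context halfway between the two gives the midpoint.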


Scene Classification

• Strong correlation between the presence of many types of objects.

• Do not model this correlation directly. Rather, use a “common” cause, which we shall call “scene”.

• Train a classifier to identify scenes.

• Then all we need is to calculate

$$P(\text{Scene} = s / v_C) = \frac{P_{\text{Classifier}}(\text{Scene} = s / v_C)}{\sum_{s'} P_{\text{Classifier}}(\text{Scene} = s' / v_C)}$$


What did we have so far…

• Context Modeling
• Context Based Applications
• Joint Local and Global Features Applications
– Object Detection and Localization

Need new tools: Learning and Boosting


Weak Learners

• Given (x1,y1),…,(xm,ym) where

$$x_i \in X = \{\text{set of emails}\}, \qquad y_i \in Y = \{\text{spam}, \text{non-spam}\}$$

• Can we extract “rules of thumb” for classification purposes?

• Weak learner finds a weak hypothesis (rule of thumb)

h : X → {spam, non-spam}


Decision Stumps

• Consider the following simple family of component classifiers generating ±1 labels:

$$h(x; p) = a\,[x_k > t] - b$$

where p = {a, b, k, t}. These are called decision stumps.

• sign(h) gives the classification and the magnitude |h| gives a confidence measure.

• Each decision stump pays attention to only a single component of the input vector.
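A decision stump is small enough to write out directly. The sketch below implements h(x; p) = a[x_k > t] - b with the bracket read as a 0/1 indicator; the parameter values are hypothetical, chosen so that the stump outputs exactly ±1.

```python
def stump(x, a, b, k, t):
    """h(x; p) = a * [x_k > t] - b, where [.] is 1 if the test holds and 0 otherwise."""
    indicator = 1 if x[k] > t else 0
    return a * indicator - b


def predict(x, a, b, k, t):
    """Return (label, confidence): the sign of h and its magnitude."""
    h = stump(x, a, b, k, t)
    label = 1 if h >= 0 else -1
    return label, abs(h)


# Hypothetical stump on component k = 1 with threshold t = 2.0; a = 2, b = 1 gives outputs +1 / -1
above = predict([0.3, 5.0], a=2, b=1, k=1, t=2.0)
below = predict([0.3, 1.0], a=2, b=1, k=1, t=2.0)
```

Only component 1 of the input is inspected; changing component 0 has no effect on the output.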


Ponders his maker, ponders his will

• Can we combine weak classifiers to produce a single strong classifier in a simple manner:

hm(x) = h(x;p1) + …. + h(x;pm)

where the predicted label for x is the sign of hm(x).

• Is it beneficial to allow some of the weak classifiers to have more “votes” than others:

hm(x) = α1h(x;p1) + …. + αmh(x;pm)

where the non-negative votes αi can be used to emphasize the components that are more reliable than others.


Boosting

What is boosting?
– A general method for improving the accuracy of any given weak learning algorithm.

– Introduced in the framework of the PAC learning model.

– But works with any weak learner (in our case the decision stumps).


Boosting …..

• A boosting algorithm sequentially estimates and combines classifiers by re-weighting training examples (each time concentrating on the harder examples)
– each component classifier is presented with a slightly different problem depending on the weights

• Base Algorithms
– a set of “weak” binary (±1) classifiers h(x;p) such as decision stumps
– normalized weights D1(i) on the training examples, initially set to uniform (D1(i) = 1 / m)


AdaBoost

1. At the t-th iteration we find a weak classifier h(x;pt) for which the weighted classification error

$$\epsilon_t = 0.5 - 0.5 \left( \sum_{i=1}^{m} D_t(i)\, y_i\, h(x_i; p_t) \right)$$

is better than chance.

2. The new component classifier is assigned “votes” based on its performance

$$\alpha_t = 0.5 \log\big( (1 - \epsilon_t) / \epsilon_t \big)$$

3. The weights on the training examples are updated according to

$$D_{t+1}(i) = \frac{D_t(i)\, \exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}$$

where $h_t(x) = h(x; p_t)$ and Zt is a normalization factor.
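The three AdaBoost steps can be sketched end to end on a toy problem. The code below uses ±1 decision stumps parameterised by (k, t, s), a simplification of the (a, b, k, t) form with a = 2s and b = s, and picks each stump by exhaustive search. The data set, the search strategy and the number of rounds are illustrative assumptions, not the setup used in the slides.

```python
import math


def stump_predict(x, k, t, s):
    """±1 decision stump on component k: returns s if x[k] > t, otherwise -s."""
    return s if x[k] > t else -s


def best_stump(X, y, D):
    """Step 1: exhaustively pick the stump with the lowest weighted error."""
    best = None
    for k in range(len(X[0])):
        values = sorted(set(x[k] for x in X))
        # One threshold below all points, plus midpoints between neighbouring values
        thresholds = [values[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for s in (1, -1):
                # eps = 0.5 - 0.5 * sum_i D(i) * y_i * h(x_i)
                agreement = sum(d * yi * stump_predict(x, k, t, s) for x, yi, d in zip(X, y, D))
                eps = 0.5 - 0.5 * agreement
                if best is None or eps < best[0]:
                    best = (eps, k, t, s)
    return best


def adaboost(X, y, rounds):
    """Run AdaBoost for a fixed number of rounds; returns a list of (alpha, k, t, s)."""
    m = len(X)
    D = [1.0 / m] * m  # uniform initial weights D_1(i) = 1 / m
    ensemble = []
    for _ in range(rounds):
        eps, k, t, s = best_stump(X, y, D)
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - eps) / eps)  # Step 2: votes
        ensemble.append((alpha, k, t, s))
        # Step 3: up-weight mistakes, down-weight correct examples, then normalise by Z_t
        D = [d * math.exp(-alpha * yi * stump_predict(x, k, t, s)) for x, yi, d in zip(X, y, D)]
        Z = sum(D)
        D = [d / Z for d in D]
    return ensemble


def classify(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, k, t, s) for alpha, k, t, s in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data with a + + - - + + pattern: no single stump can separate it
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]

ensemble = adaboost(X, y, rounds=3)
predictions = [classify(ensemble, x) for x in X]
```

No single stump gets more than four of the six points right, yet the weighted vote of three stumps classifies all of them correctly, which is the point of the re-weighting scheme.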


Gambling

Uri

Gari KT


Object Detection and Localization

• 3 Families of Approaches
– Parts based
• Object defined as spatial arrangement of small parts.

– Region based
• Use segmentation to extract a region of image from the background and deduce shape and texture info from its local features.

– Patch based
• Use local features to classify each rectangular image region as object or background.

• Object detection is reduced to a binary classification problem, i.e. compute just

$$P(O_i^C = 1 / v_i^C)$$

where $O_i^C = 1$ if patch i contains (part of) an object of class C, and $v_i^C$ = the feature vector for patch i computed for class C.


Feature Vector for a Patch: Step 1


Feature Vector for a Patch: Step 2


Feature Vector for a Patch: Step 3


Summary: Feature Vector Extraction

12 * 30* 2 = 720 features


Filters and Spatial Templates


Object Detection …..

• Do I need all the features for a given object class?

• If not, which features should I extract for a given object class?

– Use training to learn which features are more important than others.


Classifier: Boosted Features

• What is available?
– Training data is v = the features of the patch containing an object o.

• Weak learners pay attention to single features:
– ht(v) picks the best feature $v_k$ and threshold $\theta$, i.e. it is a decision stump on the test $v_k > \theta$

• Output is

$$\alpha(v) = \sum_{t} \alpha_t\, h_t(v)$$

– ht(v) = output of weak classifier at round t
– αt = weight assigned by boosting

• ~100 rounds of boosting


Examples of Learned Features


Example Detections


Using the Gist for Object Localization

• Use gist to predict the possible location of the object.

• Should I run my detectors only in that region?
– No! That misses detections if the object is at any other location.

– So, search everywhere but penalize detections that are far from the predicted locations.

• But how?


Using the Gist for Object Localization.…

• Construct a feature vector $f(v_i^C, x_i, x^C)$ which combines the output of the boosted classifier on $v_i^C$ and the difference $x_i - x^C$, where

$x_i$ = location of the patch
$x^C$ = predicted location for objects of this class

• Train another classifier to compute

$$P\big(O_i^C = 1 \,/\, f(v_i^C, x_i, x^C)\big)$$


Summary

• Context Modeling
– Previous Models
– Scene based Context Model

• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification

• Joint Local and Global Features Applications
– Object Detection and Localization