Contextual models for object detection using boosted random fields
by Antonio Torralba,
Kevin P. Murphy and
William T. Freeman
Quick Introduction
What is this?
Now can you tell?
Belief Propagation (BP)
Network (pairwise Markov random field):
observed nodes ($y_i$)
hidden nodes ($x_i$)
Statistical dependency between $x_i$ and $y_i$, called local evidence: $\phi_i(x_i, y_i)$, short-hand $\phi_i(x_i)$
Statistical dependency between $x_i$ and $x_j$, the compatibility function: $\psi_{ij}(x_i, x_j)$
Belief Propagation (BP)
Joint probability:
$$p(x) = \frac{1}{Z} \prod_i \phi_i(x_i) \prod_{\{ij\}} \psi_{ij}(x_i, x_j)$$
[Figure: graph of hidden nodes $x_1, x_2, \ldots, x_i, \ldots$ with their observed nodes $y_1, y_2, \ldots, y_i$]
Belief Propagation (BP)
The belief b at a node i is given by the local evidence of the node and all the messages coming in from its neighbors $N_i$:
$$b_i(x_i) = k\,\phi_i(x_i) \prod_{j \in N_i} m_{ji}(x_i) \approx p(x_i \mid y)$$
where k is a normalization constant.
Belief Propagation (BP)
Messages m are passed between hidden nodes: $m_{ji}(x_i)$ expresses how likely node j thinks it is that node i will be in the corresponding state:
$$m_{ji}(x_i) = \sum_{x_j} \phi_j(x_j)\, \psi_{ji}(x_j, x_i) \prod_{k \in N_j \setminus i} m_{kj}(x_j)$$
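These two update rules are easy to exercise on a toy model. Below is a minimal numpy sketch of synchronous BP on a small chain of binary nodes; the potentials `phi` and `psi` and the chain topology are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Toy pairwise MRF: a chain of 4 binary hidden nodes x_i.
# phi[i] is the local evidence phi_i(x_i); psi is a shared
# compatibility psi(x_i, x_j) favouring equal neighbouring states.
rng = np.random.default_rng(0)
n = 4
phi = rng.uniform(0.1, 1.0, size=(n, 2))      # local evidence per node
psi = np.array([[0.8, 0.2],
                [0.2, 0.8]])                  # compatibility function

edges = [(i, i + 1) for i in range(n - 1)]
neighbors = {i: [] for i in range(n)}
for i, j in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)

# m[(j, i)] is the message from node j to node i, initialised uniformly.
m = {(j, i): np.ones(2) for i in range(n) for j in neighbors[i]}

for _ in range(10):                           # BP iterations
    new_m = {}
    for (j, i) in m:
        # m_ji(x_i) = sum_{x_j} phi_j(x_j) psi(x_j, x_i)
        #             * prod_{k in N(j)\i} m_kj(x_j)
        prod = phi[j].copy()
        for k in neighbors[j]:
            if k != i:
                prod *= m[(k, j)]
        msg = psi.T @ prod
        new_m[(j, i)] = msg / msg.sum()       # normalise for stability
    m = new_m

# b_i(x_i) = k phi_i(x_i) prod_{j in N(i)} m_ji(x_i)
for i in range(n):
    b = phi[i].copy()
    for j in neighbors[i]:
        b *= m[(j, i)]
    print(f"belief at node {i}: {b / b.sum()}")
```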
Conditional Random Field
Distribution of the form:
$$p(x \mid y) = \frac{1}{Z} \prod_i \phi_i(x_i) \prod_{j \in N_i} \psi_{ij}(x_i, x_j)$$
Boosted Random Field
Basic Idea:
Use BP to estimate P(x|y)
Use boosting to maximize the log-likelihood of each node with respect to $\phi_i(x_i)$
Algorithm: BP
Minimize the negative log-likelihood of the training data $(y_i)$, with labels $x_{i,m} \in \{-1, +1\}$. Loss function to minimize:
$$J^t = \sum_i J^t_i = \sum_m \sum_i \left(1 + b^t_{i,m}\right)^{-x^*_{i,m}} \left(1 - b^t_{i,m}\right)^{x^*_{i,m} - 1}$$
where $x^*_{i,m} = (1 + x_{i,m})/2$ and m indexes the training samples.
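As a sanity check on this reconstructed cost, here is a direct numpy translation; it assumes beliefs lie in (-1, 1), as the (1 ± b) form suggests, and the beliefs and labels below are random placeholders.

```python
import numpy as np

def brf_cost(b, x):
    """Per-node cost J = sum_m sum_i (1 + b)^(-x*) (1 - b)^(x* - 1),
    with x* = (1 + x)/2 mapping labels {-1,+1} to {0,1}."""
    x_star = (1 + x) / 2.0
    return np.sum((1 + b) ** (-x_star) * (1 - b) ** (x_star - 1))

rng = np.random.default_rng(0)
b = rng.uniform(-0.9, 0.9, size=(5, 3))     # beliefs b_{i,m} (placeholder)
x = rng.choice([-1, 1], size=(5, 3))        # labels x_{i,m} (placeholder)
print(brf_cost(b, x))
```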
Algorithm: BP
The beliefs are computed from the local evidence and the incoming messages:
$$b^{t+1}_i(x_i) \propto \phi_i(x_i) \prod_{j \in N_i} m^t_{ji}(x_i) = \phi_i(x_i)\, M^{t+1}_i(x_i)$$
where $M^{t+1}_i(x_i) = \prod_{j \in N_i} m^t_{ji}(x_i)$ collects all incoming messages.
Algorithm: BP
The messages are in turn updated from the current beliefs:
$$m^{t+1}_{ji}(x_i) = \sum_{x_j \in \{-1, +1\}} \psi_{ji}(x_j, x_i)\, \frac{b^t_j(x_j)}{m^t_{ij}(x_j)}$$
Algorithm: BP
The local evidence is parameterised as
$$\phi_i(x_i) = \left[e^{F^t_i/2};\; e^{-F^t_i/2}\right], \quad \text{i.e. } \phi_i(x_i) = e^{x_i F^t_i / 2}$$
where F is a function of the input data $y_i$.
Algorithm: BP
With this parameterisation the belief update becomes
$$b^{t+1}_i = \sigma\!\left(F^t_i + G^t_i\right), \qquad \sigma(u) = \frac{1}{1 + e^{-u}}$$
with
$$G^t_i = \log M^t_i(+1) - \log M^t_i(-1)$$
so the per-node negative log-likelihood is
$$\log J^t_i = \sum_m \log\!\left(1 + e^{-x_{i,m}\left(F^t_{i,m} + G^t_{i,m}\right)}\right)$$
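In this parameterisation the belief update is a one-liner. A direct transcription of the two formulas above; the arrays below are illustrative placeholders for the boosted scores and message products.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def updated_beliefs(F, M_plus, M_minus):
    """b^{t+1} = sigma(F + G) with G = log M(+1) - log M(-1).
    M_plus / M_minus are the products of incoming messages for
    states +1 and -1 at each node."""
    G = np.log(M_plus) - np.log(M_minus)
    return sigmoid(F + G)

F = np.array([0.5, -1.2, 2.0])          # boosted local-evidence scores
M_plus = np.array([1.5, 0.7, 1.1])      # message products for x_i = +1
M_minus = np.array([0.5, 1.3, 0.9])     # message products for x_i = -1
print(updated_beliefs(F, M_plus, M_minus))
```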
Function F
The local-evidence score is built additively:
$$F^t_i(y_{i,m}) = F^{t-1}_i(y_{i,m}) + f^t_i(y_{i,m})$$
Boosting! f is the weak learner: weighted decision stumps,
$$f_i(y) = a\,h(y) + b$$
Minimization of loss L
$$\log J^t = \sum_m \log\!\left(1 + e^{-x_{i,m}\left(F^t_{i,m} + G^t_{i,m}\right)}\right)$$
Minimizing this over the weak learner reduces to a weighted least-squares problem:
$$\arg\min_{f^t_i} \log J^t = \arg\min_{f^t_i} \sum_m w^t_{i,m} \left(Y^t_{i,m} - f^t_i(y_{i,m})\right)^2$$
where
$$w^t_{i,m} = b^t_{i,m}(+1)\, b^t_{i,m}(-1), \qquad Y^t_{i,m} = x_{i,m}\, e^{-x_{i,m}\left(F^{t-1}_{i,m} + G^t_{i,m}\right)}$$
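Once the stump's split is fixed, the weighted least-squares problem has a closed-form solution, so the usual strategy is to scan candidate thresholds. A sketch of that standard fitting step (not the paper's code), with the weights w and working response Y assumed already computed as above:

```python
import numpy as np

def fit_weighted_stump(y_feat, Y, w):
    """Fit f(y) = a * [y_feat > theta] + b minimising
    sum_m w_m (Y_m - f(y_feat_m))^2 by scanning thresholds."""
    best = None
    for theta in np.unique(y_feat):
        mask = y_feat > theta
        w1, w0 = w[mask].sum(), w[~mask].sum()
        if w1 == 0 or w0 == 0:
            continue
        # Weighted means on each side of the split are optimal.
        b = np.sum(w[~mask] * Y[~mask]) / w0
        a = np.sum(w[mask] * Y[mask]) / w1 - b
        err = np.sum(w * (Y - (a * mask + b)) ** 2)
        if best is None or err < best[0]:
            best = (err, a, b, theta)
    return best  # (error, a, b, theta)

rng = np.random.default_rng(1)
y_feat = rng.normal(size=200)            # one scalar feature per sample
Y = np.sign(y_feat - 0.3)                # working response (placeholder)
w = rng.uniform(0.1, 1.0, size=200)      # weights w_{i,m} (placeholder)
print(fit_weighted_stump(y_feat, Y, w))
```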
Local Evidence: algorithm
For t = 1..T:
  Iterate Nboost times:
    find the best basis function h
    update the local evidence with $F^{t+1}_i = F^t_i + f^t_i$
    update the beliefs $b_i(x_i)$
    update the weights $w^t_{i,m} = b^t_{i,m}(+1)\, b^t_{i,m}(-1)$
  Iterate NBP times:
    update the messages
    update the beliefs $b_i(x_i)$ and $b_j(x_j)$
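Putting the last few slides together, the sketch below is a self-contained toy run of this alternation between boosting rounds (updating F) and BP rounds (updating G) on a made-up binary chain problem. The working response, the stump fitting, and the fixed neighbour coupling are simplified stand-ins for the paper's exact updates, kept only to show the structure of the loop.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)

# Toy problem: n chain nodes, one scalar observation per node,
# binary labels that are smooth along the chain.
n, T = 60, 20
x = np.where(rng.random(n) > 0.5, 1, -1)
for i in range(1, n):                        # encourage smooth labels
    if rng.random() < 0.8:
        x[i] = x[i - 1]
y = x + rng.normal(0.0, 0.8, size=n)         # noisy observations

F = np.zeros(n)                              # boosted local evidence
G = np.zeros(n)                              # message contribution
for t in range(T):
    # Boosting round: fit one regression stump on y to the residual.
    b = sigmoid(F + G)                       # beliefs for x_i = +1
    w = b * (1 - b)                          # weights, b(+1) * b(-1)
    Y = (x + 1) / 2 - b                      # simplified working response
    best = None
    for theta in y:
        mask = y > theta
        if mask.all() or not mask.any():
            continue
        a1 = np.sum(w[mask] * Y[mask]) / np.sum(w[mask])
        a0 = np.sum(w[~mask] * Y[~mask]) / np.sum(w[~mask])
        pred = np.where(mask, a1, a0)
        err = np.sum(w * (Y - pred) ** 2)
        if best is None or err < best[0]:
            best = (err, pred)
    F = F + best[1]                          # update local evidence

    # BP round: neighbours vote through a fixed attractive coupling
    # (a crude stand-in for the learned messages).
    b = sigmoid(F + G)
    log_odds = np.log(b + 1e-9) - np.log(1 - b + 1e-9)
    G = np.zeros(n)
    G[1:] += 0.5 * np.tanh(log_odds[:-1] / 2)    # from left neighbour
    G[:-1] += 0.5 * np.tanh(log_odds[1:] / 2)    # from right neighbour

b = sigmoid(F + G)
print("training accuracy:", np.mean((b > 0.5) == (x > 0.5)))
```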
Function G
By assuming that the graph is densely connected, we can make the approximation that each individual message has a negligible effect on a belief, so $b^t_j(x_j)/m^t_{ij}(x_j) \approx b^t_j(x_j)$ in the message update. Now G is a non-linear additive function of the beliefs:
$$G^{t+1}_i = G\!\left(b^t\right)$$
Function G
Instead of learning the compatibility functions directly, the function G can be learnt with an additive model:
$$G^t_{i,m} = \sum_{n=1}^{t} g^n_i\!\left(b^t_m\right), \qquad g^n_i(b^t_m) = a\,h\!\left(w^\top b^t_m\right) + b$$
where the $g^n_i$ are weighted regression stumps on the beliefs.
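One plausible reading of such a stump on the belief vector, as a runnable sketch; the projection `w_proj`, the threshold, and the outputs `a` and `c` are illustrative assumptions, not the paper's learned values.

```python
import numpy as np

def g_stump(b, w_proj, theta, a, c):
    """One additive term g(b) = a * [w . b > theta] + c acting on the
    vector of beliefs b (an assumed form of 'weighted regression
    stump'; the paper's exact parameterisation may differ)."""
    return a * (b @ w_proj > theta) + c

b = np.array([0.9, 0.1, 0.7, 0.4])        # beliefs of neighbouring nodes
w_proj = np.array([0.5, -0.2, 0.3, 0.1])  # illustrative projection
print(g_stump(b, w_proj, theta=0.2, a=1.5, c=-0.3))
```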
Function G
The weak learner is chosen by minimizing the loss:
$$\log J^t_i(b) = \sum_m \log\!\left(1 + e^{-x_{i,m}\left(F^t_{i,m} + G^{t-1}_{i,m} + g^t_n\left(b^{t-1}_m\right)\right)}\right)$$
The Boosted Random Field Algorithm
For t = 1..T:
  find the best basis function h for f
  find the best basis function for $g^n_i(b_m)$
  compute the local evidence
  compute the compatibilities
  update the beliefs
  update the weights
[Figure: node $x_i$ receiving beliefs $b_1, b_2, \ldots, b_j$ from its neighbours]
Final classifier
For t = 1..T:
  update the local evidences F
  update the compatibilities G
  compute the current beliefs
Output classification:
$$\hat{x}_{i,m} = \delta\!\left(b_{i,m} > 0.5\right)$$
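At test time the learned updates are simply replayed in order. A minimal sketch, assuming the per-round weak learners were stored as callables `f_rounds` and `g_rounds` (hypothetical names):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def classify(y, f_rounds, g_rounds):
    """Replay the T learned updates on a new input.
    f_rounds[t] maps observations to a local-evidence increment;
    g_rounds[t] maps current beliefs to a message increment.
    Both are placeholders for the learned weak learners."""
    F = np.zeros(len(y))
    G = np.zeros(len(y))
    b = sigmoid(F + G)
    for f_t, g_t in zip(f_rounds, g_rounds):
        F += f_t(y)                 # update local evidences F
        G += g_t(b)                 # update compatibilities G
        b = sigmoid(F + G)          # compute current beliefs
    return b > 0.5                  # x_{i,m} = delta(b_{i,m} > 0.5)

# Example with trivial stand-in learners:
y = np.array([0.2, -1.0, 1.5])
f_rounds = [lambda y: 0.5 * y] * 3
g_rounds = [lambda b: 0.1 * (b - 0.5)] * 3
print(classify(y, f_rounds, g_rounds))
```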
Multiclass Detection
U: dictionary of ~2000 image patches
V: same number of image masks
At each round t, for each class c and each dictionary entry d there is a weak learner:
$$v_d(I) = \left[(I \otimes U_d) * V_d\right] > 0$$
Function f
To take different object sizes into account, we first downsample the image, then upsample the responses and OR them across scales:
$$f^d_{x,y,c}(I) = \bigvee_s \left[v_d(I \downarrow s) \uparrow s\right]_{x,y}$$
which is our function for computing the local evidence.
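A sketch of one reading of the weak detector $v_d$ and its multi-scale OR: correlate with the dictionary patch, threshold, convolve with the mask, and repeat over downsampled copies of the image. The threshold at 0 and the toy patch/mask values are assumptions.

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

def weak_detector(I, U_d, V_d):
    """Patch-based weak learner: correlate the image with dictionary
    patch U_d, threshold, then convolve with mask V_d (one reading of
    v_d; the threshold at 0 is an assumption)."""
    resp = correlate2d(I, U_d, mode="same")
    return convolve2d((resp > 0).astype(float), V_d, mode="same")

def f_multiscale(I, U_d, V_d, scales=(1, 2, 4)):
    """Downsample, detect, upsample, and OR across scales."""
    H, W = I.shape
    out = np.zeros((H, W), dtype=bool)
    for s in scales:
        small = I[::s, ::s]                          # downsample by s
        resp = weak_detector(small, U_d, V_d) > 0
        up = resp.repeat(s, axis=0).repeat(s, axis=1)  # upsample by s
        out |= up[:H, :W]                            # OR the scales
    return out

rng = np.random.default_rng(0)
I = rng.normal(size=(32, 32))
U_d = rng.normal(size=(3, 3))                # illustrative patch
V_d = np.zeros((5, 5)); V_d[2, 2] = 1.0      # illustrative mask
print(f_multiscale(I, U_d, V_d).sum())
```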
Function g
The compatibility function has a similar form:
$$g^d_{x,y,c}(b) = \sum_{c'=1}^{C} \left(b_{x',y',c'} * W^d_{c'}\right)_{x,y}$$
where W represents a kernel with all the messages directed to node (x, y, c).
Kernels W
Example of incoming messages: [figure of learned kernels W]
Function G
The overall incoming-messages function is given by:
$$G^t_{x,y,c}(b) = \sum_n \left(b^n_{x',y',c'} \otimes W^n\right)_{x,y}, \qquad b_{x',y',c'} \otimes W \overset{\text{def}}{=} \sum_{c'=1}^{C} b_{x',y',c'} * W_{c'}$$
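A sketch of this convolutional form of G, assuming the beliefs are stored as a (classes, height, width) array and there is one kernel per (target class, source class) pair; shapes and values are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def G_messages(b, W):
    """Incoming-message function: for each class c, sum over source
    classes c' of the beliefs b[c'] convolved with kernel W[c, c'].
    b: (C, H, W) beliefs; W: (C, C, kh, kw) kernels (assumed shapes)."""
    C = b.shape[0]
    G = np.zeros_like(b)
    for c in range(C):
        for cp in range(C):
            G[c] += convolve2d(b[cp], W[c, cp], mode="same")
    return G

rng = np.random.default_rng(0)
b = rng.uniform(0, 1, size=(3, 16, 16))      # beliefs for 3 classes
W = rng.normal(0, 0.1, size=(3, 3, 5, 5))    # message kernels
print(G_messages(b, W).shape)
```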
Learning…
Labeled dataset of office and street scenes, with ~100 images each.
In the first 5 rounds, only the local evidence is updated.
After the 5th iteration, the compatibility functions are updated as well.
At each round, only the F and G of the single object class that most reduces the multiclass cost are updated.
Learning…
The biggest objects are detected first, because they reduce the error of all classes the fastest.
The End
Introduction
Observed: Picture
Dictionary: Dog
P(Dog|Pic)
Introduction
P(Head|Pic_i)
P(Tail|Pic_i)
P(Front Legs|Pic_i), P(Back Legs|Pic_i)
Introduction
Comp(Head, Legs)
Comp(Head, Tail)
Comp(F. Legs, B. Legs)
Comp(Tail, Legs)
Dog!
Introduction
P(Piranha|Pic_i)
Comp(Piranha, Legs)
Graphical Models
Observation nodes $y_i$: $y_i$ can be a pixel or a patch.
Graphical Models
Hidden nodes $x_i$ (the dictionary entries X), with local evidence $\phi_i(x_i, y_i)$, short-hand $\phi_i(x_i)$.
Graphical Models
Compatibility function: $\psi_{ij}(x_i, x_j)$