Efficient Discriminative Learning of Parts-based Models

M. Pawan Kumar, Andrew Zisserman, Philip Torr

http://www.robots.ox.ac.uk/~vgg
http://cms.brookes.ac.uk/research/visiongroup

Aim: To efficiently learn parts-based models which discriminate between positive and negative poses of the object category

Results - Sign Language

Efficient Reformulation

Results - Buffy

Parts-based Model

$G = (V, E)$, restricted to a tree.

$f : V \rightarrow$ pose of $V$ ($h$ values)

\[ Q(f) = \sum_{a} Q_a(f(a)) + \sum_{(a,b)} Q_{ab}(f(a), f(b)) \]

$Q_a(f(a))$: unary potential for $f(a)$, computed using features.

$Q_{ab}(f(a), f(b))$: pairwise potential for the validity of $(f(a), f(b))$, restricted to Potts.

The Learning Problem

\[ Q_a(f(a)) = w_a^\top \Phi(f(a)), \qquad Q_{ab}(f(a), f(b)) = w_{ab}^\top \Phi(f(a), f(b)), \qquad Q(f) = w^\top \Phi(f) \]

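\[ \min \; \|w\| + C \sum_i \left( \xi_i^+ + \xi_i^- \right) \]

\[ w^\top \Phi(f_i^+) + b \ge 1 - \xi_i^+ \]

\[ w^\top \Phi(f_{ij}^-) + b \le -1 + \xi_i^- \]

Maximize margin, minimize hinge loss.

High energy for all positive examples.

Low energy for all negative examples.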
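As a reading aid, the objective above can be evaluated for a fixed w and b as in the following sketch. It assumes one slack for the positive pose of each image and one slack shared by all negative poses of that image (so the negative slack is set by the highest-scoring negative). The function names and the toy feature vectors are hypothetical and not taken from the poster.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def objective(w, b, positives, negatives, C):
    """Evaluate ||w|| + C * sum_i (xi_i_plus + xi_i_minus).

    positives[i]: feature vector of the labelled pose of image i.
    negatives[i]: list of feature vectors of negative poses of image i.
    The slacks are the smallest non-negative values satisfying the
    margin constraints (an assumption of this sketch).
    """
    total_slack = 0.0
    for phi_pos, phi_negs in zip(positives, negatives):
        # Positive constraint: w.phi + b >= 1 - xi_plus
        xi_plus = max(0.0, 1.0 - (dot(w, phi_pos) + b))
        # Negative constraints: w.phi + b <= -1 + xi_minus for every negative
        worst = max(dot(w, phi) + b for phi in phi_negs)
        xi_minus = max(0.0, 1.0 + worst)
        total_slack += xi_plus + xi_minus
    return math.sqrt(dot(w, w)) + C * total_slack

# Toy example with made-up 2-D features
value = objective(w=(1.0, 0.0), b=0.0,
                  positives=[(2.0, 0.0)],
                  negatives=[[(-2.0, 0.0), (0.0, 0.0)]],
                  C=1.0)
```

In the toy call, the positive pose satisfies its margin, while the negative pose at the origin scores 0 and so incurs a slack of 1.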

Related Work

Local Iterative Support Vector Machine (ISVM-1)
• Start with a small subset of negative examples (1 per image)
• Solve for w and b
• Replace negative examples with current MAP estimates
• Converges to local optimum

Global Iterative Support Vector Machine (ISVM-2)
• Start with a small subset of negative examples (1 per image)
• Solve for w and b
• Add current MAP estimates to set of negative examples
• Converges to global optimum

Drawback: Requires obtaining MAP estimate of each image at each iteration (computationally expensive)
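The ISVM-2 steps listed above can be sketched as a working-set loop. This is only an illustration under strong assumptions: the SVM solver is replaced by a plain subgradient method, the MAP estimate is found by exhaustive search over a short list of candidate poses rather than by inference on the tree, and all names (`train_linear_svm`, `isvm2`) and data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(samples, dim, epochs=200, lr=0.1, reg=0.01):
    """Stand-in solver (not the poster's): subgradient descent on the
    regularised hinge loss. samples is a list of (features, label)."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for phi, y in samples:
            violated = y * (dot(w, phi) + b) < 1.0
            w = [wk * (1.0 - lr * reg) for wk in w]      # shrink (regulariser)
            if violated:                                  # hinge subgradient
                w = [wk + lr * y * pk for wk, pk in zip(w, phi)]
                b += lr * y
    return w, b

def isvm2(positives, candidates, dim):
    """positives[i]: features of the labelled pose of image i.
    candidates[i]: features of every negative pose of image i."""
    working = [{0} for _ in candidates]    # start with 1 negative per image
    while True:
        samples = [(p, +1) for p in positives]
        samples += [(candidates[i][j], -1)
                    for i, idx in enumerate(working) for j in sorted(idx)]
        w, b = train_linear_svm(samples, dim)             # solve for w and b
        grew = False
        for i, negs in enumerate(candidates):
            # MAP estimate among negatives (exhaustive in this toy setting)
            j_map = max(range(len(negs)), key=lambda j: dot(w, negs[j]))
            if j_map not in working[i]:
                working[i].add(j_map)                     # add, never replace
                grew = True
        if not grew:
            return w, b, working

# Made-up, well-separated toy data
positives = [(2.0, 1.0), (1.0, 2.0)]
candidates = [[(-1.0, 0.0), (-2.0, -1.0), (0.0, -1.0)],
              [(-1.0, -1.0), (0.0, -2.0), (-0.5, 0.0)]]
w, b, working = isvm2(positives, candidates, dim=2)
```

The loop stops once the MAP estimate of every image is already in the working set. Changing `working[i].add(j_map)` to a replacement of the stored negative would give the ISVM-1 variant. The inner `max` over candidates is the step that becomes expensive when the pose space is large.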

Our: 86.4%; Buehler et al., 2008: 87.7%

Our: 39.2%; Ferrari et al., 2008: 41.0%

100 training images, 95 test images

196 training images, 204 test images

[Figure panels, shown twice: ISVM-1, ISVM-2, Our]

ISVMs run for twice as long

For all $j$ (exponential in $|V|$)

$\Phi(f(a), f(b)) = 1$ if $(f(a), f(b)) \in L_{ab}$, $= 0$ otherwise.

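The constraint on negative examples,

\[ w^\top \Phi(f_{ij}^-) + b \le -1 + \xi_i^-, \quad \text{for all } j \]

(exponential in $|V|$), is replaced by

\[ M^i_{ba}(k) \ge w_b^\top \Phi_b(l), \quad \text{for all } l \]

\[ M^i_{ba}(k) \ge w_b^\top \Phi_b(l) + w_{ab}, \quad \text{for all } (k,l) \in L_{ab} \]

\[ w_a^\top \Phi_a(k) + \sum_b M^i_{ba}(k) + b \le -1 + \xi_i^- \]

Linear in $|V|$, linear in $h$, linear in $|L_{ab}|$.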
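To see why the message constraints can stand in for the exponentially many negative constraints, the sketch below computes the tightest messages for a root part a with leaf children b and checks that the resulting maximum score equals a brute-force maximum over all joint poses. It assumes $w_{ab} \ge 0$, which is what makes the tightest message equal to the true maximum over the child's poses. The bias is omitted since it is the same for every pose, and the function names and numbers are hypothetical.

```python
from itertools import product

def messages(unary_b, w_ab, valid, h_a):
    """Tightest M_ba(k) for every pose k of the parent a.

    unary_b[l] plays the role of w_b^T Phi_b(l); valid is the set L_ab of
    (k, l) pairs. Assumes w_ab >= 0 (an assumption of this sketch).
    Cost is O(h + |L_ab|) per edge.
    """
    assert w_ab >= 0.0
    base = max(unary_b)                    # M_ba(k) >= unary_b[l] for all l
    M = [base] * h_a
    for k, l in valid:                     # M_ba(k) >= unary_b[l] + w_ab
        M[k] = max(M[k], unary_b[l] + w_ab)
    return M

def max_score_messages(unary_a, children):
    """children: list of (unary_b, w_ab, valid), one entry per child b."""
    total = list(unary_a)
    for unary_b, w_ab, valid in children:
        M = messages(unary_b, w_ab, valid, len(unary_a))
        total = [t + m for t, m in zip(total, M)]
    return max(total)

def max_score_bruteforce(unary_a, children):
    """Enumerate every joint pose: exponential in the number of parts."""
    best = float("-inf")
    ranges = [range(len(unary_a))] + [range(len(u)) for u, _, _ in children]
    for pose in product(*ranges):
        k = pose[0]
        score = unary_a[k]
        for (u, w_ab, valid), l in zip(children, pose[1:]):
            score += u[l] + (w_ab if (k, l) in valid else 0.0)
        best = max(best, score)
    return best

# Made-up scores: 3 poses per part, two children
unary_a = [0.5, -1.0, 0.2]
children = [([0.1, 0.9, -0.3], 0.7, {(0, 0), (1, 1), (2, 2)}),
            ([-0.2, 0.4, 0.0], 0.5, {(0, 1), (2, 0)})]
fast = max_score_messages(unary_a, children)
slow = max_score_bruteforce(unary_a, children)
```

If the highest score over all negative poses satisfies the margin, every negative pose does, so bounding that single maximum is enough.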

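Problem (1)

\[ \max_{\alpha_{ab}} \; \alpha_{ab}^\top \mathbf{1} - \alpha_{ab}^\top K_{ab} \, \alpha_{ab} \]

\[ \text{s.t.} \quad \alpha_{ab}^\top y = 0, \quad \alpha_{ab} \ge 0 \]

\[ 0 \le \sum_k \alpha^i_{ab}(k) + \sum_{(k,l)} \alpha^i_{ab}(k,l) \le C \]

Constraint (3)

\[ \sum_k \alpha^i_{ab}(k) = \sum_l \alpha^i_{ba}(l) \]

Results in a large minimal problem.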

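Dual Decomposition

\[ \min_x \sum_i g_i(x), \quad \text{subject to } x \in P \]

\[ \min_{x, x_i} \sum_i g_i(x_i), \quad \text{s.t. } x_i \in P, \; x_i = x \]

\[ \max_{\lambda_i} \min_{x, x_i} \sum_i \left[ g_i(x_i) + \lambda_i (x_i - x) \right], \quad \text{s.t. } x_i \in P \]

KKT condition: $\sum_i \lambda_i = 0$

Solve $\min \sum_i \left[ g_i(x_i) + \lambda_i x_i \right]$, giving $x_i^*$.

Update $\lambda_i = \lambda_i + x_i^*$.

Project.

Master: update Lagrange multiplier of (3).

Problem (1) and Problem (2): SVM-like problems; minimal problem size = 2; modified SVMLight.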
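The solve, update, project cycle can be illustrated on a toy problem unrelated to the SVM duals: minimising a sum of one-dimensional quadratics $g_i(x) = (x - c_i)^2$ over an interval $P$. The sketch assumes the projection step means projecting the multipliers back onto $\sum_i \lambda_i = 0$. The function name, the step size parameter, and the numbers are hypothetical.

```python
def dual_decomposition(centers, lo, hi, step=1.0, iters=60):
    """Minimise sum_i (x - c_i)^2 over x in P = [lo, hi].

    Each term gets its own copy x_i; subproblem i solves
    min over x_i in P of (x_i - c_i)^2 + lam_i * x_i in closed form.
    """
    n = len(centers)
    lam = [0.0] * n
    xs = list(centers)
    for _ in range(iters):
        # Solve each subproblem: unconstrained minimiser c_i - lam_i / 2,
        # clipped to the interval (valid because the problem is 1-D convex).
        xs = [min(hi, max(lo, c - l / 2.0)) for c, l in zip(centers, lam)]
        # Update: lam_i <- lam_i + step * x_i*
        lam = [l + step * x for l, x in zip(lam, xs)]
        # Project onto sum_i lam_i = 0 (assumed meaning of "Project")
        mean = sum(lam) / n
        lam = [l - mean for l in lam]
    return xs, lam

# Two terms centred at 1 and 3: the joint minimiser is x = 2
xs, lam = dual_decomposition([1.0, 3.0], lo=0.0, hi=10.0)
```

On this example the gap between the two copies halves at every iteration, so both copies approach 2 and the multipliers approach -2 and 2. The copies only agree at convergence, which is what lets each subproblem be solved independently in between.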

Problem (1) learns the unary weight vector $w_a$ and pairwise weight $w_{ab}$

Problem (2) learns the unary weight vector $w_b$ and pairwise weight $w_{ab}$

$M^i_{ba}(k)$ analogous to messages in Belief Propagation (BP)

Efficient BP using distance transform: Felzenszwalb and Huttenlocher, 2004

Solving the Dual

Implementation Details

Features
• Shape: HOG
• Appearance: $(x, x^2)$, $x$ = fraction of skin pixels

Data
• Positive examples: Provided by user
• Negative examples: All other poses

Occlusion
• Each putative pose can be occluded (twice the number of labels)


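Problem (2)

\[ \max_{\alpha_{ba}} \; \alpha_{ba}^\top \mathbf{1} - \alpha_{ba}^\top K_{ba} \, \alpha_{ba} \]

\[ \text{s.t.} \quad \alpha_{ba}^\top y = 0, \quad \alpha_{ba} \ge 0 \]

\[ 0 \le \sum_k \alpha^i_{ba}(k) + \sum_{(k,l)} \alpha^i_{ba}(k,l) \le C \]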