
Carnegie Mellon

Evidence-Specific Structures for Rich Tractable CRFs

Anton Chechetka, Carlos Guestrin

General approach:
1. Ground the model / features
2. Use standard ESS-CRFs + parameter sharing

want P(structured Query | Evidence):

Collaborative filtering

Webpage classification: P(page types | webpage text + links), with page types such as professor, student, project

Face recognition in image collections: P(face labels | collection of images + face similarities)

P(Q \mid E) = \frac{1}{Z(E)} \exp\left( \sum_{i,j} w_{ij}\, f_{ij}(Q_i, Q_j, E) \right)

where the f_{ij} are features, the w_{ij} are weights, and Z(E) is the normalization.

Dense models

[Figure: a dense model over evidence E and query Q; features f_{12}, f_{34} induce structure over Q]

\frac{\partial \log P(q \mid e)}{\partial w_{ij}} = f_{ij}(q, e) - \mathbb{E}_{P(Q \mid e)}\left[ f_{ij}(\hat{q}, e) \right]

(feature minus expected feature); evaluating the expectation needs inference in the induced model.
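To make the formula and its gradient concrete, here is a minimal brute-force sketch (a toy setup of our own, not part of the poster): it enumerates all query configurations of a tiny pairwise CRF, which is exactly what becomes hopeless in large dense models.

```python
# Toy brute-force pairwise CRF (assumed setup, for illustration only).
import itertools
import numpy as np

def log_score(q, w, feats, edges):
    # sum_ij w_ij * f_ij(q_i, q_j, e); each feats[k] is a hypothetical feature
    # already instantiated with the evidence e
    return sum(w[k] * feats[k](q[i], q[j]) for k, (i, j) in enumerate(edges))

def loglik_and_grad(q_obs, w, feats, edges, n_vars):
    configs = list(itertools.product([0, 1], repeat=n_vars))
    scores = np.array([log_score(q, w, feats, edges) for q in configs])
    log_z = np.logaddexp.reduce(scores)            # normalization Z(e)
    probs = np.exp(scores - log_z)                 # P(Q | e) over all configurations
    loglik = log_score(q_obs, w, feats, edges) - log_z
    # d log P(q_obs | e) / d w_k  =  f_k(q_obs, e) - E_{P(Q|e)}[ f_k(Q, e) ]
    grad = np.array([
        feats[k](q_obs[i], q_obs[j])
        - sum(p * feats[k](q[i], q[j]) for p, q in zip(probs, configs))
        for k, (i, j) in enumerate(edges)
    ])
    return loglik, grad

# 4 binary query variables, two "agreement" features on edges (0,1) and (2,3)
edges = [(0, 1), (2, 3)]
feats = [lambda a, b: float(a == b)] * len(edges)
loglik, grad = loglik_and_grad((1, 1, 0, 1), np.array([1.0, -0.5]), feats, edges, n_vars=4)
```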

Exact inference: #P-complete

Approximate inference: NP-complete

Hopeless in large dense models

Easy for tree-structured models

Features can be arbitrarily correlated

Convex objective

Unique global optimum

Intuitive gradient:

Dense models vs. tree models:

Dense models: capture complex dependencies; natural extensions to relational settings; but arbitrarily bad inference quality and arbitrarily bad parameter quality.

Tree models: efficient exact inference; efficient learning of optimal parameters; but simple dependencies only, and relational settings are not tree-structured.

This work: keep efficient exact inference + optimal parameter learning, and enable rich dependencies and relational extensions.

P(Q \mid E) = \frac{1}{Z(E)} \exp\left( \sum_{i,j} \mathbb{I}\big(ij \in T(E, u)\big)\, w_{ij}\, f_{ij}(Q_i, Q_j, E) \right)

The indicator \mathbb{I}(ij \in T(E, u)) encodes the evidence-specific structure; w_{ij} f_{ij} are the standard weighted features; u are the structure selection parameters and T is the structure selection algorithm.
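A sketch of the scoring rule this formula implies (reusing the toy setup from the sketch above; the tree shown is hypothetical): the indicator simply masks out every feature whose edge is not in the evidence-specific tree T(e, u).

```python
def ess_log_score(q, w, feats, edges, tree_edges):
    # sum over ij of I(ij in T(e,u)) * w_ij * f_ij(q_i, q_j, e):
    # only edges selected for this particular evidence value contribute
    return sum(w[k] * feats[k](q[i], q[j])
               for k, (i, j) in enumerate(edges)
               if (i, j) in tree_edges)

# e.g. the structure selector might return this tree for one evidence value
tree_for_e1 = {(0, 1), (1, 2), (2, 3)}
```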

Example: "Battery is good" and "Engine starts" are dependent in general, but there is no dependence given the specific evidence E = {gas tank is empty}.

CRF with Evidence-Specific Structure

Formalism:

Motivation

Conditional Random Fields

Model Structure Tradeoffs

Intuition: Edge importance depends on evidence

Fixed dense model

Evidence-specific tree “mask”

Evidence-specific model: for each evidence value E = e_1, e_2, e_3, the fixed dense model × the evidence-specific tree "mask" = an evidence-specific tree model.

Capture all potential dependencies

Select the most important tree specific to the evidence value

Select tree structures, based on evidence, to capture the most important dependencies:

T(E,u) encodes the output of a structure selection algorithm

Global perspective on structure selection

Easy to guarantee tree structure (by selecting an appropriate algorithm)

Looking at one edge at a time is not enough to guarantee tree structure: being a tree is a global property

Objective still convex in w (but not u)

Efficient exact inference

Efficient learning of optimal parameters w

Much richer class of models than fixed trees (potential for capturing complex correlations)

Structure selection decoupled from feature design and weights (can use an arbitrarily dense model as the basis)

Learning an ESS-CRF model:
1. choose features f
2. choose tree learning algorithm T(E, ·)
3. learn u
4. select evidence-specific trees T(e_i, u) for every datapoint (E = e_i, Q = q_i) [u is fixed at this stage]
5. given u and the trees T(e_i, u), learn w [L-BFGS, etc.]
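A control-flow sketch of these five stages (the callables are hypothetical plug-ins; only the staging itself is taken from the poster):

```python
def learn_ess_crf(dataset, fit_structure_params, select_tree, fit_weights):
    """dataset: list of (evidence, query) pairs.
    fit_structure_params, select_tree, fit_weights are hypothetical plug-ins
    standing in for steps 3-5 above (the features f and the tree-learning
    algorithm T(E, .) are assumed to be baked into them)."""
    u = fit_structure_params(dataset)                     # step 3: learn u
    trees = [select_tree(e, u) for (e, _q) in dataset]    # step 4: u is fixed here
    w = fit_weights(dataset, trees)                       # step 5: learn w (L-BFGS, etc.)
    return u, w
```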

Algorithm stage           Dense CRFs                     ESS-CRFs (this work)
Structure selection       Approximate                    Approximate
Feature weight learning   Approximate (no guarantees)    Exact
Test-time inference       Approximate (no guarantees)    Exact

Parameter sharing for both w and u:

one weight per relation, not per grounding
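A minimal sketch of what this sharing means (the relation names and the representation of ground features are our own illustration): every grounding of a relation reads and writes the same weight entry, so the number of parameters does not grow with the grounded model.

```python
import numpy as np

relations = ["links-to", "same-person"]            # hypothetical relation names
rel_index = {r: k for k, r in enumerate(relations)}
w = np.zeros(len(relations))                       # one weight per relation, not per grounding

def accumulate_shared_gradient(ground_features):
    """ground_features: list of (relation_name, feature_gradient) pairs, one per
    grounding; gradients of all groundings of a relation add into one shared entry."""
    grad = np.zeros_like(w)
    for rel, g in ground_features:
        grad[rel_index[rel]] += g
    return grad
```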

Directly generalize existing algorithms for the no-evidence case (learning P(Q) from pairwise marginals) to the conditional case:

P(Q_i, Q_j \mid E = e) (pairwise conditionals) + Chow-Liu algorithm = a good tree for P(Q \mid E = e)

Learning Good Evidence-Specific Trees

Train stage:

Decompose the original high-dimensional problem P(Q \mid E) into low-dimensional pairwise problems, and learn pairwise conditional estimators with parameters u:

\hat{P}_{e,u}(Q_i, Q_j) = \hat{P}(Q_i, Q_j \mid E, u_{ij}), \quad \text{e.g. } \hat{P}(Q_1, Q_2 \mid E, u_{12}),\ \hat{P}(Q_1, Q_3 \mid E, u_{13}),\ \hat{P}(Q_3, Q_4 \mid E, u_{34})
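One possible form of a per-edge estimator (an assumption for illustration; the poster does not fix the estimator family): a multinomial logistic regression over the four joint states of a binary pair (Q_i, Q_j) given an evidence feature vector, whose fitted coefficients play the role of u_ij.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_pairwise_estimator(evidence_feats, qi, qj):
    """evidence_feats: (n, d) array of evidence features; qi, qj: length-n 0/1 labels."""
    joint_state = 2 * np.asarray(qi) + np.asarray(qj)      # encode (Q_i, Q_j) as 0..3
    return LogisticRegression(max_iter=1000).fit(evidence_feats, joint_state)

def pairwise_conditional(model, phi_e):
    """Return the 2x2 table P_hat(Q_i, Q_j | e) for one evidence feature vector."""
    probs = model.predict_proba(phi_e.reshape(1, -1))[0]
    table = np.zeros((2, 2))
    for cls, p in zip(model.classes_, probs):
        table[int(cls) // 2, int(cls) % 2] = p
    return table
```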

Test stage (evidence-specific Chow-Liu alg.):

Instantiate evidence in pairwise estimators:

Compute mutual information values \hat{I}_{ij}(e, u) as edge weights over the query variables (Q_1, ..., Q_4 in the figure), then return the maximum spanning tree.
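A sketch of this test stage, assuming the pairwise tables produced by the estimator sketch above (networkx is used here purely for illustration):

```python
import numpy as np
import networkx as nx

def mutual_information(table):
    """table[a, b] = P_hat(Q_i = a, Q_j = b | e); returns I_hat_ij(e, u)."""
    pi, pj = table.sum(axis=1), table.sum(axis=0)
    mi = 0.0
    for a in range(table.shape[0]):
        for b in range(table.shape[1]):
            if table[a, b] > 0:
                mi += table[a, b] * np.log(table[a, b] / (pi[a] * pj[b]))
    return mi

def evidence_specific_tree(pairwise_tables):
    """pairwise_tables: {(i, j): joint table of (Q_i, Q_j) conditioned on this evidence}."""
    g = nx.Graph()
    for (i, j), table in pairwise_tables.items():
        g.add_edge(i, j, weight=mutual_information(table))   # MI values as edge weights
    return set(nx.maximum_spanning_tree(g).edges())          # T(e, u)
```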

Fewer Sources of Errors


Learning Optimal Feature Weights

Our Approach: Evidence-Specific Structures

\frac{\partial \log \hat{P}(q \mid e, u)}{\partial w_{ij}} = \mathbb{I}\big(ij \in T(e, u)\big) \left( f_{ij}(q, e) - \mathbb{E}_{\hat{P}(Q \mid e, u)}\left[ f_{ij}(\hat{q}, e) \right] \right)

sparsity conforms to the evidence-specific structure

structure-related parameters u are fixed from the tree-learning step

efficient exact computation because T(e,u) is a tree

Summed over individual datapoints E = e_1, e_2, e_3, ...: each datapoint contributes a tree-sparse gradient with its own evidence-dependent sparsity pattern; over the whole dataset the gradient is dense, but still tractable.

Relational Extensions

Results

Face recognition [w/ Denver Dash, Matthai Philipose]

• Exploit face similarities to propagate labels in collections of images

• Semi-supervised relational model

• 250…1700 images, 4…24 unique people

• Compare against dense discriminative models


Equal or better accuracy

100 times faster

WebKB [data + features thanks to Ben Taskar]

• webpage text + links → page type (student, project, …)

same accuracy as dense models

~10 times faster

Can exactly compute the convex objective and its gradient (the gradient is similar to standard CRFs). Use L-BFGS or conjugate gradient to find the unique global optimum w.r.t. w exactly.
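A sketch of this optimization step, assuming a hypothetical function neg_loglik_and_grad(w, dataset, trees) that returns the exact negative conditional log-likelihood and its gradient (both computable exactly because every selected T(e_i, u) is a tree):

```python
import numpy as np
from scipy.optimize import minimize

def learn_weights(neg_loglik_and_grad, dataset, trees, n_weights):
    result = minimize(
        fun=lambda w: neg_loglik_and_grad(w, dataset, trees),
        x0=np.zeros(n_weights),
        jac=True,                  # fun returns (objective, gradient)
        method="L-BFGS-B",
    )
    return result.x                # unique global optimum of the convex objective in w
```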


Parameter dimensionality is independent of model size

Reduces overfitting

Structure selection only after grounding

No worries about structure being a tree on the relational level

No-evidence case: P(Q_i, Q_j) (pairwise marginals) + Chow-Liu algorithm = optimal tree for P(Q).

Acknowledgements: this work has been supported by NSF CAREER award IIS-0644225 and by ARO MURI W911NF0710287 and W911NF0810242.
