Query-Specific Learning and Inference for Probabilistic Graphical Models


Carnegie Mellon

Query-Specific Learning and Inference for Probabilistic Graphical Models

Thesis committee: Carlos Guestrin, Eric Xing, J. Andrew Bagnell, Pedro Domingos (University of Washington)

14 June 2011

Anton Chechetka

2

Motivation

Fundamental problem: to reason accurately about noisy, high-dimensional data with local interactions

3

Sensor networks

• noisy: sensors fail, noise in readings
• high-dimensional: many sensors, many measurements (temperature, humidity, …) per sensor
• local interactions: nearby locations have high correlations

4

Hypertext classification

• noisy: automated text understanding is far from perfect
• high-dimensional: a variable for every webpage
• local interactions: directly linked pages have correlated topics

5

Image segmentation

• noisy: local information is not enough (camera sensor noise, compression artifacts)
• high-dimensional: a variable for every patch
• local interactions: cows are next to grass, airplanes next to sky

6

Probabilistic graphical models

Noisy, high-dimensional data with local interactions
Probabilistic inference over many variables
A graph to encode only direct interactions

P(Q | E) = P(Q, E) / P(E)        (Q: query, E: evidence)

7

Graphical models semantics

Factorized distributions:
P(X) = (1/Z) Π_{f ∈ F} f(X_f)        e.g. X_f = {X3, X4, X5}
The X_f are small subsets of X ⟹ compact representation

Graph structure: [figure: nodes X1 … X7 with edges for direct interactions; a separator is highlighted]

8

Graphical models workflow

Learn/construct structure → learn/define parameters → inference P(Q|E=E)
Factorized distribution P(X) = (1/Z) Π_{f ∈ F} f(X_f); graph structure over X1 … X7

9

Graph. models fundamental problems

Learn/construct structure: NP-complete
Learn/define parameters: requires inference, so exp(|X|) in general
Inference P(Q|E=E): #P-complete (exact), NP-complete (approx)
Errors compound across the three stages

10

Domain knowledge structures don’t help

(webpages)

Domain knowledge-based structures do not support tractable inference

11

This thesis: general directions
Emphasize the computational aspects of the graph. New algorithms for learning and inference in graphical models, to make answering the queries better:
• Learn accurate and tractable models: compensate for the reduced expressive power with exact inference and optimal parameters; gain significant speedups
• Speed up inference via better prioritization of computation: estimate the long-term effects of propagating information through the graph, and use those estimates to prioritize updates

12

Thesis contributions
Learn accurate and tractable models:

In the generative setting P(Q,E) [NIPS 2007]

In the discriminative setting P(Q|E) [NIPS 2010]

Speed up belief propagation for cases with many nuisance variables [AISTATS 2010]

13

Generative learning

P(Q | E) = P(Q, E) / P(E)        query goal: P(Q | E); learning goal: the joint P(Q, E)

Useful when E is not known in advance

Sensors fail unpredictably

Measurements are expensive (e.g. user time), want adaptive evidence selection

14

Tractable vs intractable models: workflow
Tractable models: learn a simple tractable structure from domain knowledge + data → optimal parameters, exact inference → approx. P(Q|E=E)
Intractable models: construct an intractable structure from domain knowledge, or learn an intractable structure from data → approximate algorithms with no quality guarantees → approx. P(Q|E=E)

Tractability via low treewidth

• Exact inference is exponential in treewidth (sum-product)
• Treewidth is NP-complete to compute in general
• Low-treewidth graphs are easy to construct
• Convenient representation: junction tree
• Other tractable model classes exist too

15

[Figure: example triangulated graph on variables 1–7]
Treewidth: size of the largest clique in a triangulated graph, minus one

16

Junction trees
• Cliques connected by edges with separators
• Running intersection property
• Finding the most likely junction tree of a given treewidth > 1 is NP-complete
• We will look for good approximations
[Figure: graph on variables 1–7 and a junction tree with cliques {X1,X2,X7}, {X1,X2,X5}, {X1,X3,X5}, {X1,X4,X5}, {X4,X5,X6} connected through separators {X1,X2}, {X1,X5}, {X4,X5}]

17

Independencies in low-treewidth distributions
P(X) factorizes according to a junction tree, P(X) = Π_C P(X_C) / Π_S P(X_S) ⟹ conditional independencies hold: for every separator S, I(X_C, X_C̄ | S) = 0, where X_C and X_C̄ are the variables on the two sides of S (conditional mutual information). Example: for S = {X1, X5}, X_C = {X2, X3, X7} and X_C̄ = {X4, X6}.
It works in the other way too: if I(X_C, X_C̄ | S) ≤ ε for every separator S of a junction tree, then KL(P || P_JT) is small.

18

Constraint-based structure learning
Look for junction trees where I(X_C, X_C̄ | S) ≤ ε holds for every separator, so that KL(P || P_JT) is small (constraint-based structure learning):
1. Enumerate all candidate separators S1 = {X1,X2}, S2 = {X1,X3}, S3 = {X1,X4}, …, Sm = {Xn-1,Xn} over all variables X
2. For each candidate separator, partition the remaining variables into weakly dependent subsets: I(X_α, X \ X_α | S) < ε
3. Find a junction tree consistent with these partitions

19

Mutual information complexity
I(X_α, X_-α | S) = H(X_α | S) - H(X_α | X_-α, S)        (X_-α: everything except X_α; H: conditional entropy)
I(X_α, X_-α | S) depends on all assignments to X: exp(|X|) complexity in general
Our contribution: a polynomial-time upper bound

20

Mutual info upper bound: intuition
I(A, B | C) = ??  Hard to compute directly.
I(D, F | C) for small subsets D ⊆ A, F ⊆ B with |D ∪ F| ≤ k: easy.
Only look at small subsets D, F: polynomially many small subsets, polynomial complexity for every pair.
Any conclusions about I(A, B | C)? In general, no. If a good junction tree exists, yes.

21

Contribution: mutual info upper bound
Theorem: Suppose an ε-JT of treewidth k for P(A ∪ B ∪ C) exists (i.e. I(X_C, X_C̄ | S) ≤ ε for every separator).
Let δ = max I(D, F | C) over D ⊆ A, F ⊆ B with |D ∪ F| ≤ treewidth + 1.
Then I(A, B | C) ≤ |A ∪ B ∪ C| (ε + δ).

22

Mutual info upper bound: complexity
Direct computation: exp(|A ∪ B ∪ C|) complexity.
Our upper bound: O(|A ∪ B|^(treewidth+1)) small subsets D, F with |D ∪ F| ≤ treewidth + 1, and exp(|C| + treewidth) time for each I(D, F | C).
|C| = treewidth for structure learning ⟹ polynomial(|A ∪ B ∪ C|) complexity.
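The bound lends itself to a direct implementation. Below is a minimal, hypothetical sketch (not the thesis code), assuming discrete data given as rows of variable assignments, plug-in entropy estimates in nats, and `eps` standing for the assumed ε of the ε-junction tree; variable sets are passed as lists of column indices.

    from collections import Counter
    from itertools import combinations
    import math

    def empirical_cmi(samples, d_idx, f_idx, c_idx):
        """Plug-in estimate of I(X_D ; X_F | X_C) from rows of discrete samples."""
        def H(idx):
            counts = Counter(tuple(row[i] for i in idx) for row in samples)
            n = len(samples)
            return -sum((c / n) * math.log(c / n) for c in counts.values())
        # I(D; F | C) = H(D, C) + H(F, C) - H(D, F, C) - H(C)
        return H(d_idx + c_idx) + H(f_idx + c_idx) - H(d_idx + f_idx + c_idx) - H(c_idx)

    def cmi_upper_bound(samples, a_idx, b_idx, c_idx, k, eps):
        """Upper bound |A u B u C| * (eps + delta), delta = max I(D;F|C) over small D, F."""
        delta = 0.0
        for size_d in range(1, k + 1):
            for size_f in range(1, k + 2 - size_d):
                for d in combinations(a_idx, size_d):
                    for f in combinations(b_idx, size_f):
                        delta = max(delta,
                                    empirical_cmi(samples, list(d), list(f), list(c_idx)))
        return (len(a_idx) + len(b_idx) + len(c_idx)) * (eps + delta)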

23

Guarantees on learned model quality
Theorem: Suppose a strongly connected ε-JT of treewidth k for P(X) exists. Then, with probability at least (1 - γ), our algorithm will find a JT such that
KL(P || P_JT) ≤ |X| (k+1)(ε + 2δ)        [quality guarantee]
using O(log(|X|/γ) / δ²) samples [poly samples] and O(|X|^(2k+3) log(1/γ)) time [poly time].
Corollary: strongly connected junction trees are PAC-learnable.

24

Related work

Reference                   Model                           Guarantees      Time
[Bach+Jordan:2002]          tractable                       local           poly(n)
[Chow+Liu:1968]             tree                            global          O(n² log n)
[Meila+Jordan:2001]         tree mix                        local           O(n² log n)
[Teyssier+Koller:2005]      compact                         local           poly(n)
[Singh+Moore:2005]          all                             global          exp(n)
[Karger+Srebro:2001]        tractable                       const-factor    poly(n)
[Abbeel+al:2006]            compact                         PAC             poly(n)
[Narasimhan+Bilmes:2004]    tractable                       PAC             exp(n)
our work                    tractable                       PAC             poly(n)
[Gogate+al:2010]            tractable with high treewidth   PAC             poly(n)

25

Results – typical convergence time
Good results early on in practice
[Plot: test log-likelihood over time (higher is better)]

26

Results – log-likelihood
[Bar chart: test log-likelihood (higher is better), our method vs. baselines]
OBS: local search in limited in-degree Bayes nets
Chow-Liu: most likely JTs of treewidth 1
Karger-Srebro: constant-factor approximation JTs

27

Conclusions
• A tractable upper bound on conditional mutual information
• Graceful quality degradation and PAC learnability guarantees
• Analysis of when dynamic programming works [in the thesis]
• Dealing with an unknown mutual information threshold [in the thesis]
• Speedups preserving the guarantees; further speedups without guarantees

28

Thesis contributions
Learn accurate and tractable models:

In the generative setting P(Q,E) [NIPS 2007]

In the discriminative setting P(Q|E) [NIPS 2010]

Speed up belief propagation for cases with many nuisance variables [AISTATS 2010]

29

Discriminative learning
P(Q | E) = P(Q, E) / P(E)        query goal and learning goal: P(Q | E) directly
Useful when the evidence variables E are always the same (non-adaptive, one-shot observation):
• image pixels → scene description
• document text → topic, named entities
Better accuracy than generative models

30

Discriminative log-linear models
P(Q | E, w) = (1/Z(E, w)) exp( Σ_i w_i f_i(Q, E) )
f_i: features (domain knowledge); w_i: weights (learned from data); Z(E, w): evidence-dependent normalization
• Don't sum over all values of E
• Don't model P(E)
• No need for structure over E
[Figure: query variables linked by features f12, f34; the evidence feeds into every feature]

31

Model tractability still important

Observation #1: tractable models are necessary for exact inference and parameter learning in the discriminative setting

Tractability is determined by the structure over the query variables

32

Simple local models: motivation

[Figure: the query as a function of the evidence, Q = f(E); the relationship is locally almost linear]

Exploiting evidence values overcomes the expressive power deficit of simple models

We will learn local tractable models

33

Context-specific independence

Observation #2: use evidence values at test time to tune the structure of the models, do not commit to a single tractable model

[Figure: depending on the evidence values, an edge between query variables may be present or absent (no edge)]

34

Low-dimensional dependencies in generative structure learning
Generative structure learning often relies only on low-dimensional marginals:
• Junction trees: decomposable scores, LLH(JT) = Σ_separators H(S) - Σ_cliques H(C) (sketched below)
• Low-dimensional independence tests: I(A, B | S)
• Small changes to the structure allow quick score recomputation
Discriminative structure learning: need inference in the full model for every datapoint, even for small changes in the structure
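As an illustration of why the generative score is cheap to maintain, here is a minimal sketch (helper names hypothetical) of the decomposable junction-tree score from this slide; `entropy` stands for any empirical entropy estimator over a set of variables.

    def jt_log_likelihood(cliques, separators, entropy):
        """Decomposable JT score: sum of H(S) over separators minus sum of H(C) over cliques.

        `entropy(variables)` returns an (empirical) entropy estimate for a set of variables;
        separators are listed once per junction-tree edge.  Changing one clique only touches
        the terms that mention it, which is why small structure changes allow quick score
        recomputation in the generative case."""
        return sum(entropy(s) for s in separators) - sum(entropy(c) for c in cliques)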

35

Leverage generative learning

Observation #3: generative structure learning algorithms have very useful properties, can we leverage them?

36

Observations so far
• The discriminative setting has extra information, including evidence values at test time; we want to use it to learn local tractable models
• Good structure learning algorithms exist for the generative setting that only require low-dimensional marginals P(Qβ)
Approach: 1. use local conditionals P(Qβ | E=E) as "fake marginals" to learn local tractable structures; 2. learn exact discriminative feature weights

37

Evidence-specific CRF overview
Approach: 1. use local conditionals P(Qβ | E=E) as "fake marginals" to learn local tractable structures; 2. learn exact discriminative feature weights
Pipeline: local conditional density estimators P(Qβ | E) + evidence value E=E → P(Qβ | E=E) → generative structure learning algorithm → tractable structure for E=E → (together with the feature weights w) → tractable evidence-specific CRF

Evidence-specific CRF formalism
P(Q | E, w, u) = (1/Z(E, w, u)) exp( Σ_i w_i f_i(Q, E) I_i(E, u) )
Observation: an identically zero feature (≡ 0) does not affect the model
Evidence-specific structure: I_i(E, u) ∈ {0, 1}, with extra "structural" parameters u
Fixed dense model × evidence-specific tree "mask" = evidence-specific model
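A minimal sketch of the masking idea with hypothetical names: the unnormalized log-score is the dense feature sum with each term multiplied by its structural indicator, and a small helper builds the mask from the set of edges selected for the current evidence.

    def ess_crf_log_score(q, e, w, features, mask):
        """Unnormalized log-score sum_i w_i * f_i(Q, E) * I_i(E, u) of the ES-CRF.

        mask[i] is the structural indicator I_i(E, u) in {0, 1}: features with a zero
        indicator behave as identically-zero features and drop out, leaving only the
        evidence-specific (e.g. tree-structured) part of the fixed dense model."""
        return sum(wi * fi(q, e) * mi for wi, fi, mi in zip(w, features, mask))

    def edge_mask(selected_edges, feature_edges):
        """Mask value 1 for features whose query scope (an edge) was selected for this evidence."""
        selected = {frozenset(edge) for edge in selected_edges}
        return [1 if frozenset(edge) in selected else 0 for edge in feature_edges]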

38

Evidence-specific feature values: for each evidence value E=E1, E=E2, E=E3, … the mask selects a different tree over the same dense set of features

39

Evidence-specific CRF learning
Learning proceeds in the same order as testing: local conditional density estimators P(Qβ | E) + evidence value E=E → P(Qβ | E=E) → generative structure learning algorithm → tractable structure for E=E → feature weights w → tractable evidence-specific CRF

40

Plug in generative structure learning

),(),(exp),,(

1),,|( uEIEQw

uwEZuwEQP

encodes the output of the chosen structure learning algorithm

Generative Discriminative

P(Qi,Qj)

(pairwise marginals)+

Chow-Liu algorithm=

optimal tree

P(Qi,Qj|E=E)

(pairwise conditionals)+

Chow-Liu algorithm=

good tree for E=E

Directly generalize generative algorithms :
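A minimal sketch of the tree-construction step shared by both rows above, with hypothetical names: given an n × n matrix of (conditional) mutual information scores, Prim's algorithm returns a maximum spanning tree. In the generative case the scores would be I(Qi; Qj) from pairwise marginals; in the evidence-specific case they would be I(Qi; Qj | E=E) computed from the learned pairwise conditionals at the observed evidence, with the same spanning-tree step rerun for every test point.

    import heapq

    def chow_liu_tree(mi):
        """Maximum spanning tree (Prim's algorithm) over an n x n symmetric matrix
        of mutual information scores; returns a list of (i, j) tree edges."""
        n = len(mi)
        in_tree = {0}
        edges = []
        frontier = [(-mi[0][j], 0, j) for j in range(1, n)]
        heapq.heapify(frontier)
        while frontier and len(in_tree) < n:
            neg_w, i, j = heapq.heappop(frontier)
            if j in in_tree:
                continue
            in_tree.add(j)
            edges.append((i, j))
            for k in range(n):
                if k not in in_tree:
                    heapq.heappush(frontier, (-mi[j][k], j, k))
        return edges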

41

Evidence-specific CRF learning: structure
• Choose a generative structure learning algorithm A
• Identify the low-dimensional subsets Qβ that A may need (Chow-Liu: all pairs (Qi, Qj))
• Replace the original problem over (Q, E) with low-dimensional pairwise problems: estimate P̂(Q1, Q2 | E, u), P̂(Q1, Q3 | E, u), P̂(Q3, Q4 | E, u), …

42

Estimating low-dimensional conditionals
Use the same features as the baseline high-treewidth model:
Baseline CRF: P(Q | E, w) = (1/Z(E, w)) exp( Σ_i w_i f_i(Q, E) )
Low-dimensional model: P(Qβ | E, u) = (1/Zβ(E, u)) exp( Σ_{i s.t. Qi ⊆ Qβ} u_i f_i(Qi, E) )        [scope restriction]
End result: optimal u
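A minimal sketch of the scope restriction, with hypothetical names: only features whose query scope lies inside Qβ participate, so the normalizer is a sum over the (few) assignments of Qβ rather than of the full Q. Fitting the parameters u themselves (e.g. by maximizing the conditional likelihood of these small models) is not shown.

    from itertools import product
    import math

    def low_dim_conditional(beta, value_sets, e, u, features, scopes):
        """P(Q_beta = q | E = e, u) for every joint assignment q of the small set beta.

        features[i](assignment, e) -> float; scopes[i] is the set of query variables
        that feature i touches.  Only features whose scope lies inside beta take part
        (the scope restriction), so the normalizer sums over assignments of Q_beta only."""
        active = [i for i, s in enumerate(scopes) if set(s) <= set(beta)]
        scores = {}
        for q in product(*(value_sets[v] for v in beta)):
            assignment = dict(zip(beta, q))
            scores[q] = math.exp(sum(u[i] * features[i](assignment, e) for i in active))
        z = sum(scores.values())
        return {q: s / z for q, s in scores.items()}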

43

Evidence-specific CRF learning: weights
P(Q | E, w, u) = (1/Z(E, w, u)) exp( Σ_i w_i f_i(Q, E) I_i(E, u) )
• Already chose the algorithm behind I(E, u)
• Already learned the structural parameters u
• Only need to learn the feature weights w
log P(Q | E, w, u) is concave in w ⟹ unique global optimum
The products f_i(Q, E) I_i(E, u) act as "effective features"

44

Evidence-specific CRF learning: weights
∂ log P(Q | E, w, u) / ∂w_i = I_i(E, u) ( f_i(Q, E) - E_{P(Q′ | E, w, u)}[ f_i(Q′, E) ] )
P(Q′ | E, w, u) is a tree-structured distribution: the fixed dense model × the evidence-specific tree "mask"
For each datapoint (Q=Q1, E=E1), (Q=Q2, E=E2), (Q=Q3, E=E3), … compute exact tree-structured gradients w.r.t. w, then sum them into the overall (dense) gradient
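A minimal sketch of one per-datapoint gradient term, with hypothetical names; the expected features are assumed to come from exact sum-product on the evidence-specific tree (that inference step is not shown). The overall gradient is the sum of these per-datapoint vectors, and since the objective is concave in w, any standard gradient-based optimizer reaches the global optimum.

    def ess_crf_weight_gradient(mask, observed_features, expected_features):
        """Per-datapoint gradient of log P(Q | E, w, u) with respect to w:
        grad_i = I_i(E, u) * ( f_i(Q, E) - E_{P(Q'|E,w,u)}[ f_i(Q', E) ] )."""
        return [m * (fo - fe)
                for m, fo, fe in zip(mask, observed_features, expected_features)]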

45

Results – WebKB: text + links → webpage topic
[Bar charts: prediction error and training time (lower is better) for SVM (ignores links), RMN (standard dense CRF), ESS-CRF (our work), and M3N (max-margin model)]

46

Image segmentation – accuracy: local segment features + neighboring segments → type of object
[Bar chart: accuracy (higher is better) for logistic regression (ignores links), dense CRF (standard dense CRF), and ESS-CRF (our work)]

47

Image segmentation – time
[Bar charts, log scale: train time and test time (lower is better) for logistic regression (ignores links), dense CRF (standard dense CRF), and ESS-CRF (our work)]

48

Conclusions
• Using evidence values to tune low-treewidth model structure compensates for the reduced expressive power
• Order of magnitude speedup at test time (sometimes at train time too)
• A general framework for plugging in existing generative structure learners
• Straightforward relational extension [in the thesis]

49

Thesis contributions
Learn accurate and tractable models:

In the generative setting P(Q,E) [NIPS 2007]

In the discriminative setting P(Q|E) [NIPS 2010]

Speed up belief propagation for cases with many nuisance variables [AISTATS 2010]

50

Why high-treewidth models?
• A dense model expressing laws of nature: protein folding
• Max-margin parameters don't work well (yet?) with evidence-specific structures

51

Query-specific inference problem
P(X) ∝ Π_{(i,j) ∈ E} f_ij(X_i, X_j)
[Figure: a large model with a small set of query variables; the evidence and the remaining nuisance variables are not interesting in themselves]
Goal: use information about the query to speed up convergence of belief propagation for the query marginals

52

(Loopy) belief propagation: passing messages along edges
Update rule: m_{i→j}^(t+1)(x_j) = Σ_{x_i} f_ij(x_i, x_j) Π_{k ∈ Γ(i) \ j} m_{k→i}^(t)(x_i)
Variable belief: P^(t)(x_i) ∝ Π_{j ∈ Γ(i)} m_{j→i}^(t)(x_i)
Result: all single-variable beliefs
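A minimal sketch of one message update for a pairwise model with tabular factors, under assumed dictionary-based storage (names hypothetical):

    def bp_update(i, j, factors, messages, neighbors, n_states):
        """One message update m_{i->j}(x_j) = sum_{x_i} f_ij(x_i, x_j) * prod over
        k in neighbors(i), k != j, of m_{k->i}(x_i).

        factors[(i, j)][xi][xj] is the pairwise factor table (stored for both edge
        orderings), messages[(k, i)] is a list over the values of X_i, and
        neighbors[i] is an adjacency list of integer node ids."""
        new_msg = []
        for xj in range(n_states):
            total = 0.0
            for xi in range(n_states):
                term = factors[(i, j)][xi][xj]
                for k in neighbors[i]:
                    if k != j:
                        term *= messages[(k, i)][xi]
                total += term
            new_msg.append(total)
        z = sum(new_msg) or 1.0
        return [v / z for v in new_msg]  # normalize to keep the values well scaled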

53

(Loopy) belief propagation
Message dependencies are local: m_{i→j} depends only on the messages coming into i
Freedom in scheduling updates. Round-robin schedule: fix a message order and apply updates in that order until convergence

54

Dynamic update prioritization
A fixed update sequence is not the best option; dynamic update scheduling can speed up convergence:
• Tree-reweighted BP [Wainwright et al., AISTATS 2003]
• Residual BP [Elidan et al., UAI 2006]
Residual BP: apply the largest change first. Propagating a large change is an informative update; propagating a small change is mostly wasted computation.

55

Residual BP [Elidan et al., UAI 2006]
Update rule: m_{i→j}^(NEW)(x_j) = Σ_{x_i} f_ij(x_i, x_j) Π_{k ∈ Γ(i) \ j} m_{k→i}^(OLD)(x_i)
1. Pick the edge with the largest residual: max_{ij} || m_{i→j}^(NEW) - m_{i→j}^(OLD) ||
2. Update: m_{i→j}^(OLD) ← m_{i→j}^(NEW)
More effort on the difficult parts of the model. But no notion of a query.
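A minimal sketch of the residual-BP scheduling loop, reusing the `bp_update` sketch above (names hypothetical); stale priority-queue entries are tolerated by recomputing the residual when an edge is popped:

    import heapq

    def residual_bp(directed_edges, factors, messages, neighbors, n_states,
                    max_updates=100000, tol=1e-6):
        """Residual BP: repeatedly commit the message update with the largest residual."""
        def recompute(i, j):
            new = bp_update(i, j, factors, messages, neighbors, n_states)
            res = max(abs(a - b) for a, b in zip(new, messages[(i, j)]))
            return res, new

        heap = []
        for (i, j) in directed_edges:
            res, _ = recompute(i, j)
            heapq.heappush(heap, (-res, (i, j)))
        for _ in range(max_updates):
            if not heap:
                break
            _, (i, j) = heapq.heappop(heap)
            res, new = recompute(i, j)      # entries can be stale, so recompute on pop
            if res < tol:
                continue
            messages[(i, j)] = new          # apply the largest change first
            for k in neighbors[j]:          # messages out of j now have a stale input
                if k != i:
                    res_jk, _ = recompute(j, k)
                    heapq.heappush(heap, (-res_jk, (j, k)))
        return messages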

56

Why edge importance weights?
Residual BP may spend its updates on messages with a large residual but no influence on the query: wasted computation. We would rather update a message with a smaller residual that will influence the query in the future.
Residual BP: maximize the immediate residual reduction.
Our work: maximize the approximate eventual effect on P(query).

57

Query-specific BP
Update rule: m_{i→j}^(NEW)(x_j) = Σ_{x_i} f_ij(x_i, x_j) Π_{k ∈ Γ(i) \ j} m_{k→i}^(OLD)(x_i)
1. Pick the edge with the largest weighted residual: max_{ij} A_{ij} || m_{i→j}^(NEW) - m_{i→j}^(OLD) ||, where A_{ij} is the edge importance (the only change!)
2. Update: m_{i→j}^(OLD) ← m_{i→j}^(NEW)
Rest of the talk: defining and computing edge importance

Edge importance: base case
Goal: approximate the eventual effect of an update on P(Q).
Base case, an edge (j→i) directly connected to the query: the change in the query belief is bounded by the change in the message, || P^(NEW)(Q) - P^(OLD)(Q) || ≤ || m_{j→i}^(NEW) - m_{j→i}^(OLD) ||, and the bound is tight ⟹ A_{ji} = 1.
One step away from the query, an edge (r→j): A_{rj} = ?? We need sup over the values of all other messages of || ∂m_{j→i} / ∂m_{r→j} ||.

Edge importance: one step away
A_{rj} = sup || ∂m_{j→i} / ∂m_{r→j} ||: how much a change in the message m_{r→j} can eventually change m_{j→i}, and hence the query belief.
This message importance can be computed in closed form, looking only at f_ji [Mooij, Kappen; 2007].

Edge importance: general case
Generalizing directly via sup || ∂P(Q) / ∂m_{s→h} || is expensive to compute, and the bound may be infinite.
Instead, bound the effect on P(Q) along a path to the query: the change in the query belief is at most sensitivity(path) × || change in message ||, where the sensitivity of a path is the product of the per-edge terms, e.g.
sensitivity(s→h, h→r, r→j, j→i) = sup || ∂m_{j→i} / ∂m_{r→j} || × sup || ∂m_{r→j} / ∂m_{h→r} || × sup || ∂m_{h→r} / ∂m_{s→h} ||

Edge importance: general case
A_{sh} = max over all paths π from the edge (s→h) to the query of sensitivity(π)
There are a lot of paths in a graph; trying out every one is intractable.

62

Efficient edge importance computation
A = max over all paths π from the edge to the query of sensitivity(π)
sensitivity(π) decomposes into individual edge contributions, each always ≤ 1, so the sensitivity only decreases as the path grows.
⟹ Dijkstra's (shortest paths) algorithm efficiently finds the max-sensitivity path, and hence the importance A, for every edge.
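A minimal sketch of that computation with hypothetical names: a Dijkstra-style search outward from the query, where extending a path multiplies in one more per-edge sensitivity term (assumed precomputed, e.g. via the closed-form bound of Mooij and Kappen). The greedy expansion is valid because every term is at most 1, so importances can only shrink along a path. Query-specific BP then keys the residual-BP priority queue on A_ij × residual instead of the raw residual.

    import heapq

    def edge_importance(query_edges, directed_edges, sensitivity):
        """Importance A for every directed edge via a Dijkstra-style search from the query.

        query_edges: directed edges (j, i) that point straight into a query variable
        (base case A = 1).  sensitivity[(r, j, i)] is the precomputed bound
        sup || d m_{j->i} / d m_{r->j} ||, assumed to lie in [0, 1]."""
        incoming = {}
        for (r, j) in directed_edges:
            incoming.setdefault(j, []).append(r)
        A = {e: 0.0 for e in directed_edges}
        heap = []
        for e in query_edges:
            A[e] = 1.0
            heapq.heappush(heap, (-1.0, e))
        while heap:
            neg_w, (j, i) = heapq.heappop(heap)
            w = -neg_w
            if w < A[(j, i)]:
                continue                        # stale queue entry
            for r in incoming.get(j, []):
                if r == i:
                    continue
                cand = w * sensitivity[(r, j, i)]   # extend the best path by edge r -> j
                if cand > A[(r, j)]:
                    A[(r, j)] = cand
                    heapq.heappush(heap, (-cand, (r, j)))
        return A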

63

A_{ji} = max over all paths from (j→i) to the query of sensitivity(π)
Query-specific BP
1. Run Dijkstra's algorithm starting at the query to get the edge importance weights
2. Pick the edge with the largest weighted residual: max_{ij} A_{ij} || m_{i→j}^(NEW) - m_{i→j}^(OLD) ||
3. Update: m_{i→j}^(OLD) ← m_{i→j}^(NEW)
More effort on the parts of the model that are both difficult and relevant to the query; takes into account not only the graphical structure but also the strength of dependencies

64

Experiments – single query

[Convergence plots for an easy model (sparse connectivity, weak interactions) and a hard model (dense connectivity, strong interactions): standard residual BP vs. our work]

Faster convergence, but long initialization still a problem

65

Anytime query-specific BP
Query-specific BP: run Dijkstra's algorithm to completion, then start BP updates.
Anytime QSBP: interleave Dijkstra's expansions with BP updates; the same BP update sequence!

66

Experiments – anytime QSBP
[Convergence plots for an easy model (sparse connectivity, weak interactions) and a hard model (dense connectivity, strong interactions): standard residual BP vs. our work vs. our work + anytime]
Much shorter initialization

67

Experiments – multi-query
[Convergence plots for an easy model (sparse connectivity, weak interactions) and a hard model (dense connectivity, strong interactions): standard residual BP vs. our work vs. our work + anytime]

68

Conclusions
• Weighting edges is a simple and effective way to improve prioritization
• We introduce a principled notion of edge importance based on both the structure and the parameters of the model
• Robust speedups in the query-specific setting: don't spend computation on nuisance variables unless it is needed for the query marginal
• Deferring BP initialization has a large impact

69

Thesis contributions
Learn accurate and tractable models:

In the generative setting P(Q,E) [NIPS 2007]

In the discriminative setting P(Q|E) [NIPS 2010]

Speed up belief propagation for cases with many nuisance variables [AISTATS 2010]

70

Future work
• More practical JT learning: SAT solvers to construct structure, pruning heuristics, …
• Evidence-specific learning: trade efficiency for accuracy; max-margin evidence-specific models; theory on evidence-specific structures
• Inference: beyond query-specific, better prioritization in general; beyond BP, query-specific Gibbs sampling?

71

Thesis conclusions
• Graphical models are a regularization technique for high-dimensional distributions
• Representation-based structure is well understood (conditional independencies)
• Right now, structured computation is a "consequence" of representation, with major issues in tractability and approximation quality
• Logical next step: structured computation as a primary basis of regularization
• This thesis: computation-centric approaches have better efficiency and do not sacrifice accuracy

72

Thank you!

Collaborators: Carlos Guestrin, Joseph Bradley, Dafna Shahaf

73

Mutual info upper bound: quality
Upper bound: suppose an ε-JT exists and δ is the largest mutual information over small subsets; then I(A, B | C) ≤ |A ∪ B ∪ C| (ε + δ)
• No need to know the ε-JT, only that it exists
• No connection between C and the JT separators
• C can be of any size, no connection to the JT treewidth
• The bound is loose only when there is no hope of learning a good JT

74

Typical graphical models workflow
Learn/construct structure → learn/define parameters → inference → P(Q|E=e)
A reasonable but intractable structure is built from domain knowledge; approximate inference algorithms with no quality guarantees then yield approx. P(Q|E=e)
The graph is primarily a representation tool

75

Contributions – tractable models
Learn accurate and tractable models:
• In the generative setting [NIPS 2007]: a polynomial-time conditional mutual information upper bound; the first PAC-learning result for strongly connected junction trees; graceful degradation guarantees; speedup heuristics
• In the discriminative setting [NIPS 2010]: a general framework for learning CRF structure that depends on evidence values at test time; extensions to the relational setting; empirically, order of magnitude speedups with the same accuracy as high-treewidth models

76

Contributions – faster inference
Speed up belief propagation for cases with many nuisance variables [AISTATS 2010]:
• A framework of importance-weighted residual belief propagation
• A principled measure of the eventual impact of an edge update on the query belief: prioritize updates by importance for the query instead of absolute magnitude
• An anytime modification that defers much of the initialization: initial inference results available much sooner, often much faster eventual convergence, and the same fixed points as the full model

77

Future work
Two main bottlenecks:
• Constructing JTs given mutual information values, especially with non-uniform treewidth and dependence strength. Large sample: learnability guarantees for non-uniform treewidth. Small sample: non-uniform treewidth for regularization. Constraint satisfaction, SAT solvers, etc.? Relax the strong connectivity requirement?
• Evaluating mutual information: need to look at 2k+1 variables instead of k+1, a large penalty. Branch on features instead of sets of variables? [Gogate+al:2010]
Also: speedups without guarantees (local search, greedy separator construction, …)

78

Log-linear parameter learning
Conditional log-likelihood: LLH(w | D) = Σ_{(Q,E) ∈ D} log P(Q | E, w)
Convex optimization: unique global maximum
Gradient = features - [expected features]:
∂ log P(Q | E, w) / ∂w_i = f_i(Q, E) - E_{P(Q′ | E, w)}[ f_i(Q′, E) ]
⟹ need inference for every E given the current w

79

Log-linear parameter learning

                Generative (E = ∅)                  Discriminative
Tractable       closed form                         exact, gradient-based
Intractable     approximate, gradient-based         approximate, gradient-based
                (no guarantees)                     (no guarantees)

Generative: inference once per weights update. Discriminative: inference for every datapoint (Q, E) once per weights update, a "manageable" slowdown by the number of datapoints.
The complexity "phase transition" lies between the tractable and intractable rows.

80

Plug in generative structure learning
P(Q | E, w, u) = (1/Z(E, w, u)) exp( Σ_i w_i f_i(Q, E) I_i(E, u) ),   where I(E, u) encodes the output of the chosen structure learning algorithm:
• Chow-Liu for optimal trees
• our thin junction tree learning from part 1
• Karger-Srebro for high-quality low-diameter junction trees
• local search, etc.
Fix the algorithm ⟹ always get structures with the desired properties (e.g. treewidth): replace the marginals P(Qβ) with approximate conditionals P(Qβ | E=E, u) everywhere

81

Evidence-specific CRF learning: weights
P(Q | E, w, u) = (1/Z(E, w, u)) exp( Σ_i w_i f_i(Q, E) I_i(E, u) )
• Already know the algorithm behind I(E, u); already learned u; only need to learn w
• The structure induced by I(E, u) is always tractable
• Can find the evidence-specific structure I(E=E, u) for every training datapoint (Q, E) ⟹ learn the optimal w exactly
∂ log P(Q | E, w, u) / ∂w_i = I_i(E, u) ( f_i(Q, E) - E_{P(Q′ | E, w, u)}[ f_i(Q′, E) ] ),   where P(Q′ | E, w, u) is a tree-structured distribution

82

Relational evidence-specific CRF
Relational models: templated features + shared weights
Relation: LinksTo(webpage, webpage); groundings: every pair of linked webpages
Learn a single weight wLINK and copy it to every grounding

83

Relational evidence-specific CRF
Relational models: templated features + shared weights
Every grounding is a separate datapoint for structure training ⟹ use the propositional approach + shared weights
[Figure: a grounded model over x1 … x5; the training datasets for the "structural" parameters u are all grounded pairs (x1,x2), (x1,x3), …, (x4,x5)]

84

Future work
• Faster learning: pseudolikelihood is really fast, need to compete
• Larger treewidth: trade time for accuracy
• Theory on learning the "structural" parameters u
• Max-margin learning: inference is the basic step of max-margin learning too, so tractable models are useful beyond log-likelihood; optimizing the feature weights w given local trees is straightforward, but optimizing the "structural" parameters u for max-margin is hard. What is the right objective?
• Almost-tractable structures and other tractable model classes: make sure loops don't hurt too much

85

Query versus nuisance variables
We may actually care about only a few variables:
• What are the topics of the webpages on the first page of Google search results for my query?
• Smart heating control: is anybody going to be at home for the next hour?
• Does the patient need immediate doctor attention?
But the model may need many other variables to be accurate enough. We don't care about them per se, yet we must look at them to get the query right.
Both query and nuisance variables are unknown at inference time, and standard inference algorithms don't see a difference.
Speed up inference by focusing on the query: only look at nuisance variables to the extent needed to answer the query.

86

Our contributions

Using weighted residuals to prioritize updates

Define message weights reflecting the importance of the message to the query

Computing importance weights efficiently

Experiments: faster convergence on large relational models

87

Interleaving
Dijkstra's algorithm expands the highest-weight edges first ⟹ every not-yet-expanded edge has importance A at most the minimum A over the already-expanded edges.
Suppose M ≥ max over all edges of || m_ij^(NEW) - m_ij^(OLD) || (an upper bound on any residual).
Then M × (min expanded A) is an upper bound on the priority of every not-yet-expanded edge, while max over expanded edges of || m_ij^(NEW) - m_ij^(OLD) || A_ij is the actual priority of the best expanded edge.
If the actual priority is at least the upper bound, there is no need to expand further at this point.

88

Deferring BP initialization

Observation: Dijkstra’s alg. expands the most important edges first

Do we really need to look at every low-importance edge before applying BP updates?

No! Can use upper bounds on priority instead.

89

Upper bounds in priority queue

Observation: for edges low in the priority queue, an upper bound on the priority is enough

[Figure: the updates priority queue; the exact priority is needed for the top element, a priority upper bound is enough lower down]

90

Priority upper bound for not-yet-seen edges
priority(edge) = residual(edge) × importance weight(edge)
residual(edge) ≤ || factor(edge) ||
importance weight(edge) ≤ importance weight of any edge that is already expanded
⟹ a component-wise upper bound on the priority, without ever looking at the edge
Expand several edges with Dijkstra's: for those, (residual) × (weight) gives the exact priority; for all the other edges, use the upper bound.

91

Interleaving BP and Dijkstra's
Alternate: Dijkstra → BP → Dijkstra → BP → …
• If the top exact priority > the upper bound for unexpanded edges: do a BP update
• If the top exact priority < the upper bound: let Dijkstra's expand another edge
[Figure: the expanded region grows outward from the query toward the full model]
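A minimal sketch of the interleaving test with hypothetical names; `residual_bound` stands for the factor-derived bound on the residual of an edge that has not been looked at yet:

    def next_action(top_expanded_priority, residual_bound, min_expanded_importance):
        """Decide whether to run a BP update or let Dijkstra's expand another edge.

        Any edge not yet expanded has importance <= min_expanded_importance and
        residual <= residual_bound, so its priority is bounded without ever
        looking at it."""
        unseen_priority_bound = residual_bound * min_expanded_importance
        return "bp_update" if top_expanded_priority >= unseen_priority_bound else "expand_edge"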
