
INTRODUCTION TO GRAPHICAL MODELS
SLIDE CREDITS: KEVIN MURPHY, MARK PASHKIN, ZOUBIN GHAHRAMANI AND JEFF BILMES

CS188: Computational Models of Human Behavior

Reasoning under uncertainty

• In many settings, we need to understand what is going on in a system when we have imperfect or incomplete information

• For example, we might deploy a burglar alarm to detect intruders
– But the sensor could be triggered by other events, e.g., an earthquake

• Probabilities quantify the uncertainties regarding the occurrence of events

Probability spaces

• A probability space represents our uncertainty regarding an experiment

• It has two parts:
– a sample space Ω, which is the set of outcomes
– the probability measure P, which is a real-valued function of the subsets of Ω

• A set of outcomes A is called an event. P(A) represents how likely it is that the experiment’s actual outcome will be a member of A

An example

• If our experiment is to deploy a burglar alarm and see if it works, then there could be four outcomes:

Ω = {(alarm, intruder), (no alarm, intruder), (alarm, no intruder), (no alarm, no intruder)}

• Our choice of P has to obey these simple rules …

The three axioms of probability theory

• P(A) ≥ 0 for all events A
• P(Ω) = 1
• P(A ∪ B) = P(A) + P(B) for disjoint events A and B

Some consequences of the axioms
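
Some standard consequences that follow directly from the three axioms (a brief sketch; the original slide may list others):

P(∅) = 0
P(Aᶜ) = 1 − P(A)
A ⊆ B implies P(A) ≤ P(B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)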

Example

• Let’s assign a probability to each outcome ω

• These probabilities must be non-negative and sum to one

             intruder    no intruder
alarm        0.002       0.003
no alarm     0.001       0.994

Conditional Probability
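
• The standard definition, used in the example below (a brief sketch): for events A and B with P(B) > 0,

P(A | B) = P(A ∩ B) / P(B)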

Marginal probability

• Marginal probability is then the unconditional probability P(A) of the event A; that is, the probability of A, regardless of whether event B did or did not occur.

• For example, if events B and B′ are the only two possibilities (B′ is the complement of B), this means that
– P(A) = P(A ∩ B) + P(A ∩ B′)

• This is called marginalization

Example

• If P is defined by

             intruder    no intruder
alarm        0.002       0.003
no alarm     0.001       0.994

then

P({(intruder, alarm)} | {(intruder, alarm), (no intruder, alarm)})
  = P({(intruder, alarm)} ∩ {(intruder, alarm), (no intruder, alarm)}) / P({(intruder, alarm), (no intruder, alarm)})
  = P({(intruder, alarm)}) / P({(intruder, alarm), (no intruder, alarm)})
  = 0.002 / (0.002 + 0.003)
  = 0.4
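
As a small illustration of marginalization and conditioning on this joint table, here is a minimal Python sketch (the variable names are mine, not from the slides):

```python
# Joint probability table over (intruder, alarm), values taken from the slide above.
joint = {
    (True,  True):  0.002,   # intruder, alarm
    (True,  False): 0.001,   # intruder, no alarm
    (False, True):  0.003,   # no intruder, alarm
    (False, False): 0.994,   # no intruder, no alarm
}

# Marginal probability that the alarm goes off: sum out the intruder variable.
p_alarm = sum(p for (intruder, alarm), p in joint.items() if alarm)

# Conditional probability of an intruder given the alarm went off:
# P(intruder | alarm) = P(intruder, alarm) / P(alarm)
p_intruder_given_alarm = joint[(True, True)] / p_alarm

print(p_alarm)                 # ≈ 0.005
print(p_intruder_given_alarm)  # ≈ 0.4, matching the computation above
```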

The product rule

• The probability that A and B both happen is the probability that A happens times the probability that B happens given that A has occurred:

P(A ∩ B) = P(A) P(B | A)

The chain rule

• Applying the product rule repeatedly:

P(A1,A2,…,Ak) = P(A1) P(A2|A1)P(A3|A2,A1)…P(Ak|Ak-1,…,A1)

• where P(A3|A2,A1) is shorthand for P(A3 | A2 ∩ A1)

Bayes’ rule

• Use the product rule both ways with P(A ∩ B):
– P(A ∩ B) = P(A) P(B | A)
– P(A ∩ B) = P(B) P(A | B)
• Equating the two gives P(A | B) = P(A) P(B | A) / P(B)
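
A worked application using the joint table from the earlier alarm example (all numbers come from that table):

P(intruder) = 0.002 + 0.001 = 0.003
P(alarm) = 0.002 + 0.003 = 0.005
P(alarm | intruder) = 0.002 / 0.003
P(intruder | alarm) = P(alarm | intruder) P(intruder) / P(alarm) = (0.002 / 0.003) × 0.003 / 0.005 = 0.4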

Random variables and densities

Inference

• One of the central problems of computational probability theory

• Many problems can be formulated in these terms. Examples:
– The probability that there is an intruder given that the alarm went off is p_{I|A}(true, true)

• Inference requires manipulating densities

Probabilistic graphical models

• Combination of graph theory and probability theory
– The graph structure specifies which parts of the system are directly dependent
– Local functions at each node specify how the different parts interact

• Bayesian Networks = Probabilistic Graphical Models based on a directed acyclic graph

• Markov Networks = Probabilistic Graphical Models based on an undirected graph

Some broad questions

Bayesian Networks

• Nodes are random variables
• Edges represent dependence (no directed cycles allowed)

• P(X_{1:N}) = P(X1) P(X2|X1) P(X3|X1,X2) … = ∏_{i=1..N} P(Xi | X_{1:i−1}) = ∏_{i=1..N} P(Xi | X_{πi}), where πi denotes the parents of node i in the DAG

[Figure: example DAG over nodes x1–x7]

Example

• Water sprinkler Bayes net

P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)    (chain rule)
           = P(C) P(S|C) P(R|C) P(W|C,S,R)       (since R ⊥ S | C)
           = P(C) P(S|C) P(R|C) P(W|S,R)         (since W ⊥ C | S,R)
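
A minimal Python sketch of this factorization (the CPT values below are illustrative placeholders, not the slides' numbers):

```python
# Water sprinkler Bayes net: Cloudy -> Sprinkler, Cloudy -> Rain, {Sprinkler, Rain} -> WetGrass.
# The CPT numbers below are illustrative placeholders, not taken from the slides.
P_C = {True: 0.5, False: 0.5}                          # P(C = True) listed for completeness
P_S_given_C = {True: 0.1, False: 0.5}                  # P(S = True | C)
P_R_given_C = {True: 0.8, False: 0.2}                  # P(R = True | C)
P_W_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.9, (False, False): 0.0}   # P(W = True | S, R)

def bernoulli(p_true, value):
    """Return P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (bernoulli(P_C[True], c)
            * bernoulli(P_S_given_C[c], s)
            * bernoulli(P_R_given_C[c], r)
            * bernoulli(P_W_given_SR[(s, r)], w))

# Marginal P(W = True) by summing the joint over all the other variables.
vals = [True, False]
p_wet = sum(joint(c, s, r, True) for c in vals for s in vals for r in vals)
print(p_wet)
```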

Inference

Naïve inference

Problem with naïve representation of the joint probability

• Problems with working with the joint probability:
– Representation: a big table of numbers is hard to understand
– Inference: computing a marginal P(Xi) takes O(2^N) time
– Learning: there are O(2^N) parameters to estimate

• Graphical models solve the above problems by providing a structured representation for the joint

• Graphs encode conditional independence properties and represent families of probability distributions that satisfy these properties

Bayesian networks provide a compact representation of the joint probability
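
For instance, in the water sprinkler network above (all variables binary, counting one free parameter per CPT row): the full joint over 4 binary variables needs 2^4 − 1 = 15 numbers, while the factored form needs 1 (for C) + 2 (for S|C) + 2 (for R|C) + 4 (for W|S,R) = 9.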

Conditional probabilities

Another example: medical diagnosis (classification)

Approach: build a Bayes’ net and use Bayes’ rule to get the class probability

A very simple Bayes’ net: Naïve Bayes

Naïve Bayes classifier for medical diagnosis
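
A minimal sketch of the Naïve Bayes computation for diagnosis, assuming binary symptoms and a single class variable (the class names and probabilities below are illustrative placeholders, not from the slides):

```python
# Naive Bayes: P(class | symptoms) is proportional to P(class) * prod_i P(symptom_i | class).
# All names and probabilities below are illustrative placeholders.
prior = {"flu": 0.1, "cold": 0.9}                       # P(class)
likelihood = {                                           # P(symptom = True | class)
    "flu":  {"fever": 0.8, "cough": 0.6},
    "cold": {"fever": 0.2, "cough": 0.7},
}

def posterior(observed):
    """observed maps symptom name -> True/False; returns P(class | observed)."""
    scores = {}
    for c in prior:
        score = prior[c]
        for symptom, present in observed.items():
            p = likelihood[c][symptom]
            score *= p if present else (1.0 - p)
        scores[c] = score
    z = sum(scores.values())                             # normalize over classes
    return {c: s / z for c, s in scores.items()}

print(posterior({"fever": True, "cough": True}))
```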

Another commonly used Bayes’ net: Hidden Markov Model (HMM)

Conditional independence properties of Bayesian networks: chains

Conditional independence properties of Bayesian networks: common cause

Conditional independence properties of Bayesian networks: explaining away

Global Markov properties of DAGs

Bayes ball algorithm

Example

Undirected graphical models

Parameterization

Clique potentials

Interpretation of clique potentials

Examples

Joint distribution of an undirected graphical model

Complexity scales exponentially as 2^n for n binary random variables if we use a naïve approach to computing the partition function
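
A brute-force Python sketch of this 2^n enumeration for a small pairwise model (the graph and potentials are illustrative assumptions):

```python
from itertools import product

# A small pairwise Markov network over binary variables 0..n-1.
# The edges and potentials are illustrative assumptions, not from the slides.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def edge_potential(xi, xj):
    """A simple 'agreement' potential: larger when neighbouring variables match."""
    return 2.0 if xi == xj else 1.0

def unnormalized(x):
    """Product of clique (edge) potentials for a full assignment x."""
    p = 1.0
    for i, j in edges:
        p *= edge_potential(x[i], x[j])
    return p

# Partition function: sum over all 2^n joint assignments.
Z = sum(unnormalized(x) for x in product([0, 1], repeat=n))
print(Z)   # normalizer; P(x) = unnormalized(x) / Z
```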

Max clique vs. sub-clique

Log-linear models

Log-linear models

Log-linear models
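
A sketch of the standard log-linear parameterization (the particular feature functions used in the slides are an assumption):

p(x) = (1/Z(θ)) exp( Σ_k θ_k f_k(x) ),   with   Z(θ) = Σ_x exp( Σ_k θ_k f_k(x) )

where the f_k are feature functions (typically defined on cliques) and the θ_k are real-valued weights.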

Summary

Summary

From directed to undirected graphs

From directed to undirected graphs

Example of moralization

Comparing directed and undirected models

Expressive power

[Figure: a four-node undirected graph over x, y, w, z and a three-node directed graph over x, y, z]

Coming back to inference

Coming back to inference

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees
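
As a rough illustration of sum-product message passing on a tree, a minimal Python sketch for a three-node chain x1 — x2 — x3 with binary states (the potentials are illustrative assumptions):

```python
import numpy as np

# Chain x1 -- x2 -- x3 over binary states; all potentials are illustrative assumptions.
psi1 = np.array([0.6, 0.4])            # unary potential on x1
psi2 = np.array([0.5, 0.5])            # unary potential on x2
psi3 = np.array([0.3, 0.7])            # unary potential on x3
pair = np.array([[2.0, 1.0],           # pairwise potential, shared by both edges
                 [1.0, 2.0]])

# Messages toward x2 (treated as the root).
m12 = pair.T @ psi1                    # m_{1->2}(x2) = sum_{x1} psi1(x1) pair(x1, x2)
m32 = pair @ psi3                      # m_{3->2}(x2) = sum_{x3} psi3(x3) pair(x2, x3)

# Belief at x2: local potential times incoming messages, then normalize.
belief2 = psi2 * m12 * m32
belief2 /= belief2.sum()

# Check against brute-force marginalization of the full joint.
joint = np.einsum("i,j,k,ij,jk->ijk", psi1, psi2, psi3, pair, pair)
brute = joint.sum(axis=(0, 2))
brute /= brute.sum()
print(belief2, brute)                  # the two marginals should match
```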

Learning

Parameter Estimation

Parameter Estimation

Maximum-likelihood Estimation (MLE)

Example: 1-D Gaussian
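
The standard closed-form answer for this example (a sketch of the usual result): given i.i.d. data x_1, …, x_N from N(μ, σ²), the maximum-likelihood estimates are

μ̂ = (1/N) Σ_n x_n
σ̂² = (1/N) Σ_n (x_n − μ̂)²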

MLE for Bayes’ Net

MLE for Bayes’ Net

MLE for Bayes’ Net with Discrete Nodes

Parameter Estimation with Hidden Nodes

[Figure: graphical model with nodes Z1–Z6 and Z]

Why is learning harder?

Where do hidden variables come from?

Parameter Estimation with Hidden Nodes


EM

Different Learning Conditions

                        Observability
Structure               Full              Partial
Known                   Closed form       EM
Unknown                 Local search      Structural EM
