Graphical Models Tamara L Berg CSE 595 Words & Pictures


Page 1: Graphical Models

Graphical Models

Tamara L Berg
CSE 595 Words & Pictures

Page 2: Graphical Models

Announcements

HW3 online tonight

Start thinking about project ideas
• Project proposals in class Oct 30
• Come to office hours Oct 23-25 to discuss ideas
• Projects should be completed in groups of 3
• Topics are open to your choice, but need to involve some text and some visual analysis.

Page 3: Graphical Models

Project Types

• Implement an existing paper
• Do something new
• Do something related to your research

Page 4: Graphical Models

Data Sources

• Flickr
– API provided for search. Example code to retrieve images and associated textual meta-data available here: http://graphics.cs.cmu.edu/projects/im2gps/

• SBU Captioned Photo Dataset
– 1 million user captioned photographs. Data links available here: http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/

• ImageClef
– 20,000 images with associated segmentations, tags and descriptions available here: http://www.imageclef.org/

• Scrape data from your favorite web source (e.g. Wikipedia, pinterest, facebook)

• Rip data from a movie/tv show and find associated text scripts

• Sign language
– 6000 frames of a continuous signing sequence; for 296 frames, the positions of the left and right upper arm, lower arm and hand were manually segmented. http://www.robots.ox.ac.uk/~vgg/data/sign_language/index.html

Page 5: Graphical Models

Generative vs Discriminative

Discriminative version – build a classifier to discriminate between monkeys and non-monkeys.

p(monkey | image)

Page 6: Graphical Models

Generative vs Discriminative

Generative version – build a model of the joint distribution.

p(image, monkey)

Page 7: Graphical Models

Generative vs Discriminative

Can use Bayes rule to compute p(monkey|image) if we know p(image,monkey)
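Spelled out, this is just the standard Bayes-rule identity (not specific to this example): the generative joint gives the discriminative conditional by normalizing over the class variable.

```latex
p(\mathrm{monkey} \mid \mathrm{image})
  = \frac{p(\mathrm{image}, \mathrm{monkey})}{p(\mathrm{image})}
  = \frac{p(\mathrm{image}, \mathrm{monkey})}
         {\sum_{c \in \{\mathrm{monkey},\, \neg\mathrm{monkey}\}} p(\mathrm{image}, c)}
```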

Page 9: Graphical Models

Talk Outline

1. Quick introduction to graphical models

2. Examples: - Naïve Bayes, pLSA, LDA

Page 10: Graphical Models

Slide from Dan Klein

Page 12: Graphical Models

Random Variables

Let X = {X1, ..., Xn} be a set of random variables, and let xi be a realization of Xi.

A random variable is some aspect of the world about which we (may) have uncertainty.

Random variables can be:
• Binary (e.g. {true, false}, {spam, ham}),
• Take on a discrete set of values (e.g. {Spring, Summer, Fall, Winter}),
• Or be continuous (e.g. [0, 1]).

Page 13: Graphical Models

Joint Probability Distribution

Let X = {X1, ..., Xn} be a set of random variables, and let xi be a realization of Xi.

Joint Probability Distribution: p(X1 = x1, X2 = x2, ..., Xn = xn)

Also written p(x1, x2, ..., xn)

Gives a real value for all possible assignments.

Page 14: Graphical Models

Queries

Joint Probability Distribution: p(X1 = x1, X2 = x2, ..., Xn = xn)

Also written p(x1, x2, ..., xn)

Given a joint distribution, we can reason about unobserved variables given observations (evidence):

p(stuff you care about | stuff you already know)

Page 16: Graphical Models

Representation

Joint Probability Distribution: p(X1 = x1, ..., Xn = xn), also written p(x1, ..., xn)

One way to represent the joint probability distribution for discrete Xi is as an n-dimensional table, each cell containing the probability for one setting of X. This would have k^n entries if each Xi ranges over k values.

Graphical Models!
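To make the blow-up concrete, here is a small illustrative sketch (not from the slides): storing an explicit joint table over n = 20 binary variables as an n-dimensional numpy array already takes k^n = 2^20 entries.

```python
import numpy as np

k, n = 2, 20  # k values per variable, n variables

# Explicit joint table: one probability per assignment of (X1, ..., Xn).
# Filled with a uniform distribution purely for illustration.
table = np.full((k,) * n, 1.0 / k**n)

print(table.size)                  # k**n = 1048576 entries for 20 binary variables
print(np.isclose(table.sum(), 1.0))
```

Doubling n squares the table size, which is why the economical factored representations below matter.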

Page 17: Graphical Models

Representation

Joint Probability Distribution: p(X1 = x1, ..., Xn = xn), also written p(x1, ..., xn)

Graphical models represent joint probability distributions more economically, using a set of “local” relationships among variables.

Page 18: Graphical Models

Graphical Models

Graphical models offer several useful properties:

1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models.

2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph.

3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly.

from Chris Bishop

Page 19: Graphical Models

Main kinds of models

• Undirected (also called Markov Random Fields) - links express constraints between variables.

• Directed (also called Bayesian Networks) - have a notion of causality -- one can regard an arc from A to B as indicating that A "causes" B.

Page 22: Graphical Models

Directed Graphical Models

Directed Graph, G = (X, E)

Nodes: X = {X1, ..., Xn}
Edges: E = {(i, j) : an edge from Xi to Xj}

Each node is associated with a random variable.

Definition of joint probability in a graphical model:

p(x1, ..., xn) = Πi p(xi | xπi)

where xπi are the parents of xi.
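As a sketch of this definition (a hypothetical three-node chain X1 → X2 → X3 with made-up numbers, not an example from the slides), the joint is just the product of each node's local conditional table:

```python
import numpy as np

# Chain X1 -> X2 -> X3, all binary. Local factors (made-up numbers):
p_x1 = np.array([0.6, 0.4])                  # p(x1)
p_x2_given_x1 = np.array([[0.9, 0.1],        # row z: p(x2 | x1=z)
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.7, 0.3],        # row z: p(x3 | x2=z)
                          [0.5, 0.5]])

def joint(x1, x2, x3):
    # p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x2):
    # each node contributes one factor conditioned on its parents.
    return p_x1[x1] * p_x2_given_x1[x1, x2] * p_x3_given_x2[x2, x3]

total = sum(joint(a, b, c) for a in range(2) for b in range(2) for c in range(2))
print(total)  # sums to 1.0 over all assignments
```

Note the storage cost: three small tables (2 + 4 + 4 numbers) instead of a full 2^3 joint table, and the gap grows exponentially with more variables.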

Page 23: Graphical Models

Example

Page 24: Graphical Models

Example

Joint Probability:

Page 25: Graphical Models

Example

(The slide showed the conditional probability tables of 0/1 assignments for the example graph; the table layout did not survive transcription.)

Page 26: Graphical Models

Conditional Independence

Independence: p(x, y) = p(x) p(y)

Conditional Independence: p(x, y | z) = p(x | z) p(y | z)

Or, p(x | y, z) = p(x | z)
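The definition can be checked numerically on a toy distribution built to factor as p(z) p(x|z) p(y|z) (all numbers made up for illustration):

```python
import numpy as np

p_z = np.array([0.3, 0.7])
p_x_given_z = np.array([[0.8, 0.2],    # row z: p(x | z)
                        [0.4, 0.6]])
p_y_given_z = np.array([[0.5, 0.5],    # row z: p(y | z)
                        [0.1, 0.9]])

# Joint constructed so X and Y are conditionally independent given Z:
# p(x, y, z) = p(z) p(x|z) p(y|z)
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

# Verify p(x, y | z) == p(x | z) p(y | z) for every value of z.
for z in range(2):
    p_xy_given_z = joint[:, :, z] / joint[:, :, z].sum()
    print(np.allclose(p_xy_given_z, np.outer(p_x_given_z[z], p_y_given_z[z])))
```

Marginally, X and Y are generally *not* independent here; the independence only holds once Z is conditioned on.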

Page 27: Graphical Models

Conditional Independence

Page 31: Graphical Models

Conditional Independence

By the Chain Rule (using the usual arithmetic ordering):

p(x1, ..., xn) = p(x1) p(x2 | x1) p(x3 | x1, x2) ... p(xn | x1, ..., xn-1)

Joint distribution from the example graph:

Missing variables in the local conditional probability functions correspond to missing edges in the underlying graph. Removing an edge into node i eliminates an argument from the conditional probability factor for xi.

Page 32: Graphical Models

Observations

• Graphs can have observed (shaded) and unobserved nodes. If nodes are always unobserved they are called hidden or latent variables

• Probabilistic inference in graphical models is the problem of computing a conditional probability distribution over the values of some of the nodes (the “hidden” or “unobserved” nodes), given the values of other nodes (the “evidence” or “observed” nodes).

Page 33: Graphical Models

Inference – computing conditional probabilities

Marginalization: p(xA) = Σ_xB p(xA, xB)

Conditional Probabilities: p(xA | xB) = p(xA, xB) / p(xB)
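On a table representation, both operations are one-liners. A minimal sketch with a random joint over three binary variables (not an example from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()            # normalize into a joint p(x1, x2, x3)

# Marginalization: sum out the variables you don't care about.
p_x1 = joint.sum(axis=(1, 2))   # p(x1)

# Conditioning: restrict to the evidence and renormalize.
p_x1_given_x3 = joint[:, :, 1].sum(axis=1) / joint[:, :, 1].sum()  # p(x1 | x3=1)

print(np.isclose(p_x1.sum(), 1.0), np.isclose(p_x1_given_x3.sum(), 1.0))
```

The algorithms on the next slide exist because these sums become intractable when the table is only implicit in a large graph.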

Page 34: Graphical Models

Inference Algorithms

• Exact algorithms
– Elimination algorithm
– Sum-product algorithm
– Junction tree algorithm

• Sampling algorithms
– Importance sampling
– Markov chain Monte Carlo

• Variational algorithms
– Mean field methods
– Sum-product algorithm and variations
– Semidefinite relaxations

Page 35: Graphical Models

Talk Outline

1. Quick introduction to graphical models

2. Examples: - Naïve Bayes, pLSA, LDA

Page 36: Graphical Models

A Simple Example – Naïve Bayes

C – Class, F – Features

p(C, F1, ..., Fn) = p(C) Πi p(Fi | C)

We only specify (parameters):
p(C): prior over class labels
p(Fi | C): how each feature depends on the class

Page 39: Graphical Models

Slide from Dan Klein

Page 40: Graphical Models

Slide from Dan Klein

Page 41: Graphical Models

Slide from Dan Klein

Page 42: Graphical Models

Percentage of documents in training set labeled as spam/ham

Slide from Dan Klein

Page 43: Graphical Models

In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words).

Slide from Dan Klein

Page 44: Graphical Models

In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words).

Slide from Dan Klein

Page 45: Graphical Models

Classification

The class that maximizes: c* = argmax_c p(c) Πi p(fi | c)

Page 49: Graphical Models

Classification

• In practice:
– Multiplying lots of small probabilities can result in floating point underflow.
– Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.
– Since log is a monotonic function, the class with the highest score does not change.
– So, what we usually compute in practice is:

c* = argmax_c [ log p(c) + Σi log p(fi | c) ]
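The log-space trick can be sketched as follows (a toy two-class word model; the word probabilities and prior are invented for illustration):

```python
import math

# Per-class word probabilities (made-up toy values) and a uniform prior.
p_word = {
    'spam': {'free': 0.05,  'money': 0.04,  'the': 0.03},
    'ham':  {'free': 0.001, 'money': 0.002, 'the': 0.05},
}
log_prior = {'spam': math.log(0.5), 'ham': math.log(0.5)}

def classify(words):
    # Sum log probabilities instead of multiplying probabilities,
    # avoiding floating point underflow for long documents.
    scores = {c: log_prior[c] + sum(math.log(p_word[c][w]) for w in words)
              for c in p_word}
    return max(scores, key=scores.get)

print(classify(['free', 'money']))  # -> spam
print(classify(['the', 'the']))     # -> ham
```

Because log is monotonic, the argmax over log-scores picks the same class as the argmax over raw products.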

Page 50: Graphical Models

Naïve Bayes for modeling text/metadata topics

Page 51: Graphical Models

Harvesting Image Databases from the Web
Schroff, F., Criminisi, A. and Zisserman, A.

Download images from the web via a search query (e.g. penguin).

Re-rank images using a naïve Bayes model trained on text surrounding the images and meta-data features (image alt tag, image title tag, image filename).

Top ranked images used to train an SVM classifier to further improve ranking.

Page 52: Graphical Models

Results

Page 53: Graphical Models

Results

Page 54: Graphical Models

Naïve Bayes on images

Page 55: Graphical Models

Visual Categorization with Bags of Keypoints
Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray

Page 56: Graphical Models

Method

Steps:
– Detect and describe image patches.
– Assign patch descriptors to a set of predetermined clusters (a vocabulary) with a vector quantization algorithm.
– Construct a bag of keypoints, which counts the number of patches assigned to each cluster.
– Apply a multi-class classifier (naïve Bayes), treating the bag of keypoints as the feature vector, and thus determine which category or categories to assign to the image.
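The quantization and histogram steps can be sketched with plain numpy (the vocabulary and patch descriptors here are random stand-ins; the paper clusters SIFT descriptors with k-means):

```python
import numpy as np

rng = np.random.default_rng(1)
vocabulary = rng.random((5, 8))   # 5 cluster centers ("visual words"), 8-D descriptors
patches = rng.random((30, 8))     # descriptors extracted from one image

# Vector quantization: assign each patch descriptor to its nearest cluster center.
dists = np.linalg.norm(patches[:, None, :] - vocabulary[None, :, :], axis=2)
assignments = dists.argmin(axis=1)

# Bag of keypoints: count patches per visual word. This histogram
# is the feature vector fed to the multi-class classifier.
bag = np.bincount(assignments, minlength=len(vocabulary))
print(bag.sum())  # 30, one count per patch
```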

Page 57: Graphical Models

Naïve Bayes

C – Class, F – Features

p(C, F1, ..., Fn) = p(C) Πi p(Fi | C)

We only specify (parameters):
p(C): prior over class labels
p(Fi | C): how each feature depends on the class

Page 58: Graphical Models

Naïve Bayes Parameters

Problem: Categorize images as one of 7 object classes using a Naïve Bayes classifier:
– Classes: object categories (face, car, bicycle, etc.)
– Features: images represented as a histogram whose bins are the cluster centers, i.e. the visual word vocabulary. Features are vocabulary counts.

p(C) is treated as uniform. p(F | C) is learned from training data: images labeled with category.

Page 59: Graphical Models

Results

Page 60: Graphical Models

Talk Outline

1. Quick introduction to graphical models

2. Examples: - Naïve Bayes, pLSA, LDA

Page 61: Graphical Models

pLSA

Page 62: Graphical Models

pLSA

Page 63: Graphical Models

Joint Probability: p(w, d) = p(d) p(w | d)

Marginalizing over topics determines the conditional probability:

p(w | d) = Σz p(w | z) p(z | d)

Page 64: Graphical Models

Fitting the model

Need to:
– Determine the topic vectors p(w | z) common to all documents.
– Determine the mixture components p(z | d) specific to each document.

Goal: a model that gives high probability to the words that appear in the corpus.

Maximum likelihood estimation of the parameters is obtained by maximizing the objective function:

L = Σd Σw n(w, d) log p(w | d)
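Evaluating that objective is a pair of matrix operations. A sketch on a tiny made-up corpus (in a real fit, p(w|z) and p(z|d) would come out of EM rather than being hand-set):

```python
import numpy as np

n_wd = np.array([[2, 0],              # word-document counts n(w, d):
                 [1, 3],              # 3 words x 2 documents (made up)
                 [0, 1]])

p_w_given_z = np.array([[0.7, 0.1],   # 3 words x 2 topics (made-up parameters)
                        [0.2, 0.3],
                        [0.1, 0.6]])
p_z_given_d = np.array([[0.9, 0.2],   # 2 topics x 2 documents
                        [0.1, 0.8]])

# Marginalize over topics: p(w | d) = sum_z p(w | z) p(z | d)
p_w_given_d = p_w_given_z @ p_z_given_d

# Log-likelihood objective: L = sum_d sum_w n(w, d) log p(w | d)
log_lik = (n_wd * np.log(p_w_given_d)).sum()
print(log_lik < 0)  # True: a count-weighted sum of log probabilities
```

Since both factors are column-stochastic, each column of p(w|d) is itself a proper distribution over words.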

Page 65: Graphical Models

pLSA on images

Page 66: Graphical Models

Discovering objects and their location in images
Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman

Documents – images
Words – visual words (vector quantized SIFT descriptors)
Topics – object categories

Images are modeled as a mixture of topics (objects).

Page 67: Graphical Models

Goals

They investigate three areas:
– (i) topic discovery, where categories are discovered by pLSA clustering on all available images.
– (ii) classification of unseen images, where topics corresponding to object categories are learnt on one set of images, and then used to determine the object categories present in another set.
– (iii) object detection, where you want to determine the location and approximate segmentation of object(s) in each image.

Page 68: Graphical Models

(i) Topic Discovery

Most likely words for 4 learnt topics (face, motorbike, airplane, car)

Page 69: Graphical Models

(ii) Image Classification

Confusion table for unseen test images against pLSA trained on images containing four object categories, but no background images.

Page 70: Graphical Models

(ii) Image Classification

Confusion table for unseen test images against pLSA trained on images containing four object categories, and background images. Performance is not quite as good.

Page 71: Graphical Models

(iii) Topic Segmentation

Page 72: Graphical Models

(iii) Topic Segmentation

Page 73: Graphical Models

(iii) Topic Segmentation

Page 74: Graphical Models

Talk Outline

1. Quick introduction to graphical models

2. Examples: - Naïve Bayes, pLSA, LDA

Page 75: Graphical Models

LDA
David M Blei, Andrew Y Ng, Michael Jordan

Page 77: Graphical Models

LDA vs pLSA

Per-document topic proportions

Per-word topic assignment

Observed word

Page 79: Graphical Models

LDA

Dirichlet parameter

Per-document topic proportions

Per-word topic assignment

Observed word

topics

Page 80: Graphical Models

Generating Documents

Page 81: Graphical Models

Joint Distribution

Joint distribution:

p(θ, z, w | α, β) = p(θ | α) Πn p(zn | θ) p(wn | zn, β)
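The generative process behind that joint can be sketched by sampling (toy vocabulary and topic sizes; the α and β-like concentration values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, vocab_size, doc_len = 3, 10, 50
alpha = 0.5

# Topics: each row is a distribution over words (stand-in for the beta parameters).
topics = rng.dirichlet(np.full(vocab_size, 0.1), size=n_topics)

# Per-document topic proportions: theta ~ Dirichlet(alpha).
theta = rng.dirichlet(np.full(n_topics, alpha))

doc = []
for _ in range(doc_len):
    z = rng.choice(n_topics, p=theta)          # per-word topic assignment
    w = rng.choice(vocab_size, p=topics[z])    # observed word drawn from topic z
    doc.append(w)

print(len(doc))  # 50 sampled word indices
```

Inference runs this story in reverse: given only the words, recover plausible θ and z.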

Page 82: Graphical Models
Page 83: Graphical Models

LDA on text

Topic discovery from a text corpus. Highly ranked words for 4 topics.

Page 84: Graphical Models

LDA in Animals on the Web

Tamara L Berg, David Forsyth

Page 85: Graphical Models

Animals on the Web Outline:

Harvest pictures of animals from the web using Google Text Search.

Select visual exemplars using text-based information (LDA).

Use vision and text cues to extend to similar images.

Page 86: Graphical Models

Text Model

Latent Dirichlet Allocation (LDA) on the words in collected web pages to discover 10 latent topics for each category. Each topic defines a distribution over words. Select the 50 most likely words for each topic.

1.) frog frogs water tree toad leopard green southern music king irish eggs folk princess river ball range eyes game species legs golden bullfrog session head spring book deep spotted de am free mouse information round poison yellow upon collection nature paper pond re lived center talk buy arrow common prince

Example Frog Topics:

2.) frog information january links common red transparent music king water hop tree pictures pond green people available book call press toad funny pottery toads section eggs bullet photo nature march movies commercial november re clear eyed survey link news boston list frogs bull sites butterfly court legs type dot blue

Page 87: Graphical Models

Select Exemplars

Rank images according to whether they have these likely words near the image in the associated page. Select up to 30 images per topic as exemplars.

Example Frog Topics:

1.) frog frogs water tree toad leopard green southern music king irish eggs folk princess river ball range eyes game species legs golden bullfrog session head ...

2.) frog information january links common red transparent music king water hop tree pictures pond green people available book call press ...

Page 88: Graphical Models

Extensions to LDA for pictures

Page 89: Graphical Models

A Bayesian Hierarchical Model for Learning Natural Scene Categories

Fei-Fei Li, Pietro Perona

An unsupervised approach to learn and recognize natural scene categories. A scene is represented by a collection of local regions. Each region is represented as part of a “theme” (e.g. rock, grass etc) learned from data.

Page 90: Graphical Models

Generating Scenes

1.) Choose a category label (e.g. mountain scene).

2.) Given the mountain class, draw a probability vector that will determine what intermediate theme(s) (grass, rock, etc.) to select while generating each patch of the scene.

3.) For creating each patch in the image, first determine a particular theme out of the mixture of possible themes, and then draw a codeword given this theme. For example, if a “rock” theme is selected, this will in turn privilege some codewords that occur more frequently in rocks (e.g. slanted lines).

4.) Repeat the process of drawing both the theme and codeword many times, eventually forming an entire bag of patches that would construct a scene of mountains.

Page 91: Graphical Models

Results – modeling themes

Left - distribution of the 40 intermediate themes. Right - distribution of codewords as well as the appearance of 10 codewords selected from the top 20 most likely codewords for this category model.

Page 93: Graphical Models

Results – Scene Classification

(Figure: examples of correct and incorrect classifications.)
