Topic Model Latent Dirichlet Allocation


Ouyang Ruofei

May 10, 2013


Introduction

Ouyang Ruofei LDA

Parameters:

Inference:

data = latent pattern + noise


Parametric model: the number of parameters is fixed w.r.t. sample size.

Nonparametric model: the number of parameters grows with sample size; the parameter space is infinite-dimensional.

Problem              Parameter
Density estimation   Distributions
Regression           Functions
Clustering           Partitions

Clustering

Clusters: 1. Ironman, 2. Thor, 3. Hulk

Each data point has an indicator variable that names its cluster.

Dirichlet process

Observed so far: Ironman 3 times, Thor 2 times, Hulk 2 times

Without the likelihood, we know that:

1. There are three clusters

2. The distribution over the three clusters, which predicts the cluster of new data


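Dirichlet distribution:

pdf: $p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$

mean: $E[\theta_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}$

Example:

Dir(Ironman, Thor, Hulk)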
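A minimal sketch of the Avengers example, assuming pseudo counts of 3, 2 and 2 for Ironman, Thor and Hulk. It draws one sample from the Dirichlet distribution by normalizing independent Gamma draws and computes the mean. The function names are illustrative, not from the slides.

```python
import random


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each alpha_k divided by the sum of all alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha, rng):
    """Draw one sample by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
alpha = [3.0, 2.0, 2.0]  # assumed pseudo counts: Ironman, Thor, Hulk
theta = sample_dirichlet(alpha, rng)  # one distribution over the three clusters
mean = dirichlet_mean(alpha)          # 3/7, 2/7, 2/7
```

Every sample `theta` is itself a probability vector over the three clusters, which is why the Dirichlet distribution can serve as a prior over cluster proportions.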


The Dirichlet distribution is the conjugate prior of the multinomial distribution.

Posterior: the prior parameters plus the observed counts, so the prior acts as pseudo counts.

Example:

            Ironman  Thor  Hulk
Prior             3     2     2
Likelihood      100   300   200
Posterior       103   302   202
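The conjugate update in the example can be sketched in a few lines. This is an illustration using the numbers from the table; the function names are assumptions, not from the slides.

```python
def dirichlet_posterior(prior, counts):
    """Posterior Dirichlet parameters: prior pseudo counts plus observed counts."""
    if len(prior) != len(counts):
        raise ValueError("prior and counts must have the same length")
    return [a + n for a, n in zip(prior, counts)]


def predictive(params):
    """Probability of each cluster for the next data point under Dir(params)."""
    total = sum(params)
    return [p / total for p in params]


prior = [3, 2, 2]         # pseudo counts: Ironman, Thor, Hulk
counts = [100, 300, 200]  # observed counts (the likelihood row)
posterior = dirichlet_posterior(prior, counts)
print(posterior)  # [103, 302, 202]
```

With this much data the pseudo counts barely matter; they mostly affect clusters that have few observations.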


In our Avengers model, K=3 (Ironman, Thor, Hulk)

However, a newcomer arrives who belongs to none of these clusters.

A Dirichlet distribution with fixed K cannot model this newcomer.

Dirichlet process: K = infinity

"Nonparametric" here means an infinite number of clusters.


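Dirichlet process: a distribution over distributions, $G \sim \mathrm{DP}(\alpha, G_0)$

α: pseudo counts in each cluster

G0: base distribution of each cluster (the distribution template)

Given any partition $A_1, \dots, A_K$ of the sample space: $(G(A_1), \dots, G(A_K)) \sim \mathrm{Dir}(\alpha G_0(A_1), \dots, \alpha G_0(A_K))$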


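Construct the Dirichlet process by the Chinese restaurant process (CRP)

Chinese restaurant process:

In a restaurant, there are an infinite number of tables.

Customer 1 sits at an unoccupied table with p = 1.

Customer N sits at occupied table k with p = $\frac{n_k}{\alpha + N - 1}$, where $n_k$ is the number of customers already at table k, and at a new table with p = $\frac{\alpha}{\alpha + N - 1}$.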


Customers: data

Tables: clusters
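A minimal simulation of the Chinese restaurant process, assuming the standard seating rule: an occupied table is chosen with weight equal to its number of customers, and a new table with weight alpha. The function name and the value alpha = 1.0 are illustrative assumptions.

```python
import random


def crp(num_customers, alpha, rng):
    """Seat customers one by one; return each customer's table and the table sizes."""
    table_sizes = []  # table_sizes[k] = number of customers at table k
    seating = []      # seating[n] = table chosen by customer n
    for _ in range(num_customers):
        # Occupied table k has weight n_k; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new table (a new cluster)
        else:
            table_sizes[k] += 1
        seating.append(k)
    return seating, table_sizes


seating, table_sizes = crp(num_customers=20, alpha=1.0, rng=random.Random(0))
```

The first customer always opens table 0, because the only available weight is the one for a new table. Larger values of alpha open new tables more often, so the number of clusters grows with the data instead of being fixed in advance.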

Train the model by Gibbs sampling

Gibbs sampling

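Gibbs sampling is an MCMC method for obtaining a sequence of samples from a multivariate distribution.

The intuition is to turn a multivariate problem into a sequence of univariate problems.

Multivariate: $p(z_1, \dots, z_N \mid x)$

Univariate: $p(z_i \mid z_{-i}, x)$, where $z_{-i}$ denotes all variables except $z_i$

In the Dirichlet process, the variables $z_i$ are the cluster indicators, resampled one at a time.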


Gibbs sampling pseudo code:
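A minimal sketch of the loop, using a toy target rather than the Dirichlet process: a standard bivariate normal with correlation rho, chosen here only because both univariate conditionals are known in closed form. The target and all names are assumptions for illustration.

```python
import math
import random


def gibbs_bivariate_normal(rho, num_samples, burn_in, rng):
    """Sample (x, y) from a standard bivariate normal with correlation rho."""
    x, y = 0.0, 0.0  # arbitrary starting point
    cond_std = math.sqrt(1.0 - rho * rho)
    samples = []
    for step in range(burn_in + num_samples):
        # Update one variable at a time from its univariate conditional.
        x = rng.gauss(rho * y, cond_std)  # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_std)  # y | x ~ N(rho * x, 1 - rho^2)
        if step >= burn_in:
            samples.append((x, y))  # keep samples only after burn-in
    return samples


samples = gibbs_bivariate_normal(rho=0.8, num_samples=5000, burn_in=500,
                                 rng=random.Random(0))
```

The same pattern applies to cluster indicators: visit each variable in turn and redraw it given the current values of all the others.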

Topic model

Document: a mixture of topics

The topics are latent variables, but we can read the words.

topics -> words


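$p(z_{ij} = k \mid x_{ij} = w, z_{-ij}, x_{-ij}) \propto \frac{n_{wk}^{-ij} + \beta}{\sum_{w'} n_{w'k}^{-ij} + W\beta} \, (n_{jk}^{-ij} + \alpha)$

$z_{ij}$: topic of $x_{ij}$

$x_{ij}$: observed word

$z_{-ij}$: other topics

$x_{-ij}$: other words

$n_{wk}^{-ij}$: word/topic count, excluding the current token

$n_{jk}^{-ij}$: topic/doc count, excluding the current token

$W$: vocabulary size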


Apply Dirichlet process in topic model

Learn the distribution of topics in a document:

            Topic 1  Topic 2  Topic 3
Document         P1       P2       P3

Learn the distribution of topics for a word:

            Topic 1  Topic 2  Topic 3
Word             Q1       Q2       Q3


[Figure: a topic/doc table with one row per document (d1, d2, d3) over topics t1, t2, t3, and a word/topic table with one row per topic (t1, t2, t3) over words w1, w2, w3, w4]


Latent Dirichlet allocation:

Dirichlet mixture model:
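For latent Dirichlet allocation, a minimal sketch of the generative process, assuming symmetric Dirichlet priors: each topic draws a word distribution from Dir(beta), each document draws a topic mixture from Dir(alpha), and each token first draws a topic and then a word. The vocabulary, the values alpha = beta = 0.5, and all function names are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, vocab, num_topics, alpha, beta, rng):
    """Generate documents from the LDA generative process."""
    # One word distribution per topic, drawn from a symmetric Dir(beta).
    topic_word = [sample_dirichlet([beta] * len(vocab), rng)
                  for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # One topic mixture per document, drawn from a symmetric Dir(alpha).
        doc_topic = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=doc_topic)[0]  # latent topic
            w = rng.choices(vocab, weights=topic_word[z])[0]          # observed word
            words.append(w)
        docs.append(words)
    return docs


vocab = ["ipad", "apple", "itunes", "mirror", "queen", "joker", "ladygaga"]
docs = generate_corpus(num_docs=4, doc_len=3, vocab=vocab, num_topics=3,
                       alpha=0.5, beta=0.5, rng=random.Random(0))
```

Inference runs this process in reverse: only `docs` is observed, and the topic of each token has to be recovered.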

LDA Example

w: ipad apple itunes mirror queen joker ladygaga

t1: product

t2: story

t3: poker

d1: ipad apple itunes

d2: apple mirror queen

d3: queen joker ladygaga

d4: queen ladygaga mirror

In fact, the topics are latent

Current state: every token has a topic assignment.

d1: ipad apple itunes        topics: 1 2 3
d2: apple mirror queen       topics: 2 1 2
d3: queen joker ladygaga     topics: 3 3 1
d4: queen ladygaga mirror    topics: 2 1 2

word/topic table:

      ipad  apple  itunes  mirror  queen  joker  ladygaga
t1       1      0       0       1      0      0         2
t2       0      2       0       1      2      0         0
t3       0      0       1       0      1      1         0
sum      1      2       1       2      3      1         2

topic/doc table:

     t1   t2   t3
d1    1    1    1
d2    1    2    0
d3    1    0    2
d4    1    2    0

Pick one token to resample, here "queen" in d3 (currently topic 3), and take it out:

d1: ipad apple itunes        topics: 1 2 3
d2: apple mirror queen       topics: 2 1 2
d3: joker ladygaga           topics: 3 1
d4: queen ladygaga mirror    topics: 2 1 2

The count tables still include the removed token at this point.

Decrement the counts of the removed token: "queen" under topic 3, and topic 3 in d3.

word/topic table:

      ipad  apple  itunes  mirror  queen  joker  ladygaga
t1       1      0       0       1      0      0         2
t2       0      2       0       1      2      0         0
t3       0      0       1       0    1-1      1         0
sum      1      2       1       2    3-1      1         2

topic/doc table:

     t1   t2   t3
d1    1    1    1
d2    1    2    0
d3    1    0  2-1
d4    1    2    0

With the removed token no longer counted, the tables are:

word/topic table:

      ipad  apple  itunes  mirror  queen  joker  ladygaga
t1       1      0       0       1      0      0         2
t2       0      2       0       1      2      0         0
t3       0      0       1       0      0      1         0
sum      1      2       1       2      2      1         2

topic/doc table:

     t1   t2   t3
d1    1    1    1
d2    1    2    0
d3    1    0    1
d4    1    2    0

Sample a new topic for "queen", here topic 2, and add the token back under that topic:

word/topic table:

      ipad  apple  itunes  mirror  queen  joker  ladygaga
t1       1      0       0       1      0      0         2
t2       0      2       0       1    2+1      0         0
t3       0      0       1       0      0      1         0
sum      1      2       1       2    2+1      1         2

topic/doc table:

     t1   t2   t3
d1    1    1    1
d2    1    2    0
d3    1  0+1    1
d4    1    2    0

"queen" in d3 is now assigned topic 2.
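A minimal sketch of one resampling step on this example, assuming the standard collapsed Gibbs conditional for LDA: (word/topic count + beta) / (topic total + W * beta) times (topic/doc count + alpha), with all counts excluding the current token. The values alpha = beta = 1 and the function names are illustrative assumptions, not taken from the slides. Topics are 0-based in the code (0 = t1, 1 = t2, 2 = t3).

```python
import random

VOCAB = ["ipad", "apple", "itunes", "mirror", "queen", "joker", "ladygaga"]
DOCS = [["ipad", "apple", "itunes"],
        ["apple", "mirror", "queen"],
        ["queen", "joker", "ladygaga"],
        ["queen", "ladygaga", "mirror"]]
# Current topic of every token, matching the starting assignment of the example.
Z = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [1, 0, 1]]
K = 3  # number of topics


def build_counts(docs, z):
    """Build the word/topic and topic/doc count tables from the assignments."""
    word_topic = {w: [0] * K for w in VOCAB}
    doc_topic = [[0] * K for _ in docs]
    for j, doc in enumerate(docs):
        for i, w in enumerate(doc):
            word_topic[w][z[j][i]] += 1
            doc_topic[j][z[j][i]] += 1
    return word_topic, doc_topic


def resample_token(j, i, docs, z, word_topic, doc_topic, alpha, beta, rng):
    """One collapsed Gibbs update for token i of document j."""
    w, old = docs[j][i], z[j][i]
    word_topic[w][old] -= 1  # take the token out of both tables
    doc_topic[j][old] -= 1
    topic_totals = [sum(word_topic[v][k] for v in VOCAB) for k in range(K)]
    # Unnormalized conditional probability of each topic for this token.
    weights = [(word_topic[w][k] + beta) / (topic_totals[k] + len(VOCAB) * beta)
               * (doc_topic[j][k] + alpha) for k in range(K)]
    new = rng.choices(range(K), weights=weights)[0]
    word_topic[w][new] += 1  # add the token back under the sampled topic
    doc_topic[j][new] += 1
    z[j][i] = new
    return new, weights


word_topic, doc_topic = build_counts(DOCS, Z)
# Resample "queen", the first token of d3 (document index 2).
new_topic, weights = resample_token(2, 0, DOCS, Z, word_topic, doc_topic,
                                    alpha=1.0, beta=1.0, rng=random.Random(0))
```

Under these assumed hyperparameters the unnormalized weights are 2/11, 1/4 and 2/9, so topic 2 is the most likely outcome, but a run may draw any of the three topics. A full sampler repeats this update for every token over many sweeps.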

Further

Dirichlet distribution prior: K topics

Dirichlet process prior: infinite topics

Alpha mainly controls the probability of a topic that has few tokens assigned to it in the document.

Beta mainly controls the probability of a topic that has few occurrences of the word assigned to it.

Supervised

Unsupervised


Unrealistic bag-of-words assumption: TNG, biLDA

Loses power-law behavior: Pitman-Yor language model

David Blei has done an extensive survey on topic models: http://home.etf.rs/~bfurlan/publications/SURVEY-1.pdf

Q&A
