Topic Model: Latent Dirichlet Allocation
Ouyang Ruofei
May 10, 2013
Introduction

A model has parameters; inference recovers them from observations:
data = latent pattern + noise
Parametric model: the number of parameters is fixed w.r.t. sample size.
Nonparametric model: the number of parameters grows with sample size (an infinite-dimensional parameter space).

Problem              Parameter
Density estimation   Distributions
Regression           Functions
Clustering           Partitions
Clustering

1. Ironman  2. Thor  3. Hulk
An indicator variable assigns each data point to a cluster.
Dirichlet process

Ironman: 3 times, Thor: 2 times, Hulk: 2 times
Without the likelihood, we already know that:
1. There are three clusters.
2. The distribution over the three clusters.
New data: which cluster should it join?
Dirichlet distribution: Dir(α1, ..., αK)
pdf: p(θ1, ..., θK) = Γ(Σk αk) / Πk Γ(αk) · Πk θk^(αk − 1), with θk ≥ 0 and Σk θk = 1
mean: E[θk] = αk / Σj αj
Example: Dir(Ironman, Thor, Hulk) = Dir(3, 2, 2)
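The mean formula above can be checked numerically; a minimal sketch using numpy, with Dir(3, 2, 2) matching the Avengers counts:

```python
import numpy as np

# Draw many samples from Dir(3, 2, 2) -- the Ironman/Thor/Hulk counts --
# and check that the empirical mean matches alpha_k / sum(alpha).
rng = np.random.default_rng(0)
alpha = np.array([3.0, 2.0, 2.0])

samples = rng.dirichlet(alpha, size=100_000)   # each row sums to 1
empirical_mean = samples.mean(axis=0)
theoretical_mean = alpha / alpha.sum()         # [3/7, 2/7, 2/7]

print(empirical_mean)   # close to [0.4286, 0.2857, 0.2857]
```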
The Dirichlet distribution is the conjugate prior of the multinomial distribution.
Posterior: Dir(α1 + n1, ..., αK + nK)
Example:
            Ironman  Thor  Hulk
Prior          3      2     2
Likelihood   100    300   200
Posterior    103    302   202
The prior acts as pseudo counts.
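The pseudo-count arithmetic in the table is just elementwise addition of prior and observed counts; a minimal sketch using the numbers above:

```python
# Dirichlet-multinomial conjugacy: the posterior parameters are
# prior pseudo counts + observed counts, elementwise.
prior = {"Ironman": 3, "Thor": 2, "Hulk": 2}
observed = {"Ironman": 100, "Thor": 300, "Hulk": 200}

posterior = {hero: prior[hero] + observed[hero] for hero in prior}
print(posterior)  # {'Ironman': 103, 'Thor': 302, 'Hulk': 202}
```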
In our Avengers model, K = 3 (Ironman, Thor, Hulk).
However, a new character shows up, and a Dirichlet distribution with fixed K cannot model him: we would need K = infinity.
"Nonparametric" here means an infinite number of clusters, which is what the Dirichlet process provides.
Dirichlet process: G ~ DP(α, G0)
α: concentration, acts like pseudo counts in each cluster
G0: base distribution of each cluster, the distribution template
A Dirichlet process is a distribution over distributions: given any partition (A1, ..., AK) of the space,
(G(A1), ..., G(AK)) ~ Dir(αG0(A1), ..., αG0(AK))
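One standard way to make "a distribution over distributions" concrete is the stick-breaking construction of DP(α, G0); a sketch, with G0 taken to be a standard normal purely for illustration (the slides do not fix a base distribution):

```python
import numpy as np

# Stick-breaking construction of G ~ DP(alpha, G0), truncated at T atoms.
# beta_k ~ Beta(1, alpha); weight_k = beta_k * prod_{j<k}(1 - beta_j).
# Atoms are i.i.d. draws from the base distribution G0 (standard normal here).
rng = np.random.default_rng(0)
alpha, T = 2.0, 50

betas = rng.beta(1.0, alpha, size=T)
remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
weights = betas * remaining            # sums to 1 as T -> infinity
atoms = rng.normal(0.0, 1.0, size=T)   # draws from G0

print(weights.sum())   # just below 1 for finite truncation T
```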
Construct the Dirichlet process by the Chinese restaurant process (CRP).
In a restaurant there are an infinite number of tables.
Customer 1 sits at an unoccupied table with p = 1.
Customer N sits at table k with p = nk / (N − 1 + α), and at a new table with p = α / (N − 1 + α), where nk is the number of customers already at table k.
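The seating rule above can be simulated directly; a minimal sketch (the value of α and the number of customers are arbitrary choices for illustration):

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Return table assignments for each customer under CRP(alpha)."""
    rng = random.Random(seed)
    tables = []   # tables[k] = number of customers at table k
    seats = []    # seats[i] = table index of customer i
    for n in range(n_customers):
        # Customer n+1 joins table k with prob tables[k] / (n + alpha),
        # or opens a new table with prob alpha / (n + alpha).
        weights = tables + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)   # new table
        else:
            tables[k] += 1
        seats.append(k)
    return seats, tables

seats, tables = chinese_restaurant_process(100, alpha=1.0)
print(len(tables), sum(tables))   # number of tables; total customers = 100
```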
In the CRP metaphor:
Customers : data
Tables : clusters
Train the model by Gibbs sampling.
Gibbs sampling

Gibbs sampling is an MCMC method to obtain a sequence of samples from a multivariate distribution.
The intuition is to turn one multivariate problem into a sequence of univariate problems.
Multivariate: sample (z1, ..., zN) from p(z1, ..., zN) jointly.
Univariate: sample each zi from p(zi | z−i) in turn.
In the Dirichlet process mixture, each indicator zi is resampled conditioned on all the others:
p(zi = k | z−i, x) ∝ nk,−i / (N − 1 + α) · p(xi | cluster k) for an existing cluster k,
p(zi = new | z−i, x) ∝ α / (N − 1 + α) · p(xi | G0) for a new cluster.
Gibbs sampling pseudo code:
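A minimal sketch of the univariate-at-a-time loop, using a bivariate Gaussian with correlation ρ as a stand-in target because its conditionals are closed-form (the target and parameters here are illustrative, not from the slides):

```python
import numpy as np

# Gibbs sampling for a 2D Gaussian with correlation rho:
# x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
# Each sweep resamples one coordinate at a time from its conditional.
rng = np.random.default_rng(0)
rho, n_sweeps = 0.8, 20_000

x, y = 0.0, 0.0
samples = np.empty((n_sweeps, 2))
for t in range(n_sweeps):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # sample x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # sample y | x
    samples[t] = (x, y)

# Discard burn-in, then check the chain recovered the target correlation.
print(np.corrcoef(samples[2000:].T)[0, 1])   # close to rho = 0.8
```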
Topic model

A document is a mixture of topics.
We can read the words (observed), but the topics are latent variables.
The model connects the latent topics to the observed words.
The topic zij of each observed word xij is resampled by the collapsed Gibbs update:
p(zij = k | other topics, other words) ∝ (word/topic count + β) / (topic count + W·β) × (topic/doc count + α)
where the counts exclude the word currently being resampled.
Apply the Dirichlet process in a topic model.

            Topic 1  Topic 2  Topic 3
Document      P1       P2       P3

            Topic 1  Topic 2  Topic 3
Word          Q1       Q2       Q3

Learn the distribution of topics in each document, and the distribution of topics for each word.
Keep two count tables:
topic/doc table: rows d1, d2, d3; columns t1, t2, t3
word/topic table: rows t1, t2, t3; columns w1, w2, w3, w4
Graphical models: latent Dirichlet allocation vs. the Dirichlet mixture model.
LDA Example

w: ipad apple itunes mirror queen joker ladygaga
t1: product   t2: story   t3: poker
d1: ipad apple itunes
d2: apple mirror queen
d3: queen joker ladygaga
d4: queen ladygaga mirror
In fact, the topics are latent.
d1: ipad apple itunes
d2: apple mirror queen
d3: queen joker ladygaga
d4: queen ladygaga mirror

Topic assignments (one per word):
d1: t1 t2 t3
d2: t2 t1 t2
d3: t3 t3 t1
d4: t2 t1 t2

word/topic counts:
      ipad  apple  itunes  mirror  queen  joker  ladygaga
t1     1     0      0       1       0      0      2
t2     0     2      0       1       2      0      0
t3     0     0      1       0       1      1      0
sum    1     2      1       2       3      1      2

topic/doc counts:
      t1  t2  t3
d1     1   1   1
d2     1   2   0
d3     1   0   2
d4     1   2   0
Gibbs step: pick the word "queen" in d3 and resample its topic, holding it out of the counts (d3 temporarily reads: joker ladygaga).
Remove "queen" from the counts (it was assigned topic 3 in d3):

word/topic counts:
      ipad  apple  itunes  mirror  queen  joker  ladygaga
t1     1     0      0       1       0      0      2
t2     0     2      0       1       2      0      0
t3     0     0      1       0      1-1     1      0
sum    1     2      1       2      3-1     1      2

topic/doc counts:
      t1  t2  t3
d1     1   1   1
d2     1   2   0
d3     1   0  2-1
d4     1   2   0
Counts after the removal:

word/topic counts:
      ipad  apple  itunes  mirror  queen  joker  ladygaga
t1     1     0      0       1       0      0      2
t2     0     2      0       1       2      0      0
t3     0     0      1       0       0      1      0
sum    1     2      1       2       2      1      2

topic/doc counts:
      t1  t2  t3
d1     1   1   1
d2     1   2   0
d3     1   0   1
d4     1   2   0
Sample a new topic for "queen": topic 2. Add it back to the counts:

word/topic: t2, queen: 2+1 = 3; queen column sum: 2+1 = 3
topic/doc: d3, t2: 0+1 = 1

"queen" in d3 is now assigned to topic 2.
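The sampling probabilities behind this step can be computed from the held-out counts; a sketch assuming symmetric hyperparameters α = β = 1 (the slides do not give the actual values):

```python
# Collapsed Gibbs probabilities for the held-out word 'queen' in d3.
# Counts after removing 'queen', taken from the tables above:
#   queen's word/topic counts: t1=0, t2=2, t3=0
#   topic totals:              t1=4, t2=5, t3=2
#   d3's topic/doc counts:     t1=1, t2=0, t3=1
alpha, beta, W = 1.0, 1.0, 7   # W = vocabulary size; alpha, beta assumed

n_queen = [0, 2, 0]
n_topic = [4, 5, 2]
n_d3 = [1, 0, 1]

scores = [(n_queen[k] + beta) / (n_topic[k] + W * beta) * (n_d3[k] + alpha)
          for k in range(3)]
probs = [s / sum(scores) for s in scores]
print([round(p, 3) for p in probs])   # topic 2 gets the largest probability
```

With these assumed hyperparameters, topic 2 is the most probable choice, consistent with the slide's sampled assignment; the step is a random draw from these probabilities, not an argmax.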
Further

Dirichlet distribution prior: K topics (K fixed in advance)
Dirichlet process prior: an infinite number of topics

Alpha mainly controls the probability of a topic with little training data in the document.
Beta mainly controls the probability of a topic with little training data in the words.

Supervised
Unsupervised
Unrealistic bag-of-words assumption: TNG, biLDA
Loses power-law behavior: Pitman-Yor language model
David Blei has done an extensive survey on topic models: http://home.etf.rs/~bfurlan/publications/SURVEY-1.pdf
Q&A