Latent Dirichlet Allocation (LDA)
Seonghwi Kim
SDS Lab Seminar
Pohang University of Science and Technology
Department of Industrial and Management Engineering
July 15, 2020
Outline
1 Introduction
2 LDA model
3 Approximate posterior inference
4 Application: SCNT recommendation system
Topic modeling
• A statistical model for discovering the abstract topics that occur in a collection of documents.
• A method for automatically organizing, understanding, searching, and summarizing large collections of documents.
Topic modeling
• Uncover the hidden topical patterns that pervade the collection of documents (the corpus).
Topic modeling
From a machine learning perspective, topic modeling is a case study in applying hierarchical Bayesian models to grouped data, such as documents or images.
Topic modeling research touches on
• Directed graphical models
• Conjugate priors
• Hierarchical Bayesian methods
• Fast approximate posterior inference (MCMC, variational methods)
• ...
LDA is an example of a topic model.
Document embedding
• Document embedding converts each document into a vector-space representation.
• It enables several document-related tasks, such as
– calculating similarity between documents
– document classification
Bag of Words Representation
• A common representation of documents in natural language processing
• A document is represented as the bag (multiset) of its words, disregarding grammar and word meaning.
• The order of words in a document is ignored, and only the frequency of each word matters.
• Latent Dirichlet Allocation assumes the bag of words representation for documents (the bag of words model).
Bag of Words Representation
Here are two simple text documents:
• (1) John likes to watch movies. Mary likes movies too.
• (2) Mary also likes to watch football games.
Based on these two documents, a list of word counts is constructed for each document:
[John:1, likes:2, to:1, watch:1, movies:2, Mary:1, too:1]
[Mary:1, also:1, likes:1, to:1, watch:1, football:1, games:1]
and the union of the two word lists gives the vocabulary:
[John, likes, to, watch, movies, Mary, too, also, football, games]
Each document is then represented by its count vector over this vocabulary:
• (1) [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
• (2) [0, 1, 1, 1, 0, 1, 0, 1, 1, 1]
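The construction above can be sketched in a few lines of Python. This is only an illustration: the helper names `tokenize`, `bow_vector`, and `cosine_similarity` are hypothetical, the tokenizer is deliberately naive (split on whitespace, strip periods), and cosine similarity is one possible way to compare the resulting vectors, not necessarily the one used in the talk.

```python
from collections import Counter
import math

docs = [
    "John likes to watch movies. Mary likes movies too.",
    "Mary also likes to watch football games.",
]

def tokenize(text):
    # Naive tokenizer (an assumption): split on whitespace, strip periods.
    return [tok.strip(".") for tok in text.split()]

# Vocabulary in order of first appearance, matching the union list above.
vocab = []
for doc in docs:
    for tok in tokenize(doc):
        if tok not in vocab:
            vocab.append(tok)

def bow_vector(text):
    # Count how often each vocabulary word occurs in the document.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

vectors = [bow_vector(doc) for doc in docs]

def cosine_similarity(u, v):
    # Cosine of the angle between two count vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

similarity = cosine_similarity(vectors[0], vectors[1])
print(vocab)
print(vectors)
print(similarity)
```

Because word order is discarded, any permutation of the words in a document yields the same vector.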
Generative model
• Each document is a random mixture of topics
• Each word is drawn from one of those topics
The posterior distribution
• In reality, we only observe the documents
• Our goal is to infer the underlying topic structure
Graphical models
• Nodes are random variables
• Edges denote possible dependence
• Observed variables are shaded
• Plates denote replicated structure
Graphical models
• The structure of the graph represents the relationships between random variables
• E.g., this graph corresponds to
Latent Dirichlet Allocation
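$$\beta_k \sim \mathrm{Dirichlet}(\eta), \quad k \in \{1, 2, \ldots, K\}$$
$$\theta_d \sim \mathrm{Dirichlet}(\alpha), \quad d \in \{1, 2, \ldots, D\}$$
$$z_{d,n} \sim \mathrm{Multi}(\theta_d), \quad d \in \{1, 2, \ldots, D\},\ n \in \{1, 2, \ldots, N\}$$
$$w_{d,n} \sim \mathrm{Multi}(\beta_{z_{d,n}}), \quad d \in \{1, 2, \ldots, D\},\ n \in \{1, 2, \ldots, N\}$$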
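As a rough sketch, the four sampling statements above can be simulated directly with NumPy. All sizes and hyperparameter values below are hypothetical, chosen only to keep the example small. Storing the topics as the rows of a K-by-V array `beta` is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: topics, documents, words per document, vocabulary size.
K, D, N, V = 3, 4, 20, 10
# Hypothetical symmetric Dirichlet hyperparameters.
eta, alpha = 0.5, 0.5

# beta_k ~ Dirichlet(eta): one distribution over the vocabulary per topic.
beta = rng.dirichlet(np.full(V, eta), size=K)      # shape (K, V)
# theta_d ~ Dirichlet(alpha): one distribution over topics per document.
theta = rng.dirichlet(np.full(K, alpha), size=D)   # shape (D, K)

z = np.empty((D, N), dtype=int)   # topic assignment of each word
w = np.empty((D, N), dtype=int)   # vocabulary index of each word
for d in range(D):
    for n in range(N):
        z[d, n] = rng.choice(K, p=theta[d])        # z_{d,n} ~ Multi(theta_d)
        w[d, n] = rng.choice(V, p=beta[z[d, n]])   # w_{d,n} ~ Multi(beta_{z_{d,n}})

print(w)
```

Only `w` would be observed in practice; `beta`, `theta`, and `z` are the hidden variables that inference tries to recover.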
Latent Dirichlet Allocation
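Here is the complete joint probability distribution of LDA:
$$p(\theta, z, w, \beta \mid \alpha, \eta) = \prod_{k=1}^{K} p(\beta_k \mid \eta) \prod_{d=1}^{D} p(\theta_d \mid \alpha) \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{z_{d,n}}),$$
where
$$p(z_{d,n} \mid \theta_d) = \theta_{d, z_{d,n}}, \qquad p(w_{d,n} \mid z_{d,n}, \beta) = \beta_{w_{d,n}, z_{d,n}}$$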
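To make the factorization concrete, here is a minimal sketch that evaluates the log of the joint for given values of all variables. The function names `log_dirichlet` and `log_joint` are hypothetical, the Dirichlet priors are assumed symmetric (scalar `alpha` and `eta`), and `beta` is stored with topics as rows, so `beta[k, v]` is the probability of word `v` under topic `k`.

```python
import math
import numpy as np

def log_dirichlet(x, concentration):
    # Log density of a symmetric Dirichlet with the given concentration at x.
    dim = len(x)
    log_norm = math.lgamma(dim * concentration) - dim * math.lgamma(concentration)
    return log_norm + (concentration - 1.0) * float(np.sum(np.log(x)))

def log_joint(theta, z, w, beta, alpha, eta):
    # theta: (D, K), beta: (K, V), z and w: (D, N) integer arrays.
    total = sum(log_dirichlet(beta_k, eta) for beta_k in beta)   # topics
    for d in range(theta.shape[0]):
        total += log_dirichlet(theta[d], alpha)                  # topic proportions
        for n in range(z.shape[1]):
            topic = z[d, n]
            total += math.log(theta[d, topic])        # log p(z_{d,n} | theta_d)
            total += math.log(beta[topic, w[d, n]])   # log p(w_{d,n} | beta_{z_{d,n}})
    return total

# Tiny hypothetical configuration: one document, two topics, two words.
theta = np.array([[0.5, 0.5]])
beta = np.array([[0.5, 0.5], [0.5, 0.5]])
z = np.array([[0, 1]])
w = np.array([[0, 1]])
value = log_joint(theta, z, w, beta, alpha=1.0, eta=1.0)
print(value)
```

With `alpha = eta = 1` the Dirichlet densities are uniform, so in this toy case only the four factors of 0.5 from the two words contribute.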
The Dirichlet distribution
• The Dirichlet distribution is an exponential family distribution over the simplex, i.e., positive vectors that sum to one:
$$p(\theta \mid \vec{\alpha}) = \frac{\Gamma\left(\sum_i \alpha_i\right)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1}$$
• The Dirichlet is conjugate to the multinomial. Given a multinomial observation, the posterior distribution of θ is a Dirichlet.
• The parameter α controls the mean shape and sparsity of θ.
• The topic proportions are a K-dimensional Dirichlet. The topics are a V-dimensional Dirichlet.
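The conjugacy statement can be verified in one step. As a sketch, let $n_i$ (a symbol introduced here, not in the slides) denote how many times outcome $i$ occurs in the multinomial observation:

```latex
\begin{aligned}
p(\theta \mid n, \vec{\alpha})
  &\propto \underbrace{\prod_i \theta_i^{n_i}}_{\text{multinomial likelihood}}
     \cdot \underbrace{\prod_i \theta_i^{\alpha_i - 1}}_{\text{Dirichlet prior}}
   = \prod_i \theta_i^{\alpha_i + n_i - 1} \\
\Longrightarrow \quad \theta \mid n, \vec{\alpha}
  &\sim \mathrm{Dirichlet}(\alpha_1 + n_1, \ldots, \alpha_K + n_K)
\end{aligned}
```

The observed counts are simply added to the prior parameters, which is why the α values are often read as pseudo-counts.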
The Dirichlet distribution
Changes in the distribution of θ for different α values
• Large α values make the distribution peaked around its mean, while small α values push the distribution toward the corners of the simplex.
• The α values determine the smoothness or sparsity of the θ distributions.
From Ganegedara, Thushan. 2018. "Intuitive Guide to Latent Dirichlet Allocation"
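The effect of α can be checked numerically. The sketch below assumes a symmetric Dirichlet over K = 3 components and uses the average size of the largest component as a crude sparsity measure. The function name and the two α values are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3

def mean_max_component(alpha, num_samples=5000):
    # Average size of the largest component of theta ~ Dirichlet(alpha, ..., alpha).
    theta = rng.dirichlet(np.full(K, alpha), size=num_samples)
    return theta.max(axis=1).mean()

sparse = mean_max_component(0.1)    # small alpha: mass near the corners
smooth = mean_max_component(10.0)   # large alpha: mass near (1/3, 1/3, 1/3)
print(sparse, smooth)
```

A value close to 1 means most draws put nearly all mass on a single component, while a value close to 1/3 means the draws are nearly uniform.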
Latent Dirichlet Allocation
• From a collection of documents, infer
1 Per-word topic assignments $z_{d,n}$
2 Per-document topic proportions $\theta_d$
3 Per-corpus topic distributions $\beta_k$
• Approximate posterior inference algorithms
1 Gibbs sampling
2 Variational inference
Posterior distribution for LDA
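• For now, assume the topics $\beta_{1:K}$ are fixed. The per-document posterior is
$$p(\theta, z \mid w, \alpha, \beta_{1:K}) = \frac{p(\theta, z, w \mid \alpha, \beta_{1:K})}{p(w \mid \alpha, \beta_{1:K})},$$
where
$$p(w \mid \alpha, \beta_{1:K}) = \int_\theta p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n=1}^{K} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})\, d\theta$$
• This normalizing constant is intractable to compute exactly.
• We appeal to approximate posterior inference.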
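For intuition only, the integral in the denominator can be estimated by naive Monte Carlo on a toy problem: draw θ from the prior and average the product over words. This is not one of the inference methods discussed in the talk, and it scales poorly to real documents. The values of `alpha`, `beta`, and `words` below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = np.array([0.5, 0.5])         # hypothetical prior, K = 2 topics
beta = np.array([[0.7, 0.2, 0.1],    # hypothetical topics over V = 3 words
                 [0.1, 0.3, 0.6]])   # (rows are topics)
words = np.array([0, 0, 2, 1])       # hypothetical document, N = 4 words

# Draw theta from the prior p(theta | alpha).
theta = rng.dirichlet(alpha, size=50000)     # shape (S, K)
# For each sample and word: sum_z p(z | theta) p(w_n | z, beta).
per_word = theta @ beta[:, words]            # shape (S, N)
# Average the product over words to approximate p(w | alpha, beta).
estimate = per_word.prod(axis=1).mean()
print(estimate)
```

The estimate is a single number between 0 and 1. For longer documents the product becomes extremely small and the estimator's variance grows, which is one practical reason to prefer the approximate methods that follow.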
Gibbs sampling
• An MCMC algorithm for obtaining a sequence of samples that approximate a specified multivariate probability distribution
• Define a Markov chain whose stationary distribution is the posterior of interest
• Collect approximately independent samples from that stationary distribution (after burn-in, with thinning); approximate the posterior with them
• In Gibbs sampling, the chain is run by iteratively sampling from the conditional distribution of each hidden variable given the observations and the current state of the other hidden variables
Gibbs sampling procedure
Consider the joint probability distribution $p(x_1, x_2, x_3)$ of three random variables.
1 Initialize $X^0 = (x_1^0, x_2^0, x_3^0)$.
2 Fix the variables $x_2^0$ and $x_3^0$ of the current sample $X^0$.
3 Draw the new value $x_1^1$ to replace $x_1^0$ from $p(x_1^1 \mid x_2^0, x_3^0)$.
4 Fix the variables $x_1^1$ and $x_3^0$.
5 Draw the new value $x_2^1$ to replace $x_2^0$ from $p(x_2^1 \mid x_1^1, x_3^0)$.
6 Fix the variables $x_1^1$ and $x_2^1$.
7 Draw the new value $x_3^1$ to replace $x_3^0$ from $p(x_3^1 \mid x_1^1, x_2^1)$.
8 The resulting sample is $X^1 = (x_1^1, x_2^1, x_3^1)$.
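The procedure above can be sketched for a small discrete case. The joint table below is a hypothetical distribution over three binary variables, chosen so that the exact answer is known and the sampler can be checked against it. The burn-in length and number of sweeps are also arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint p(x1, x2, x3) over three binary variables (entries sum to 1).
joint = np.array([[[0.10, 0.05], [0.05, 0.10]],
                  [[0.15, 0.10], [0.20, 0.25]]])

def sample_coordinate(state, axis):
    # Draw x_axis from its conditional given the other two coordinates.
    index = list(state)
    index[axis] = slice(None)            # free the coordinate being updated
    conditional = joint[tuple(index)]    # unnormalized conditional, length 2
    conditional = conditional / conditional.sum()
    return rng.choice(2, p=conditional)

state = [0, 0, 0]                        # X^0
counts = np.zeros_like(joint)
num_sweeps, burn_in = 20000, 1000
for sweep in range(num_sweeps):
    for axis in range(3):                # update x1, then x2, then x3
        state[axis] = sample_coordinate(state, axis)
    if sweep >= burn_in:
        counts[tuple(state)] += 1        # record X^t after a full sweep

empirical = counts / counts.sum()
print(np.abs(empirical - joint).max())
```

After burn-in, the visit frequencies of the chain approximate the joint table, even though each step only ever samples from a one-dimensional conditional.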
Gibbs sampling procedure
[Figure: Visualization of Gibbs sampling]
Gibbs sampling for LDA
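• Define $n(z_{1:N})$ to be the vector of topic counts, and $n(z_{-i})$ the same counts with the $i$th assignment left out.
• A collapsed Gibbs sampler is
$$z_i \mid z_{-i}, w_{1:N} \sim \mathrm{Multi}(\pi(z_{-i}, w_i)),$$
where
$$\pi_k(z_{-i}, w_i) \propto (\alpha_k + n_k(z_{-i}))\, p(w_i \mid z_i = k, \beta_{1:K})$$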
Gibbs sampling for LDA
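• The topic proportions θ can be integrated out.
• A collapsed Gibbs sampler draws from
$$p(z_i = k \mid z_{-i}, w_{1:N}) \propto p(w_i \mid z_i = k, \beta_{1:K}) \prod_{j=1}^{K} \Gamma\big(\alpha_j + n_j(z_{-i}) + \mathbb{1}[j = k]\big),$$
where $n_j(z_{-i})$ is the number of times topic $j$ appears in the collection of topic assignments $z_{-i}$. Because $\Gamma(x + 1) = x\,\Gamma(x)$, the product is proportional to $\alpha_k + n_k(z_{-i})$.
• Integrating out variables leads to a faster-mixing chain.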
Gibbs sampling for LDA
• $z_i$: the topic assigned to the $i$th word record
• $z_{-i}$: the topics assigned to the other words
• In this example, $n(z_{-i}) = (9, 4, 6)$
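A single collapsed Gibbs update for this example can be sketched as follows. Only the counts (9, 4, 6) come from the slide. The values of `alpha` and `word_likelihood`, which stands for p(w_i | z_i = k, β_{1:K}) for each topic k, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

counts_without_i = np.array([9, 4, 6])            # n(z_{-i}) from the example
alpha = np.array([0.5, 0.5, 0.5])                 # hypothetical hyperparameter
word_likelihood = np.array([0.02, 0.10, 0.01])    # hypothetical p(w_i | z_i = k, beta)

# Unnormalized conditional: (alpha_k + n_k(z_{-i})) * p(w_i | z_i = k, beta).
unnormalized = (alpha + counts_without_i) * word_likelihood
pi = unnormalized / unnormalized.sum()

# Resample the topic of word i from Multi(pi).
new_topic = rng.choice(len(pi), p=pi)
print(pi, new_topic)
```

Here the second topic is the most probable choice even though it has the smallest count, because the word itself is much more likely under that topic. A full sampler would repeat this update for every word, adjusting the counts before and after each draw.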
SCNT recommendation system
Development of AI-based Recommendation System for Curated Retailing Services in Samsung C&T
Outfit recommendation system based on users' click history
• topic: style
• document: user
• word frequency: item click frequency
• word: item
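Under this mapping, the input to LDA is a user-by-item matrix of click counts, the analogue of the bag of words vectors for documents. The sketch below builds such a matrix from a hypothetical click log. The user and item names are invented for illustration and do not come from the SCNT data.

```python
import numpy as np

# Hypothetical click log of (user, item) pairs.
clicks = [("u1", "coat"), ("u1", "coat"), ("u1", "scarf"),
          ("u2", "sneakers"), ("u2", "coat"), ("u2", "sneakers")]

# Users play the role of documents, items the role of vocabulary words.
users = sorted({user for user, _ in clicks})
items = sorted({item for _, item in clicks})
user_index = {user: row for row, user in enumerate(users)}
item_index = {item: col for col, item in enumerate(items)}

# click_counts[u, i] = number of times user u clicked item i ("word frequency").
click_counts = np.zeros((len(users), len(items)), dtype=int)
for user, item in clicks:
    click_counts[user_index[user], item_index[item]] += 1

print(items)
print(click_counts)
```

Each row can then be treated exactly like a document's count vector, with the inferred topics interpreted as styles.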
SCNT recommendation system
• Recommendation process [figure]
• Preprocessing [figure]
• Model assessment [figure]
• Click history 2019.03.01 – 2019.03.10: style example and recommendation examples [figures]
• Click history 2019.06.01 – 2019.06.10: style example and recommendation examples [figures]
Thank you!