Department of Computer Science
CSCI 5622: Machine Learning
Chenhao Tan
Lecture 19: EM algorithm, Topic modeling
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen
Administrivia
• HW4 due, HW5 out
• Remember that we only count the highest 4 homework scores
• Final project midpoint presentation
• For the final project, each person will be asked to summarize what everyone in the team did
• Contact information for printing
Second Month Survey
[Chart comparing responses from the first and second surveys]
Second Month Survey
• Conflicting opinions:
  • Wide variety of models, good explanations, good homeworks
  • "Clarity of HW grading is the worst I have ever had for a class."
  • Depth of content covered
  • Course is too theory heavy
• I liked that the instructor not only requested feedback often, but also acted upon the feedback, changing a few things about how the class and slides are presented.
Second Month Survey
• Increase exam duration
• The professor needs to slow down, and sacrifice some of the math subtleties and complexities in favor of concrete understanding of the topics.
• Go into the weeds of the math less
Learning Objectives
• Learn about the Expectation-Maximization (EM) algorithm
• Learn about latent Dirichlet allocation (LDA)
Gaussian Mixture Models
[Scatter plot of two-dimensional data points, x1 vs. x2, with both axes ranging from −4 to 4]
Latent Variables
• The z's correspond to the latent structure that we try to learn in unsupervised learning
• From a modeling perspective, they are usually referred to as latent variables
EM Algorithm
• EM stands for Expectation-Maximization
• A classic algorithm (Dempster, Laird, and Rubin, 1977)
• An iterative method
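The EM slides' derivations did not survive extraction, but the iteration itself is short. The following is a minimal sketch (not from the slides) of EM for a one-dimensional, two-component Gaussian mixture, with toy data and initial values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two 1-D Gaussian clusters (true means -2 and 2, unit variance).
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 150)])

# Initial guesses for mixture weights, means, and variances (K = 2).
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: responsibilities r[i, k] = p(z_i = k | x_i) under current params.
    dens = np.stack([pi[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)], axis=1)
    r = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the soft assignments.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(sorted(mu))  # the estimated means should land near -2 and 2
```

Each iteration alternates a soft assignment of points to components (E-step) with a weighted maximum-likelihood update of the parameters (M-step).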
EM Algorithm: GMM and K-means
GMMs and the EM algorithm
• GMMs with the EM algorithm suffer from some of the same problems as K-Means:
  • Don't really work with categorical data
  • Usually only converge to a local optimum
  • Have to determine the number of clusters
  • Only generate convex clusters
• But they also have certain advantages:
  • The clusters are allowed different shapes
  • We get a soft partitioning of the data
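The soft partitioning is easy to see in practice. A minimal sketch using scikit-learn's `GaussianMixture` (assuming scikit-learn is installed; the toy data is illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters.
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

hard = gmm.predict(X)        # hard cluster labels, as K-Means would give
soft = gmm.predict_proba(X)  # soft partitioning: p(cluster | point), rows sum to one
print(soft[0].round(3))
```

Unlike K-Means, each point receives a probability under every component rather than a single label.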
Topic models
• Discrete count data
Topic models
• Suppose you have a huge number of documents
• Want to know what's going on
• Can't read them all (e.g. every New York Times article from the 90's)
• Topic models offer a way to get a corpus-level view of major themes
• Unsupervised

Conceptual approach
• Input: a text corpus and number of topics K
• Output:
  • K topics, each topic is a list of words
  • Topic assignment for each document
Conceptual approach

Corpus:
• Forget the Bootleg, Just Download the Movie Legally
• Multiplex Heralded As Linchpin To Growth
• The Shape of Cinema, Transformed At the Click of a Mouse
• A Peaceful Crew Puts Muppets Where Its Mouth Is
• Stock Trades: A Better Deal For Investors Isn't Simple
• The three big Internet portals begin to distinguish among themselves as shopping malls
• Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens
Conceptual approach

• K topics, each topic is a list of words:
TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer
• Topic assignment for each document:
[Figure: the corpus headlines mapped to TOPIC 1 ("TECHNOLOGY"), TOPIC 2 ("BUSINESS"), and TOPIC 3 ("ENTERTAINMENT")]
Topics from Science
Why should you care?
• Neat way to explore/understand corpus collections
  • E-discovery
  • Social media
  • Scientific data
• NLP applications
  • Word sense disambiguation
  • Discourse segmentation
• Psychology: word meaning, polysemy
• A general way to model count data, and a general inference algorithm
Topic models
• Discrete count data
• Gaussian distributions are not appropriate
Generative model: Latent Dirichlet Allocation
• Generate a document, or a bag of words
• Blei, Ng, and Jordan. Latent Dirichlet Allocation. JMLR, 2003.
Generative model: Latent Dirichlet Allocation
• Generate a document, or a bag of words
• Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation: points on the simplex, e.g. (1,0,0), (0,1,0), (0,0,1), (1/2,1/2,0), (1/3,1/3,1/3), (1/4,1/4,1/2)
  • Comes from a Dirichlet distribution
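A quick numpy sketch (not from the slides; the parameter values are illustrative) of drawing multinomial parameter vectors from a Dirichlet distribution. Each draw is a point on the simplex, i.e. a non-negative vector summing to one:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5, 0.5])  # Dirichlet parameter, one value per outcome

# Five multinomial distributions over three outcomes.
theta = rng.dirichlet(alpha, size=5)
print(theta.round(2))
print(theta.sum(axis=1))  # every row sums to one
```

With α < 1 the draws concentrate near the corners of the simplex (sparse distributions); larger α pushes draws toward the uniform center.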
Generative story

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer
Generative story

[Figure: the corpus headlines grouped under TOPIC 1, TOPIC 2, and TOPIC 3]
Generative story

"Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ..."

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer
Missing component: how to generate a multinomial distribution

Dirichlet Distribution
Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 12 of 26
Conjugacy of Dirichlet and Multinomial

• If $\theta \sim \mathrm{Dir}(\alpha)$, $w \sim \mathrm{Mult}(\theta)$, and $n_k = |\{w_i : w_i = k\}|$, then

$$p(\theta \mid \alpha, w) \propto p(w \mid \theta)\, p(\theta \mid \alpha) \quad (1)$$
$$\propto \prod_k \theta_k^{n_k} \prod_k \theta_k^{\alpha_k - 1} \quad (2)$$
$$\propto \prod_k \theta_k^{\alpha_k + n_k - 1} \quad (3)$$

• Conjugacy: this posterior has the same form as the prior, namely a Dirichlet with parameter $\alpha_k + n_k$
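The conjugate update can be verified numerically. A small sketch (the data here is made up for illustration) showing that observing counts n_k just adds them to the Dirichlet parameter:

```python
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])        # Dirichlet prior parameter
words = np.array([0, 2, 2, 1, 2, 0, 2])  # observed draws over 3 outcomes
n = np.bincount(words, minlength=3)      # counts n_k = (2, 1, 4)

posterior_alpha = alpha + n              # posterior is Dir(alpha + n), by conjugacy
posterior_mean = posterior_alpha / posterior_alpha.sum()
print(posterior_alpha)  # [3. 2. 5.]
print(posterior_mean)   # [0.3 0.2 0.5]
```

This closed-form update is what makes the Dirichlet a convenient prior for the multinomial distributions inside LDA.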
Making the generative story formal

[Plate diagram: λ → β_k (K topics); α → θ_d → z_n → w_n (N word positions in each of M documents)]

• For each topic k ∈ {1, …, K}, draw a multinomial distribution β_k from a Dirichlet distribution with parameter λ
• For each document d ∈ {1, …, M}, draw a multinomial distribution θ_d from a Dirichlet distribution with parameter α
• For each word position n ∈ {1, …, N}, select a hidden topic z_n from the multinomial distribution parameterized by θ_d
• Choose the observed word w_n from the distribution β_{z_n}
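The generative story above can be simulated directly. A minimal sketch in numpy (toy sizes and hyperparameters are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, N, V = 3, 5, 8, 10  # topics, documents, words per document, vocabulary size
lam, alpha = 0.1, 0.5     # symmetric Dirichlet hyperparameters (lambda, alpha)

# Draw K topic-word distributions beta_k ~ Dir(lambda).
beta = rng.dirichlet(np.full(V, lam), size=K)

docs = []
for d in range(M):
    theta = rng.dirichlet(np.full(K, alpha))  # document-topic distribution
    z = rng.choice(K, size=N, p=theta)        # hidden topic per word position
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # observed word ids
    docs.append(w)
print(docs[0])  # the first generated document, as vocabulary indices
```

Inference (next lecture) runs this story in reverse: given only the words w, recover plausible β, θ, and z.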
Topic models: What's important
• Topic models (latent variables)
  • Topics to word types: multinomial distribution
  • Documents to topics: multinomial distribution
• Modeling & algorithm
  • Model: a story of how your data came to be
  • Latent variables: missing pieces of your story
  • Statistical inference: filling in those missing pieces (next lecture)
• We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI, which is itself a probabilistic version of LSA
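In practice, fitting LDA to count data takes only a few lines. A minimal sketch using scikit-learn (assuming it is installed; the four-document corpus and K = 2 are purely illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "download movie internet site",
    "film movie theater star director",
    "store sale market consumer product",
    "internet phone computer technology service",
]

counts = CountVectorizer().fit_transform(corpus)  # discrete count data
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

doc_topics = lda.transform(counts)  # topic proportions per document
print(doc_topics.round(2))          # each row sums to one
```

The fitted `lda.components_` give the topic-word weights (the β's), and `doc_topics` gives the document-topic distributions (the θ's) from the generative story.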
Recap
• Expectation-maximization: a general algorithm for mixture models
• Topic models: a neat way to model discrete count data