65
Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 19: EM algorithm, Topic modeling Slides adapted from Jordan Boyd-Graber, Chris Ketelsen 1

Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Department of Computer ScienceCSCI 5622: Machine Learning

Chenhao TanLecture 19: EM algorithm, Topic modeling

Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

1

Page 2: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Administrivia

• HW4 due, HW5 out• Remember that we only count the highest 4 homework scores

• Final project midpoint presentation• For the final project, each person will be asked to summarize what

everyone in the team did• Contact information for printing

2

Page 3: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Second Month Survey

3

Second survey

First survey

Page 4: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Second Month Survey

• Conflicting opinions• wide variety of models, good explanations, good homeworks• Clarity of HW grading is the worst I have ever had for a class.• Depth of content covered• Course is too theory heavy

• I liked that the instructor not only requested feedback often, but also acted upon the feedback, changing a few things about how the class and slides are presented.

4

Page 5: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Second Month Survey

• Increase exam duration• The professor needs to slow down, and sacrifice some of the

math subtleties and complexities in favor of concrete understanding of the topics.

• Go into the weeds of the math less

5

Page 6: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Learning Objectives

• Learn about Expectation-Maximization algorithm

• Learn about latent Dirichlet allocation

6

Page 7: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

7

Page 8: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

8

Page 9: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

9

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

−4 −2 0 2 4

−4−2

02

4

x1

x2●

Page 10: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

10

Page 11: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

11

Page 12: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

12

Page 13: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

13

Page 14: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

14

Page 15: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

15

Page 16: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

16

Page 17: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Gaussian Mixture Models

17

Page 18: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Latent Variables

• z’s correspond to the latent structure that we try to learn in unsupervised learning

• From a modeling perspective, they are usually referred to as latent variables

18

Page 19: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

19

Page 20: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

20

Page 21: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

21

Page 22: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

22

Page 23: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

• EM stands for Expectation-Maximization• A classic algorithm in Dempster, Laird, Rubin, 1977• An iterative method

23

Page 24: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

24

Page 25: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

25

Page 26: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

26

Page 27: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

27

Page 28: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

28

Page 29: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

EM Algorithm

29

Page 30: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

30

EM Algorithm

Page 31: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

31

EM Algorithm

Page 32: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

32

EM Algorithm

Page 33: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

33

EM Algorithm

Page 34: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

34

EM Algorithm

Page 35: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

35

EM Algorithm

Page 36: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

GMM and K-means

36

Page 37: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

GMMs and the EM algorithm

• GMMs with the EM Algorithm suffer from some of the same problems as K-Means

• Doesn't really work with categorical data• Usually only converges to a local minimum• Have to determine the number of clusters• Only generates convex clusters

• But, it also has certain advantages• The clusters are allowed different shapes• We get a soft partitioning of the data

37

Page 38: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Topic models

• Discrete count data

38

Page 39: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Topic models

39

• Suppose you have a huge number of documents

• Want to know what's going on• Can't read them all (e.g. every New

York Times article from the 90's)• Topic models offer a way to get a

corpus-level view of major themes• Unsupervised

Page 40: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Conceptual approach

• Input: a text corpus and number of topics K • Output:

• K topics, each topic is a list of words• Topic assignment for each document

40

Forget the Bootleg, Just Download the Movie LegallyMultiplex Heralded As

Linchpin To GrowthThe Shape of Cinema, Transformed At the Click of

a MouseA Peaceful Crew Puts

Muppets Where Its Mouth IsStock Trades: A Better Deal For Investors Isn't SimpleThe three big Internet portals begin to distinguish

among themselves as shopping malls

Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens

Corpus

Page 41: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Conceptual approach

41

computer,

technology,

system,

service, site,

phone,

internet,

machine

play, film,

movie, theater,

production,

star, director,

stage

sell, sale,

store, product,

business,

advertising,

market,

consumer

TOPIC 1 TOPIC 2 TOPIC 3

• K topics, each topic is a list of words

Page 42: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Conceptual approach

42

• Topic assignment for each document

Forget the Bootleg, Just Download the Movie Legally

Multiplex Heralded As Linchpin To

GrowthThe Shape of

Cinema, Transformed At the Click of a

Mouse A Peaceful Crew Puts Muppets

Where Its Mouth Is

Stock Trades: A Better Deal For Investors Isn't

Simple

Internet portals begin to distinguish among themselves as shopping malls

Red Light, Green Light: A

2-Tone L.E.D. to Simplify Screens

TOPIC 2"BUSINESS"

TOPIC 3"ENTERTAINMENT"

TOPIC 1"TECHNOLOGY"

Page 43: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Topics from Science

43

Page 44: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Why should you care?

• Neat way to explore/understand corpus collections• E-discovery• Social media• Scientific data

• NLP Applications• Word sense disambiguation• Discourse segmentation

• Psychology: word meaning, polysemy• A general way to model count data and a general inference

algorithm44

Page 45: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Topic models

• Discrete count data• Gaussian distributions are not appropriate

45

Page 46: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative model: Latent Dirichlet Allocation• Generate a document, or a bag of words• Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003.

46

Page 47: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative model: Latent Dirichlet Allocation• Generate a document, or a bag

of words• Multinomial distribution

• Distribution over discrete outcomes

• Represented by non-negative vector that sums to one

• Picture representation

47

(1,0,0) (0,0,1)

(1/2,1/2,0)(1/3,1/3,1/3) (1/4,1/4,1/2)

(0,1,0)

Page 48: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative model: Latent Dirichlet Allocation• Generate a document, or a bag

of words• Multinomial distribution

• Distribution over discrete outcomes

• Represented by non-negative vector that sums to one

• Picture representation• Come from a Dirichlet distribution

48

(1,0,0) (0,0,1)

(1/2,1/2,0)(1/3,1/3,1/3) (1/4,1/4,1/2)

(0,1,0)

Page 49: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative story

49

computer,

technology,

system,

service, site,

phone,

internet,

machine

play, film,

movie, theater,

production,

star, director,

stage

sell, sale,

store, product,

business,

advertising,

market,

consumer

TOPIC 1

TOPIC 2

TOPIC 3

Page 50: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative story

50

Forget the Bootleg, Just Download the Movie Legally

Multiplex Heralded As Linchpin To Growth

The Shape of Cinema, Transformed At the Click of

a Mouse

A Peaceful Crew Puts Muppets Where Its Mouth Is

Stock Trades: A Better Deal For Investors Isn't Simple

The three big Internet portals begin to distinguish

among themselves as shopping mallsRed Light, Green Light: A

2-Tone L.E.D. to Simplify Screens

TOPIC 2

TOPIC 3

TOPIC 1

Page 51: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative story

51

Hollywood studios are preparing to let people

download and buy electronic copies of movies over

the Internet, much as record labels now sell songs for

99 cents through Apple Computer's iTunes music store

and other online services ...

computer, technology,

system, service, site,

phone, internet, machine

play, film, movie, theater,

production, star, director,

stage

sell, sale, store, product,

business, advertising,

market, consumer

Page 52: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative story

52

Hollywood studios are preparing to let people

download and buy electronic copies of movies over

the Internet, much as record labels now sell songs for

99 cents through Apple Computer's iTunes music store

and other online services ...

computer, technology,

system, service, site,

phone, internet, machine

play, film, movie, theater,

production, star, director,

stage

sell, sale, store, product,

business, advertising,

market, consumer

chta4578
Highlight
chta4578
Pencil
Page 53: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative story

53

Hollywood studios are preparing to let people

download and buy electronic copies of movies over

the Internet, much as record labels now sell songs for

99 cents through Apple Computer's iTunes music store

and other online services ...

computer, technology,

system, service, site,

phone, internet, machine

play, film, movie, theater,

production, star, director,

stage

sell, sale, store, product,

business, advertising,

market, consumer

chta4578
Pencil
Page 54: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Generative story

54

Hollywood studios are preparing to let people

download and buy electronic copies of movies over

the Internet, much as record labels now sell songs for

99 cents through Apple Computer's iTunes music store

and other online services ...

computer, technology,

system, service, site,

phone, internet, machine

play, film, movie, theater,

production, star, director,

stage

sell, sale, store, product,

business, advertising,

market, consumer

Page 55: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Missing component: how to generate a multinomial distribution

55

Page 56: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Missing component: how to generate a multinomial distribution

56

What are Topic Models?

Dirichlet Distribution

Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 12 of 26

Page 57: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Missing component: how to generate a multinomial distribution

57

Page 58: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Conjugacy of Dirichlet and Multinomial

58

What are Topic Models?

Dirichlet Distribution

• If � ⇠ Dir(↵), w ⇠ Mult(�), and nk = |{wi : wi = k}| then

p(�|↵,w) / p(w |�)p(�|↵) (1)

/Y

k

�nkY

k

�↵k�1 (2)

/Y

k

�↵k+nk�1 (3)

• Conjugacy: this posterior has the same form as the prior

Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 14 of 26

Page 59: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Conjugacy of Dirichlet and Multinomial

59

What are Topic Models?

Dirichlet Distribution

• If � ⇠ Dir(↵), w ⇠ Mult(�), and nk = |{wi : wi = k}| then

p(�|↵,w) / p(w |�)p(�|↵) (1)

/Y

k

�nkY

k

�↵k�1 (2)

/Y

k

�↵k+nk�1 (3)

• Conjugacy: this posterior has the same form as the prior

Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 14 of 26

Page 60: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Making the generative story formal

60

What are Topic Models?

Generative Model Approach

MNθd zn wn

Kβk

α

λ

• For each topic k 2 {1, . . . ,K}, draw a multinomial distribution �kfrom a Dirichlet distribution with parameter �

• For each document d 2 {1, . . . ,M}, draw a multinomialdistribution ✓d from a Dirichlet distribution with parameter ↵

• For each word position n 2 {1, . . . ,N}, select a hidden topic znfrom the multinomial distribution parameterized by ✓.

• Choose the observed word wn from the distribution �zn .

Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 16 of 26

Page 61: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

61

What are Topic Models?

Generative Model Approach

MNθd zn wn

Kβk

α

λ

• For each topic k 2 {1, . . . ,K}, draw a multinomial distribution �kfrom a Dirichlet distribution with parameter �

• For each document d 2 {1, . . . ,M}, draw a multinomialdistribution ✓d from a Dirichlet distribution with parameter ↵

• For each word position n 2 {1, . . . ,N}, select a hidden topic znfrom the multinomial distribution parameterized by ✓.

• Choose the observed word wn from the distribution �zn .

Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 16 of 26

Making the generative story formal

Page 62: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

62

Making the generative story formalWhat are Topic Models?

Generative Model Approach

MNθd zn wn

Kβk

α

λ

• For each topic k 2 {1, . . . ,K}, draw a multinomial distribution �kfrom a Dirichlet distribution with parameter �

• For each document d 2 {1, . . . ,M}, draw a multinomialdistribution ✓d from a Dirichlet distribution with parameter ↵

• For each word position n 2 {1, . . . ,N}, select a hidden topic znfrom the multinomial distribution parameterized by ✓.

• Choose the observed word wn from the distribution �zn .

Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 16 of 26

Page 63: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

63

Making the generative story formal

What are Topic Models?

Generative Model Approach

MNθd zn wn

Kβk

α

λ

• For each topic k 2 {1, . . . ,K}, draw a multinomial distribution �kfrom a Dirichlet distribution with parameter �

• For each document d 2 {1, . . . ,M}, draw a multinomialdistribution ✓d from a Dirichlet distribution with parameter ↵

• For each word position n 2 {1, . . . ,N}, select a hidden topic znfrom the multinomial distribution parameterized by ✓.

• Choose the observed word wn from the distribution �zn .Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 16 of 26

Page 64: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Topic models: What’s important

• Topic models (latent variables)• Topics to word types—multinomial distribution• Documents to topics—multinomial distribution

• Modeling & Algorithm• Model: story of how your data came to be• Latent variables: missing pieces of your story• Statistical inference: filling in those missing pieces (next lecture)

• We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI, probabilistic version of LSA

64

Page 65: Department of Computer Science CSCI 5622: Machine Learning ... · •Final project midpoint presentation •For the final project, each person will be asked to summarize what everyone

Recap

• Expectation-maximization: a general algorithm for mixture models

• Topic models: a neat way to model discrete count data

65