28
Example • 16,000 documents • 100 topic • Picked those with large p(w|z)

Example 16,000 documents 100 topic Picked those with large p(w|z)

Embed Size (px)

Citation preview

Page 1: Example 16,000 documents 100 topic Picked those with large p(w|z)

Example

• 16,000 documents

• 100 topic

• Picked those with large p(w|z)

Page 2: Example 16,000 documents 100 topic Picked those with large p(w|z)
Page 3: Example 16,000 documents 100 topic Picked those with large p(w|z)

• Given a new document, compute and • words allocated to each topic

• approximates p(zn|w)

• See cases where these values are relatively large

• 4 topics found

n

New document?i n

ii

Page 4: Example 16,000 documents 100 topic Picked those with large p(w|z)

Unseen document (contd.)

• Bag of words - William Randolph Hearst Foundation assigned to different topics

Page 5: Example 16,000 documents 100 topic Picked those with large p(w|z)

Applications and empirical results

• Document modeling

• Document classification

• Collaborative filtering

Page 6: Example 16,000 documents 100 topic Picked those with large p(w|z)

Document modeling

• Task: density estimation, high likelihood to unseen document

• Measure of goodness: perplexity

• Monotonically decreases in the likelihood

Page 7: Example 16,000 documents 100 topic Picked those with large p(w|z)

The experiment

Articles Terms

Scientific abstracts

5,225 28,414

Newswire articles

16,333 23,075

Page 8: Example 16,000 documents 100 topic Picked those with large p(w|z)

The experiment (contd.)

• Preprocessed– stop words– appearing once

• 10% held for training

• Trained with the same stopping criteria

Page 9: Example 16,000 documents 100 topic Picked those with large p(w|z)

Results

Page 10: Example 16,000 documents 100 topic Picked those with large p(w|z)

Overfitting in Mixture of unigrams

• Peaked posterior in the training set

• Unseen document with unseen word

• Word will have very small probability

• Remedy: smoothing

Page 11: Example 16,000 documents 100 topic Picked those with large p(w|z)

Overfitting in pLSI

• Mixture of topics allowed

• Marginalize over d to find p(w)

• Restriction to having the same topic proportions as training documents

• “Folding in” ignore p(z|d) parameters and refit p(z|dnew)

Page 12: Example 16,000 documents 100 topic Picked those with large p(w|z)

LDA

• Documents can have different proportions of topics

• No heuristics

Page 13: Example 16,000 documents 100 topic Picked those with large p(w|z)
Page 14: Example 16,000 documents 100 topic Picked those with large p(w|z)

Document classification

• Generative or discriminative

• Choice of features in document classification

• LDA as dimensionality reduction technique

• as LDA features)w(

Page 15: Example 16,000 documents 100 topic Picked those with large p(w|z)

The experiment

• Binary classification

• 8000 documents, 15,818 words

• True label not known

• 50 topic

• Trained SVM on the LDA features

• Compared with SVM on all word features

• LDA reduced feature space by 99.6%

Page 16: Example 16,000 documents 100 topic Picked those with large p(w|z)

GRAIN vs NOT GRAIN

Page 17: Example 16,000 documents 100 topic Picked those with large p(w|z)

EARN vs NOT EARN

Page 18: Example 16,000 documents 100 topic Picked those with large p(w|z)

LDA in document classification

• Feature space reduced, performance improved

• Results need further investigation

• Use for feature selection

Page 19: Example 16,000 documents 100 topic Picked those with large p(w|z)

Collaborative filtering

• Collection of users and movies they prefer

• Trained on observed users

• Task: given unobserved user and all movies preferred but one, predict the held out movie

• Only users who positively rated 100 movies

• Trained on 89% of data

Page 20: Example 16,000 documents 100 topic Picked those with large p(w|z)

Some quantities required…

• Probability of held out movie p(w|wobs)

– For mixture of unigrams and pLSI sum out topic variable

– For LDA sum out topic and Dirichlet variables (quantity efficient to compute)

Page 21: Example 16,000 documents 100 topic Picked those with large p(w|z)

Results

Page 22: Example 16,000 documents 100 topic Picked those with large p(w|z)

Further work

• Other approaches for inference and parameter estimation

• Embedded in another model

• Other types of data

• Partial exchangeability

Page 23: Example 16,000 documents 100 topic Picked those with large p(w|z)

Example – Visual words

• Document = image

• Words = image features: bars, circles

• Topics = face, airplane

• Bag of words = no spatial relationship between objects

Page 24: Example 16,000 documents 100 topic Picked those with large p(w|z)

Visual words

Page 25: Example 16,000 documents 100 topic Picked those with large p(w|z)

Identifying the visual words and topics

Page 26: Example 16,000 documents 100 topic Picked those with large p(w|z)

Conclusion

• Exchangeability, De Finetti Theorem

• Dirichlet distribution Generative Bag of words Independence assumption in Dirichlet

distribution - correlated topics

Page 27: Example 16,000 documents 100 topic Picked those with large p(w|z)

Implementations

• In C (by one of the authors)– http://www.cs.princeton.edu/~blei/lda-c/

• In C and Matlab– http://chasen.org/~daiti-m/dist/lda/

Page 28: Example 16,000 documents 100 topic Picked those with large p(w|z)

References

• Latent Dirichlet allocation, D. Blei, A. Ng, and M. Jordan. In Journal of Machine Learning Research, 3:993-1022, 2003

• Discovering object categories in image collections. J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, W. T. Freeman. MIT AI Lab Memo AIM-2005-005, February, 2005

• Correlated topic models, David Blei and John Lafferty, Advances in Neural Information Processing Systems 18, 2005.