Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models Ramesh Nallapati Joint work with John Lafferty, Amr Ahmed, William Cohen and Eric Xing Machine Learning Department Carnegie Mellon University


Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

Ramesh Nallapati

Joint work with John Lafferty, Amr Ahmed, William Cohen and Eric Xing

Machine Learning Department, Carnegie Mellon University


8/28/2007/9:30am ICDM’07 HPDM workshop

Introduction

• Statistical topic modeling: an attractive framework for topic discovery
  – Completely unsupervised
  – Models text very well
    • Lower perplexity compared to unigram models
  – Reveals meaningful semantic patterns
  – Can help summarize and visualize document collections
  – e.g.: PLSA, LDA, DPM, DTM, CTM, PA


Introduction

• A common assumption in all the variants:
  – Exchangeability: “bag of words” assumption
  – Topics represented as a ranked list of words
• Consequences:
  – Word correlation information is lost
    • e.g.: “white-house” vs. “white” and “house”
    • Long distance correlations


Introduction

• Objective:
  – To capture correlations between words within topics
• Motivation:
  – More interpretable representation of topics as a network of words rather than a list
  – Helps better visualize and summarize document collections
  – May reveal unexpected relationships and patterns within topics


Past Work: Topic Models

• Bigram topic models [Wallach, ICML 2006]
  – Requires KV(V−1) parameters
  – Only captures local dependencies
  – Does not model sparsity of correlations
  – Does not capture “within-topic” correlations


Past work: Other approaches

• Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., ‘96]
  – Word pair correlation measured as a weighted count of the number of times the two words occur within a fixed-length window
  – Weight of an occurrence ∝ 1/(mutual distance)
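The weighting rule above can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated on the slide, not the original HAL implementation: the function name `hal_cooccurrence`, the window size, the toy sentence and the choice to treat word pairs as unordered are all assumptions made for the example.

```python
from collections import defaultdict

def hal_cooccurrence(tokens, window=3):
    """Weighted co-occurrence: every pair within `window` positions adds 1/distance."""
    weights = defaultdict(float)
    for i, left in enumerate(tokens):
        # Look ahead at most `window` positions from token i.
        for distance in range(1, window + 1):
            j = i + distance
            if j >= len(tokens):
                break
            right = tokens[j]
            if left == right:
                continue  # ignore a word paired with itself
            pair = tuple(sorted((left, right)))  # unordered pair (assumption)
            weights[pair] += 1.0 / distance
    return dict(weights)

# Hypothetical toy input.
tokens = "the river bank flooded the bank office".split()
scores = hal_cooccurrence(tokens, window=2)
```

Pairs that never fall inside the same window, such as "river" and "office" here, get no entry at all, which is where the sparsity of this approach comes from.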


Past work: Other approaches

• Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., ‘96]
  – Plusses:
    • Sparse solutions, scalability
  – Minuses:
    • Only unearths global correlations, not semantic correlations
      – E.g.: “river – bank”, “bank – check”
    • Only local dependencies


Past work: Other approaches

• Query expansion in IR
  – Similar in spirit: finds words that highly co-occur with the query words
  – However, not a corpus visualization tool: requires a context to operate on
• WordNet
  – Semantic networks
  – Human labeled: not directly related to our goal


Our approach

• L1 norm regularization
  – Known to enforce sparse solutions
    • Sparsity permits scalability
  – Convex optimization problem
    • Globally optimal solutions
  – Recent advances in learning structure of graphical models:
    • L1 regularization framework asymptotically leads to true structure


Background: LASSO

• Example: linear regression
• Regularization used to improve generalizability
  – E.g. 1: Ridge regression: L2 norm regularization
  – E.g. 2: Lasso: L1 norm regularization
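In standard textbook notation, the three objectives can be written as below. The symbols $y$ (responses), $X$ (design matrix), $\beta$ (coefficients) and $\lambda$ (regularization weight) are assumed here and are not taken from the slide.

```latex
\begin{align*}
\text{Linear regression:}\quad & \hat{\beta} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 \\
\text{Ridge regression:}\quad  & \hat{\beta} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 \\
\text{Lasso:}\quad             & \hat{\beta} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
\end{align*}
```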


Background: LASSO

• Lasso encourages sparse solutions
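One way to see why: in the special case of an orthonormal design, the lasso solution is the least-squares coefficient soft-thresholded at the penalty level, while ridge only rescales it. The sketch below assumes that special case, with objectives ½‖y − Xβ‖² + λ‖β‖₁ for lasso and ½‖y − Xβ‖² + (λ/2)‖β‖² for ridge. The function names and the coefficient values are illustrative only.

```python
def soft_threshold(b, lam):
    """Lasso solution for one coefficient: shrink toward zero, clip small values to zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    """Ridge solution for one coefficient: rescale; exactly zero only if b is zero."""
    return b / (1.0 + lam)

# Hypothetical least-squares coefficients under an orthonormal design.
ols = [2.0, -0.3, 0.05, -1.5, 0.2]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]
```

Every coefficient whose magnitude is at most λ becomes exactly zero under the lasso, whereas all ridge coefficients stay nonzero.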


Background: Gaussian Random Fields

• Multivariate Gaussian distribution

• Random field structure: G = (V,E)
  – V: set of all variables $\{X_1, \dots, X_p\}$
  – $(s,t) \in E \iff (\Sigma^{-1})_{st} \neq 0$
  – $X_s \perp X_u \mid X_{N(s)}$ where $u \notin N(s)$
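For reference, the multivariate Gaussian density and the edge condition in standard notation. The symbols $\mu$ (mean) and $\Sigma$ (covariance) are assumed here, with $p$ the number of variables as above.

```latex
\begin{align*}
p(x) &= \frac{1}{(2\pi)^{p/2}\, |\Sigma|^{1/2}}
        \exp\!\Big( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \Big), \\
(s,t) \in E &\iff (\Sigma^{-1})_{st} \neq 0 .
\end{align*}
```

The graph is read off the inverse covariance (precision) matrix rather than the covariance itself: two variables can be marginally correlated and still have no edge between them.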


Background: Gaussian Random Fields

• Estimating the graph structure of GRF from data [Meinshausen and Bühlmann, Annals of Statistics, 2006]
  – Regress each variable onto the others, imposing an L1 penalty to encourage sparsity
  – Estimated neighborhood of a variable: the variables whose regression coefficients are nonzero
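Written out, the neighborhood-selection estimate for a variable $s$ takes roughly the following form. The symbols $\hat{\theta}^{s}$, $\lambda$ and $n$ (number of samples) are assumed notation, and $X_s$ denotes the vector of $n$ observations of variable $s$.

```latex
\begin{align*}
\hat{\theta}^{s} &= \arg\min_{\theta \,:\, \theta_s = 0} \;
    \frac{1}{n} \Big\| X_s - \sum_{t \neq s} \theta_t X_t \Big\|_2^2 + \lambda \|\theta\|_1, \\
\hat{N}(s) &= \{\, t : \hat{\theta}^{s}_t \neq 0 \,\}.
\end{align*}
```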


Background: Gaussian Random Fields

[Figure: true graph vs. estimated graph]

Courtesy: [Meinshausen and Bühlmann, Annals of Statistics, 2006]


Background: Gaussian Random Fields

• Application to topic models: CTM [Blei and Lafferty, NIPS, 2006]


Background: Gaussian Random Fields

• Application to CTM: [Blei & Lafferty, Annals of Applied Statistics, ‘07]


Structure learning of an MRF

• Ising model

• L1 regularized conditional likelihood learns true structure asymptotically

[Wainwright, Ravikumar and Lafferty, NIPS’06]
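A sketch of the setup in standard notation, assuming binary variables $x_s \in \{-1,+1\}$ and pairwise parameters $\theta_{st}$ with no node-wise terms. The symbols are assumptions, not the slide's own.

```latex
\begin{align*}
p(x;\theta) &\propto \exp\Big( \sum_{(s,t) \in E} \theta_{st}\, x_s x_t \Big), \\
p(x_s \mid x_{\setminus s};\theta) &=
    \frac{\exp\big( 2 x_s \sum_{t \neq s} \theta_{st} x_t \big)}
         {1 + \exp\big( 2 x_s \sum_{t \neq s} \theta_{st} x_t \big)}, \\
\hat{\theta}^{s} &= \arg\min_{\theta} \;
    -\frac{1}{n} \sum_{i=1}^{n} \log p\big( x^{(i)}_s \mid x^{(i)}_{\setminus s};\theta \big)
    + \lambda \|\theta\|_1 .
\end{align*}
```

The conditional distribution of each node given the rest is a logistic regression, so structure learning reduces to one L1-regularized logistic regression per node, with the nonzero coefficients giving that node's estimated neighbors.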


Structure learning of an MRF

Courtesy: [Wainwright, Ravikumar and Lafferty, NIPS’06]


Sparse Word Graphs

• Algorithm
  – Run LDA on the document collection and obtain topic assignments
  – Convert topic assignments for each document into K binary vectors X, one per topic
  – Assume an MRF for each topic with X as underlying data
  – Apply structure learning for MRF using L1-regularized conditional likelihood
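The conversion step can be sketched as follows. The exact definition of X is not given above, so this assumes one plausible reading: for a document and topic k, entry w of the vector is 1 when at least one token of word w in that document was assigned to topic k. The function name `topic_indicator_vectors` and the toy data are hypothetical.

```python
def topic_indicator_vectors(doc_tokens, doc_topics, vocab, num_topics):
    """Return K binary vectors for one document, one vector per topic."""
    index = {word: i for i, word in enumerate(vocab)}
    vectors = [[0] * len(vocab) for _ in range(num_topics)]
    for word, topic in zip(doc_tokens, doc_topics):
        # Mark word as present in this document under its assigned topic.
        vectors[topic][index[word]] = 1
    return vectors

# Hypothetical document with per-token topic assignments from LDA.
vocab = ["bank", "loan", "river", "water"]
tokens = ["bank", "loan", "bank", "river", "water"]
topics = [0, 0, 1, 1, 1]

X = topic_indicator_vectors(tokens, topics, vocab, num_topics=2)
```

Stacking the topic-k vectors of all documents gives the binary data matrix on which the MRF for topic k is learned. Note that the same word ("bank" here) can be active under more than one topic.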


Sparse Word Graphs


Sparse Word Graphs: Scalability

• We still run V logistic regression problems, each of size V, for each topic: O(KV²)!
  – However, each example is very sparse
  – L1 penalty results in sparse solutions
  – Can run each topic in parallel
  – Efficient interior-point based L1-regularized logistic regression [Koh, Kim & Boyd, JMLR, ’07]
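A minimal sketch of the per-word regression, under stated assumptions: it uses plain proximal gradient descent (ISTA) instead of the interior-point solver cited above, 0/1-coded variables, an unpenalized bias term, and a tiny made-up data set. The names `l1_logistic` and `neighborhood` are hypothetical.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * |v|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def l1_logistic(rows, labels, lam, step=0.5, iters=500):
    """Proximal gradient on mean logistic loss + lam * ||w||_1 (bias is unpenalized)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += err / n
            for j, xj in enumerate(x):
                if xj:  # skip zero entries: each example is sparse
                    grad_w[j] += err * xj / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w

def neighborhood(data, s, lam):
    """Regress word s on all other words; neighbors are the nonzero coefficients."""
    others = [t for t in range(len(data[0])) if t != s]
    rows = [[row[t] for t in others] for row in data]
    labels = [row[s] for row in data]
    w = l1_logistic(rows, labels, lam)
    return [t for t, wt in zip(others, w) if wt != 0.0]

# Toy binary data: words 0 and 1 always co-occur, word 2 is unrelated.
data = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]] * 2
graph = {s: neighborhood(data, s, lam=0.1) for s in range(3)}
```

Each call to `neighborhood` is independent of the others, so the regressions can be distributed across words as well as across topics.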


Experiments

• Small AP corpus
  – 2.2K docs, 10.5K unique words
• Ran 10-topic LDA model
• Used a regularization weight of 0.1 in L1 logistic regression
• Took just 45 min. per topic
• Very sparse solutions
  – Computes only under 0.1% of the total number of possible edges
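To put the sparsity figure in perspective, a quick back-of-the-envelope count. It assumes undirected edges over the full vocabulary within a single topic, and takes "10.5K" as exactly 10,500.

```python
vocab_size = 10_500  # "10.5K unique words", taken as exactly 10,500 (assumption)

# Number of unordered word pairs, i.e. possible undirected edges per topic.
possible_edges = vocab_size * (vocab_size - 1) // 2

# 0.1% of all possible edges, rounded down.
edge_budget = possible_edges // 1000
```

Under these assumptions there are about 55 million candidate edges per topic, so "under 0.1%" corresponds to fewer than roughly 55 thousand edges.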


Topic “Business”: neighborhood of top LDA terms


Topic “Business”: neighborhood of top edges


Topic “War”: neighborhood of top LDA terms


Topic “War”: neighborhood of top edges


Concluding remarks

• Pros
  – A highly scalable algorithm for capturing within-topic word correlations
  – Captures both short-distance and long-distance correlations
  – Makes topics more interpretable
• Cons
  – Not a complete probabilistic model
    • Significant modeling challenge since the correlations are latent


Concluding remarks

• Applications of Sparse Word Graphs
  – Better document summarization and visualization tool
  – Word sense disambiguation
  – Semantic query expansion
• Future Work
  – Evaluation on a “real task”
  – Build a unified statistical model