
On Learning Parsimonious Models for Extracting Consumer Opinions


Page 1: On Learning Parsimonious Models for Extracting Consumer Opinions

On Learning Parsimonious Models for Extracting Consumer Opinions

International Conference on System Sciences 2005

Xue Bai and Rema Padman
The John Heinz III School of Public Policy and Management
and Center for Automated Learning and Discovery
Carnegie Mellon University

{xbai,rpadman}@andrew.cmu.edu

Edoardo Airoldi
Data Privacy Laboratory

School of Computer Science

Carnegie Mellon University

[email protected]

Page 2: On Learning Parsimonious Models for Extracting Consumer Opinions

Introduction

When ported to sentiment extraction, past text categorization methods yield cross-validated accuracies and areas under the curve (AUC) only in the low 80% range.

The reason for this shortfall: the features used in classification (e.g., the words) are assumed to be pairwise independent.
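
One common statement of this assumption (my gloss, in the naive Bayes form; the slide does not spell it out) is that the words are conditionally independent given the class label Y:

p(X1, ..., Xn | Y) = p(X1 | Y) · p(X2 | Y) · ... · p(Xn | Y)

so any joint patterns among the words are ignored.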

We propose an algorithm that captures dependencies among words and finds a minimal vocabulary sufficient for categorization purposes.

Our two-stage Markov Blanket Classifier (MBC):

1st Stage: Learns conditional dependencies among the words and encodes them into a Markov Blanket Directed Acyclic Graph (MB DAG)

2nd Stage: Uses Tabu Search (TS) to fine-tune the MB DAG

Page 3: On Learning Parsimonious Models for Extracting Consumer Opinions

Related Work (1/2)

Hatzivassiloglou and McKeown built a log-linear model to predict the semantic orientation of conjoined adjectives.

Das and Chen manually constructed a lexicon and grammar rules, categorized stock news as buy, sell, or neutral, and used five classifiers with various voting schemes to achieve an accuracy of 62%.

Turney and Littman proposed a semi-supervised method to learn the polarity of adjectives, starting from a small set of adjectives of known polarity.

Turney used this method to predict the opinions of consumers about various objects (movies, cars, banks) and achieved accuracies between 66% and 84%.

Page 4: On Learning Parsimonious Models for Extracting Consumer Opinions

Related Work (2/2)

Pang et al. used off-the-shelf classification methods on frequent, non-contextual words in combination with various heuristics and annotators, and achieved a maximum cross-validated accuracy of 82.9% on data from IMDb.

Dave et al. categorized positive versus negative movie reviews using SVM, and achieved an accuracy of at most 88.9% on data from Amazon and CNET.

Liu et al. proposed a framework to categorize emotions based on a large dictionary of common-sense knowledge and on linguistic models.

Page 5: On Learning Parsimonious Models for Extracting Consumer Opinions

Bayesian Networks (1/2)

A Bayesian network for a set of variables X = {X1, ..., Xn} consists of:

a network structure S that encodes a set of conditional independence assertions among the variables in X;

a set P = {p1, ..., pn} of local conditional probability distributions, one associated with each node and its parents.

Page 6: On Learning Parsimonious Models for Extracting Consumer Opinions

Bayesian Networks (2/2)

P satisfies the Markov condition for S if every node Xi in S is independent of its non-descendants, conditional on its parents.

paXi denotes the set of parents of Xi. For the example DAG, the joint distribution factors as:

p(Y, X1, ..., X6) = C · p(Y | X1) · p(X4 | X2, Y) · p(X5 | X3, X4, Y) · p(X2 | X1) · p(X3 | X1) · p(X6 | X4)
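
To make this factorization concrete, here is a minimal sketch in Python (the graph representation and CPT interface are my assumptions for illustration; the p(X1) term plays the role of the constant C above):

# Parent sets for the example DAG on this slide (X1 is the root).
parents = {"X1": [], "Y": ["X1"], "X2": ["X1"], "X3": ["X1"],
           "X4": ["X2", "Y"], "X5": ["X3", "X4", "Y"], "X6": ["X4"]}

def joint_probability(assignment, cpt):
    # Multiply the local terms p(Xi | paXi). Here cpt[node] is assumed
    # to be a function (value, tuple_of_parent_values) -> probability.
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[node](assignment[node], pa_values)
    return prob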

Page 7: On Learning Parsimonious Models for Extracting Consumer Opinions

Markov Blanket (1/2)

Given the set of variables X, a target variable Y, and a Bayesian network (S, P), the Markov Blanket of Y consists of paY, the set of parents of Y; chY, the set of children of Y; and pa_chY, the set of parents of the children of Y.

p(Y | X1, ..., X6) = C′ · p(Y | X1) · p(X4 | X2, Y) · p(X5 | X3, X4, Y)
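
Reading the Markov Blanket off the graph is mechanical; a sketch (reusing the hypothetical parents map from the earlier snippet):

def markov_blanket(target, parents):
    # paY: parents of the target.
    blanket = set(parents[target])
    # chY: nodes that list the target among their parents.
    children = [n for n, pa in parents.items() if target in pa]
    blanket.update(children)
    # pa_chY: the other parents of each child ("spouses").
    for child in children:
        blanket.update(p for p in parents[child] if p != target)
    return blanket

# For the example DAG this returns {X1, X2, X3, X4, X5}; X6 drops out,
# matching the conditional factorization above.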

Page 8: On Learning Parsimonious Models for Extracting Consumer Opinions

Markov Blanket (2/2)

Different MB DAGs that entail the same factorization of p(Y | X1, ..., X6) belong to the same Markov equivalence class.

Our algorithm searches the space of Markov equivalence classes, rather than the space of DAGs, thus boosting its efficiency.

Page 9: On Learning Parsimonious Models for Extracting Consumer Opinions

Tabu Search

Tabu Search (TS) is a powerful meta-heuristic strategy that helps local search heuristics explore the space of solutions by guiding them out of local optima.

The basic Tabu Search starts with a feasible solution and iteratively chooses the best move, according to a specified evaluation function.

It keeps a tabu list of restrictions on possible moves, updated at each step, which discourages the repetition of recently selected moves.
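
A minimal generic skeleton of such a loop in Python (illustrative assumptions only: the move encoding, scoring function, and stopping rule are mine, not the paper's exact design):

from collections import deque

def tabu_search(initial, moves, score, n_iter=100, tabu_size=20):
    # moves(solution) yields (move_id, candidate_solution) pairs;
    # score(solution) returns the value to maximize.
    tabu = deque(maxlen=tabu_size)   # recent moves are forbidden
    current = best = initial
    for _ in range(n_iter):
        candidates = [(m, c) for m, c in moves(current) if m not in tabu]
        if not candidates:
            break
        move, current = max(candidates, key=lambda mc: score(mc[1]))
        tabu.append(move)            # discourage repeating this move soon
        if score(current) > score(best):
            best = current
    return best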

Page 10: On Learning Parsimonious Models for Extracting Consumer Opinions

MB Classifier 1st Stage: Learning Initial Dependencies (1/2)

1. Search for the potential parents and children (LY) of Y, and for the potential parents and children (∪i LXi) of the nodes Xi ∈ LY

This search is performed by two recursive calls to the function findAdjacencies(Y)

2. Orient the edges using six edge orientation rules

3. Prune the remaining undirected edges and bi-directed edges to avoid cycles

Page 11: On Learning Parsimonious Models for Extracting Consumer Opinions

MB Classifier 1st Stage: Learning Initial Dependencies (2/2)

findAdjacencies (Node Y, Node List L, Depth d, Significance α)
1. AY := {Xi ∈ L : Xi is dependent on Y at level α}
2. for Xi ∈ AY and for all distinct subsets S ⊂ {AY \ Xi} of size at most d
  2.1. if Xi is independent of Y given S at level α
  2.2. then remove Xi from AY
3. for Xi ∈ AY
  3.1. AXi := {Xj ∈ L : Xj is dependent on Xi at level α, j ≠ i}
  3.2. for all distinct subsets S ⊂ {AXi} of size at most d
    3.2.1. if Xi is independent of Y given S at level α
    3.2.2. then remove Xi from AY
4. return AY
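
A Python sketch of this routine (my reconstruction, not the authors' code; the conditional-independence test indep(a, b, cond_set), e.g. a chi-square or G-test at level α, is a placeholder assumption):

from itertools import combinations

def find_adjacencies(y, candidates, d, indep):
    # Step 1: keep only variables marginally dependent on y.
    adj = {x for x in candidates if not indep(x, y, ())}
    # Step 2: drop x when some conditioning set of size <= d, drawn
    # from the other surviving neighbors, separates it from y.
    for x in list(adj):
        others = adj - {x}
        for k in range(1, d + 1):
            if any(indep(x, y, s) for s in combinations(others, k)):
                adj.discard(x)
                break
    # Step 3: repeat the test with conditioning sets drawn from each
    # neighbor's own adjacency set (steps 3.1-3.2 above); y itself is
    # excluded from the conditioning sets.
    for x in list(adj):
        adj_x = {z for z in candidates
                 if z != x and z != y and not indep(z, x, ())}
        for k in range(1, d + 1):
            if any(indep(x, y, s) for s in combinations(adj_x, k)):
                adj.discard(x)
                break
    return adj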

Page 12: On Learning Parsimonious Models for Extracting Consumer Opinions

MB Classifier 2nd Stage: Tabu Search Optimization

Four kinds of moves are allowed: edge addition, edge deletion, edge reversal, and edge reversal with node pruning.

The best move is always selected. The tabu list keeps a record of the m previous moves.
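
A sketch of how the first three move types could be enumerated over a DAG stored as a set of directed edges (the representation is my assumption; reversal with node pruning and the acyclicity check are omitted for brevity). The yielded pairs plug directly into the moves argument of the tabu_search skeleton above:

def neighbor_dags(edges, nodes):
    # edges is a set of (tail, head) pairs; a real implementation must
    # also reject candidates that introduce directed cycles.
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            if (a, b) in edges:
                yield ("delete", a, b), edges - {(a, b)}
                yield ("reverse", a, b), (edges - {(a, b)}) | {(b, a)}
            elif (b, a) not in edges:
                yield ("add", a, b), edges | {(a, b)}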

Page 13: On Learning Parsimonious Models for Extracting Consumer Opinions

Experiment 1 (1/3)

Data set: 700 positive and 700 negative movie reviews from the rec.arts.movies.reviews newsgroup, archived at the Internet Movie Database (IMDb).

We considered all the words that appeared in more than 8 documents as our input features (7716 words).

5-fold cross-validation
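
A comparable preprocessing and evaluation setup can be sketched with scikit-learn (my reconstruction of the setup described above, not the authors' code; the baseline classifier is a stand-in):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

# docs: the 1400 review strings; labels: 1 = positive, 0 = negative.
# min_df=9 keeps exactly the words that appear in more than 8 documents.
vectorizer = CountVectorizer(min_df=9, binary=True)
X = vectorizer.fit_transform(docs)

# 5-fold cross-validated accuracy with a simple independence-based
# baseline, the kind of model the MBC is designed to improve upon.
print(cross_val_score(BernoulliNB(), X, labels, cv=5).mean())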

Page 14: On Learning Parsimonious Models for Extracting Consumer Opinions

Experiment 1 (2/3)

Page 15: On Learning Parsimonious Models for Extracting Consumer Opinions

Experiment 1 (3/3)

Page 16: On Learning Parsimonious Models for Extracting Consumer Opinions

Experiment 2 (1/2)

Data sets: mergers and acquisitions (M&A, 1000 documents); finance (1000 documents); mixed news (3 topics × 1000 documents).

We consider three sentiments: positive, neutral, and negative.

3-fold cross-validation

Page 17: On Learning Parsimonious Models for Extracting Consumer Opinions

Experiment 2 (2/2)

Page 18: On Learning Parsimonious Models for Extracting Consumer Opinions

Conclusions

We believe that capturing sentiment requires going beyond the search for richer feature sets and beyond the independence assumption.

We believe that a robust model is obtained by encoding dependencies among words, and by actively searching for a better dependency structure using heuristic and optimal strategies.

The MBC achieves these goals.