Sentiment Analysis An Overview of Concepts and Selected Techniques


Page 1: Sentiment Analysis An Overview of Concepts and Selected Techniques

Sentiment Analysis

An Overview of Concepts and Selected Techniques

Page 2: Sentiment Analysis An Overview of Concepts and Selected Techniques

Terms

- Sentiment: A thought, view, or attitude, especially one based mainly on emotion instead of reason
- Sentiment Analysis (aka opinion mining): Use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text

Page 3: Sentiment Analysis An Overview of Concepts and Selected Techniques

Motivation

- Consumer information
  - Product reviews
- Marketing
  - Consumer attitudes
  - Trends
- Politics
  - Politicians want to know voters’ views
  - Voters want to know politicians’ stances and who else supports them
- Social
  - Find like-minded individuals or communities

Page 4: Sentiment Analysis An Overview of Concepts and Selected Techniques

Problem

- Which features to use?
  - Words (unigrams)
  - Phrases/n-grams
  - Sentences
- How to interpret features for sentiment detection?
  - Bag of words (IR)
  - Annotated lexicons (WordNet, SentiWordNet)
  - Syntactic patterns
  - Paragraph structure
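As a rough illustration of the first two feature types, the sketch below counts unigrams and bigrams in a bag-of-words representation. The tokenizer and the function names (`tokenize`, `ngrams`, `bag_of_features`) are assumptions made for this example, not part of any tool named in the slides.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = [t.strip(".,!?;:\"'()") for t in text.lower().split()]
    return [t for t in tokens if t]

def ngrams(tokens, n):
    """Return all n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_features(text, max_n=2):
    """Count every 1-gram up to max_n-gram; ordering beyond n words is lost."""
    tokens = tokenize(text)
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag

features = bag_of_features("Not good, not good at all.")
print(features.most_common(3))
```

Note that the unigram "good" alone would look positive here, while the bigram "not good" keeps the negation attached, which is one reason phrases are considered as features.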

Page 5: Sentiment Analysis An Overview of Concepts and Selected Techniques

Challenges

- Harder than topical classification, with which bag of words features perform well
- Must consider other features due to…
  - Subtlety of sentiment expression
    - irony
    - expression of sentiment using neutral words
  - Domain/context dependence
    - words/phrases can mean different things in different contexts and domains
  - Effect of syntax on semantics

Page 6: Sentiment Analysis An Overview of Concepts and Selected Techniques

Approaches

- Machine learning
  - Naïve Bayes
  - Maximum Entropy Classifier
  - SVM
  - Markov Blanket Classifier
    - Accounts for conditional feature dependencies
    - Allowed reduction of discriminating features from thousands of words to about 20 (movie review domain)
- Unsupervised methods
  - Use lexicons
- Assume pairwise independent features
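To make the first machine learning approach concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing, which treats words as independent given the class. The class name `NaiveBayes` and the four training snippets are invented for illustration; they are not data or code from the cited work.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word tokens with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, docs):
        for tokens, label in docs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_score(self, tokens, label):
        # log P(label) + sum of log P(word | label), words treated as independent
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for t in tokens:
            count = self.word_counts[label][t] + 1
            score += math.log(count / (total_words + len(self.vocab)))
        return score

    def classify(self, tokens):
        return max(self.doc_counts, key=lambda label: self.log_score(tokens, label))

# Toy training data, invented for this example.
nb = NaiveBayes()
nb.train([
    ("a great and moving film".split(), "pos"),
    ("great acting and a fine script".split(), "pos"),
    ("a dull and boring film".split(), "neg"),
    ("boring plot and poor acting".split(), "neg"),
])
print(nb.classify("a great script".split()))
print(nb.classify("boring and dull".split()))
```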

Page 7: Sentiment Analysis An Overview of Concepts and Selected Techniques

LingPipe Polarity Classifier

- First eliminate objective sentences, then use remaining sentences to classify document polarity (reduce noise)
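The two-stage idea can be sketched as a filter followed by a classifier. The code below is only a stand-in: the cue-word sets and the functions `is_subjective`, `extract_subjective`, and `classify_polarity` are hypothetical and far simpler than the trained models LingPipe uses.

```python
# Hypothetical cue lists, invented for illustration only.
SUBJECTIVE_CUES = {"great", "boring", "love", "hate", "awful", "wonderful"}
POSITIVE = {"great", "love", "wonderful"}
NEGATIVE = {"boring", "hate", "awful"}

def words(sentence):
    """Lowercase the sentence and strip basic punctuation from each word."""
    return [w.strip(".,!?").lower() for w in sentence.split()]

def is_subjective(sentence):
    """Stage 1 stand-in: a sentence is subjective if it contains a cue word."""
    return any(w in SUBJECTIVE_CUES for w in words(sentence))

def extract_subjective(sentences):
    return [s for s in sentences if is_subjective(s)]

def classify_polarity(sentences):
    """Stage 2 stand-in: count positive minus negative cue words."""
    score = 0
    for s in sentences:
        for w in words(s):
            score += (w in POSITIVE) - (w in NEGATIVE)
    return "positive" if score >= 0 else "negative"

review = [
    "The film is set in Paris.",
    "It runs for two hours.",
    "The acting is wonderful.",
    "I love the soundtrack.",
]
kept = extract_subjective(review)
print(kept)
print(classify_polarity(kept))
```

Only the last two sentences reach the polarity stage, so plot or setting descriptions cannot add noise to the final decision.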

Page 8: Sentiment Analysis An Overview of Concepts and Selected Techniques

LingPipe Polarity Classifier

- Uses unigram features extracted from movie review data
- Assumes that adjacent sentences are likely to have similar subjective-objective (SO) polarity
- Uses a min-cut algorithm to efficiently extract subjective sentences
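A minimal sketch of the min-cut step, loosely following the graph construction described by Pang and Lee (2004): each sentence is linked to a "subjective" source and an "objective" sink by its individual scores, and adjacent sentences are linked by an association weight. The per-sentence scores, the single `assoc` weight, and the function names are assumptions for this example; this is not LingPipe's implementation.

```python
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Run Edmonds-Karp max-flow, then return nodes still reachable from source."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse edges

    def bfs():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if sink not in parent:
            # No augmenting path left: reachable nodes form the source side.
            return set(parent) - {source}
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][w] for u, w in path)
        for u, w in path:
            residual[u][w] -= bottleneck
            residual[w][u] += bottleneck

def subjective_sentences(ind_sub, assoc):
    """ind_sub[i] is a subjectivity score in [0, 1]; assoc ties neighbours together."""
    cap = {"s": {}, "t": {}}
    for i, p in enumerate(ind_sub):
        cap["s"][i] = p                        # cost of labelling i objective
        cap.setdefault(i, {})["t"] = 1.0 - p   # cost of labelling i subjective
    for i in range(len(ind_sub) - 1):
        cap[i][i + 1] = assoc                  # cost of splitting neighbours
        cap[i + 1][i] = assoc
    return sorted(min_cut_source_side(cap, "s", "t"))

scores = [0.9, 0.4, 0.8, 0.1]  # invented per-sentence subjectivity scores
print(subjective_sentences(scores, assoc=0.0))
print(subjective_sentences(scores, assoc=0.3))
```

With no association weight each sentence is decided on its own score. With `assoc=0.3`, the borderline second sentence is pulled onto the subjective side by its two subjective neighbours, which is the adjacency assumption at work.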

Page 9: Sentiment Analysis An Overview of Concepts and Selected Techniques

LingPipe Polarity Classifier

[Figure: Graph for classifying three items.]

Page 10: Sentiment Analysis An Overview of Concepts and Selected Techniques

LingPipe Polarity Classifier

- Accurate as baseline but uses only 22% of content in test data (average)
- Metrics suggest properties of movie review structure

Page 11: Sentiment Analysis An Overview of Concepts and Selected Techniques

SentiWordNet

- Based on WordNet “synsets”
  - http://wordnet.princeton.edu/
- Ternary classifier
  - Positive, negative, and neutral scores for each synset
- Provides means of gauging sentiment for a text

Page 12: Sentiment Analysis An Overview of Concepts and Selected Techniques

SentiWordNet: Construction

- Created training sets of synsets, Lp and Ln
  - Start with small number of synsets with fundamentally positive or negative semantics, e.g., “nice” and “nasty”
  - Use WordNet relations, e.g., direct antonymy, similarity, derived-from, to expand Lp and Ln over K iterations
  - Lo (objective) is set of synsets not in Lp or Ln
- Trained classifiers on training set
  - Rocchio and SVM
  - Use four values of K to create eight classifiers with different precision/recall characteristics
  - As K increases, P decreases and R increases
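The seed-expansion step can be sketched on a toy relation graph. The `SIMILAR` and `ANTONYM` tables below are invented stand-ins for WordNet relations, and plain words are used in place of synsets to keep the example short; `expand_seeds` is a hypothetical helper, not the authors' code.

```python
# Toy relation tables, invented for illustration (not real WordNet data).
SIMILAR = {"nice": ["pleasant"], "pleasant": ["agreeable"], "nasty": ["foul"]}
ANTONYM = {"pleasant": ["unpleasant"], "agreeable": ["disagreeable"]}
ALL_TERMS = {"nice", "pleasant", "agreeable", "nasty", "foul",
             "unpleasant", "disagreeable", "table", "walk"}

def expand_seeds(seed_pos, seed_neg, similar, antonym, k):
    """Grow Lp and Ln for k iterations: similarity keeps polarity, antonymy flips it."""
    pos, neg = set(seed_pos), set(seed_neg)
    for _ in range(k):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            new_pos.update(similar.get(term, ()))
            new_neg.update(antonym.get(term, ()))
        for term in neg:
            new_neg.update(similar.get(term, ()))
            new_pos.update(antonym.get(term, ()))
        pos, neg = new_pos, new_neg
    return pos, neg

for k in range(3):
    lp, ln = expand_seeds({"nice"}, {"nasty"}, SIMILAR, ANTONYM, k)
    lo = ALL_TERMS - lp - ln  # everything not reached is treated as objective
    print(k, sorted(lp), sorted(ln), sorted(lo))
```

Each extra iteration reaches terms further from the seeds, so the training sets grow. The toy graph cannot show the accompanying loss of precision, but it is easy to see how longer relation chains would admit less reliable labels.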

Page 13: Sentiment Analysis An Overview of Concepts and Selected Techniques

SentiWordNet: Results

- 24.6% synsets with Objective<1.0
  - Many terms are classified with some degree of subjectivity
- 10.45% with Objective<=0.5
- 0.56% with Objective<=0.125
  - Only a few terms are classified as definitively subjective
- Difficult (if not impossible) to accurately assess performance

Page 14: Sentiment Analysis An Overview of Concepts and Selected Techniques

SentiWordNet: How to use it

- Use score to select features (+/-)
  - e.g. Zhang and Zhang (2006) used words in corpus with subjectivity score of 0.5 or greater
- Combine pos/neg/objective scores to calculate document-level score
  - e.g. Devitt and Ahmad (2007) conflated polarity scores with a WordNet-based graph representation of documents to create predictive metrics
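Both uses can be sketched with a small lexicon of (positive, negative) scores, where the objective score is whatever remains up to 1.0. The entries in `TOY_LEXICON` are invented values keyed by word rather than by synset, and the averaging in `document_score` is just one simple way to combine scores; neither reproduces the methods of the cited papers.

```python
# Invented (positive, negative) scores; objective = 1 - positive - negative.
TOY_LEXICON = {
    "excellent": (0.875, 0.0),
    "poor": (0.0, 0.75),
    "decent": (0.375, 0.0),
    "film": (0.0, 0.0),
    "slow": (0.0, 0.25),
}

def subjectivity(word):
    """Positive plus negative score; unknown words count as fully objective."""
    pos, neg = TOY_LEXICON.get(word, (0.0, 0.0))
    return pos + neg

def select_features(tokens, threshold=0.5):
    """Keep only tokens whose subjectivity meets the threshold."""
    return [t for t in tokens if subjectivity(t) >= threshold]

def document_score(tokens):
    """Average (positive - negative) over tokens found in the lexicon."""
    scores = [TOY_LEXICON[t][0] - TOY_LEXICON[t][1]
              for t in tokens if t in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

tokens = "an excellent film with a slow but decent plot".split()
print(select_features(tokens))
print(document_score(tokens))
```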

Page 15: Sentiment Analysis An Overview of Concepts and Selected Techniques

References

1. http://www.answers.com/sentiment, 9/22/08
2. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.
3. A. Esuli and F. Sebastiani, “SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining,” in Proc. of LREC 2006 - 5th Conf. on Language Resources and Evaluation, 2006.
4. E. Zhang and Y. Zhang, “UCSC on TREC 2006 Blog Opinion Mining,” TREC 2006 Blog Track, Opinion Retrieval Task.
5. A. Devitt and K. Ahmad, “Sentiment Polarity Identification in Financial News: A Cohesion-based Approach,” ACL 2007.
6. B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” in Proc. of the 42nd Annual Meeting on Association for Computational Linguistics, p. 271-es, July 21-26, 2004.