alfred-price
Sentiment Analysis
An Overview of Concepts and Selected Techniques
Terms
  Sentiment
    A thought, view, or attitude, especially one based mainly on emotion instead of reason
  Sentiment analysis
    Also known as opinion mining
    The use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text
Motivation
  Consumer information
    Product reviews
  Marketing
    Consumer attitudes
    Trends
  Politics
    Politicians want to know voters’ views
    Voters want to know politicians’ stances and who else supports them
  Social
    Find like-minded individuals or communities
Problem
  Which features to use?
    Words (unigrams)
    Phrases/n-grams
    Sentences
  How to interpret features for sentiment detection?
    Bag of words (IR)
    Annotated lexicons (WordNet, SentiWordNet)
    Syntactic patterns
    Paragraph structure
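As an illustration of the word and n-gram features listed above, here is a minimal sketch of bag-of-words extraction. The tokenizer and the names tokenize, ngrams, and bag_of_ngrams are assumptions made for this example; they do not come from any of the systems discussed in these slides.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(text, n=1):
    """Count n-gram occurrences; word order beyond the n-gram is discarded."""
    return Counter(ngrams(tokenize(text), n))

review = "Not a good movie, not good at all"
unigram_counts = bag_of_ngrams(review, n=1)  # "good" and "not" each appear twice
bigram_counts = bag_of_ngrams(review, n=2)   # keeps the pair ("not", "good")
```

Unigram counts alone cannot tell "good" from "not good"; the bigram counts retain that pairing, which is one reason phrase-level features are considered for sentiment.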
Challenges
  Harder than topical classification, for which bag-of-words features perform well
  Must consider other features due to:
    Subtlety of sentiment expression
      Irony
      Expression of sentiment using neutral words
    Domain/context dependence
      Words/phrases can mean different things in different contexts and domains
    Effect of syntax on semantics
Approaches
  Machine learning
    Naïve Bayes
    Maximum Entropy Classifier
    SVM
    Markov Blanket Classifier
      Accounts for conditional feature dependencies
      Allowed reduction of discriminating features from thousands of words to about 20 (movie review domain)
  Unsupervised methods
    Use lexicons
  Assume pairwise independent features
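To make the independence assumption concrete, here is a small sketch of a Naïve Bayes polarity classifier with add-one smoothing. It treats each word as independent of the others given the class label. The training data, labels, and function names are invented for illustration and are not taken from any cited paper.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """labeled_docs is a list of (tokens, label) pairs; returns the count tables."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(model, tokens):
    """Pick the label with the highest log-probability under the model."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing; each word contributes independently.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set (hypothetical reviews).
train = [
    ("great fun great acting".split(), "pos"),
    ("boring plot bad acting".split(), "neg"),
    ("fun and moving".split(), "pos"),
    ("bad and boring".split(), "neg"),
]
model = train_naive_bayes(train)
prediction = classify(model, "great fun".split())
```

Because each word's contribution is added separately, the model cannot represent interactions such as negation; that limitation is what classifiers modeling conditional feature dependencies aim to address.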
LingPipe Polarity Classifier
  First eliminate objective sentences, then use remaining sentences to classify document polarity (reduce noise)
LingPipe Polarity Classifier
  Uses unigram features extracted from movie review data
  Assumes that adjacent sentences are likely to have similar subjective-objective (SO) polarity
  Uses a min-cut algorithm to efficiently extract subjective sentences
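The min-cut idea can be sketched as follows, under assumptions of this example rather than as LingPipe's actual code. Each sentence gets an individual subjectivity score, nearby sentences are linked by an association weight, and a minimum s-t cut separates subjective from objective sentences. The graph layout follows the general formulation in Pang and Lee (2004). The scores and weights below are made up, and the max-flow routine is a plain Edmonds-Karp implementation.

```python
from collections import defaultdict, deque

def min_cut_source_side(capacity, source, sink):
    """Run Edmonds-Karp max-flow and return the nodes on the source side of a min cut."""
    residual = defaultdict(lambda: defaultdict(float))
    for u in capacity:
        for v, c in capacity[u].items():
            residual[u][v] += c
            residual[v][u] += 0.0  # make sure the reverse edge exists

    while True:
        # Breadth-first search for a shortest augmenting path.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if sink not in parents:
            return set(parents)  # everything still reachable from the source
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

def extract_subjective(subj_scores, assoc):
    """Return indices of sentences placed on the subjective side of the cut.

    subj_scores: per-sentence subjectivity estimates in [0, 1] (hypothetical inputs).
    assoc: {(i, j): weight} penalties for separating sentences i and j.
    """
    capacity = defaultdict(dict)
    for i, p in enumerate(subj_scores):
        capacity["s"][i] = p        # cut if sentence i is labeled objective
        capacity[i]["t"] = 1.0 - p  # cut if sentence i is labeled subjective
    for (i, j), w in assoc.items():
        capacity[i][j] = capacity[i].get(j, 0.0) + w
        capacity[j][i] = capacity[j].get(i, 0.0) + w
    side = min_cut_source_side(capacity, "s", "t")
    return sorted(i for i in side if i != "s")

scores = [0.9, 0.45, 0.1]
alone = extract_subjective(scores, {})
linked = extract_subjective(scores, {(0, 1): 0.5, (1, 2): 0.1})
```

In this toy input, sentence 1 falls just below the midpoint on its own score, but a strong association with the clearly subjective sentence 0 pulls it onto the subjective side. This is the effect the adjacency assumption is meant to capture.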
LingPipe Polarity Classifier
  [Figure: graph for classifying three items.]
LingPipe Polarity Classifier
  Accurate as baseline but uses only 22% of content in test data (average)
  Metrics suggest properties of movie review structure
SentiWordNet
  Based on WordNet “synsets”
    http://wordnet.princeton.edu/
  Ternary classifier
    Positive, negative, and neutral scores for each synset
  Provides means of gauging sentiment for a text
SentiWordNet: Construction
  Created training sets of synsets, Lp and Ln
    Start with small number of synsets with fundamentally positive or negative semantics, e.g., “nice” and “nasty”
    Use WordNet relations, e.g., direct antonymy, similarity, derived-from, to expand Lp and Ln over K iterations
    Lo (objective) is set of synsets not in Lp or Ln
  Trained classifiers on training set
    Rocchio and SVM
    Use four values of K to create eight classifiers with different precision/recall characteristics
    As K increases, P decreases and R increases
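The seed-expansion step can be sketched roughly as below. This is a simplified illustration rather than the procedure from the SentiWordNet paper: it works on plain words instead of synsets, and the small similar and antonyms tables are hypothetical stand-ins for WordNet relations.

```python
def expand_seeds(pos_seeds, neg_seeds, similar, antonyms, k):
    """Grow the positive and negative sets for k iterations.

    Similarity links keep a term's polarity; antonymy links flip it.
    """
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            new_pos.update(similar.get(term, ()))
            new_neg.update(antonyms.get(term, ()))
        for term in neg:
            new_neg.update(similar.get(term, ()))
            new_pos.update(antonyms.get(term, ()))
        pos, neg = new_pos, new_neg
    return pos, neg

# Toy relation tables (assumed for this example, not real WordNet data).
similar = {"nice": ["pleasant"], "pleasant": ["agreeable"], "nasty": ["awful"]}
antonyms = {"nice": ["nasty"], "pleasant": ["unpleasant"]}
vocabulary = {"nice", "pleasant", "agreeable", "nasty", "awful", "unpleasant", "table"}

pos1, neg1 = expand_seeds({"nice"}, {"nasty"}, similar, antonyms, k=1)
pos2, neg2 = expand_seeds({"nice"}, {"nasty"}, similar, antonyms, k=2)
objective = vocabulary - pos2 - neg2  # Lo: everything not reached
```

Each extra iteration reaches terms further from the seeds, so the sets grow while the link back to the seeds' polarity weakens. That is consistent with recall rising and precision falling as K increases.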
SentiWordNet: Results
  24.6% synsets with Objective<1.0
    Many terms are classified with some degree of subjectivity
  10.45% with Objective<=0.5
  0.56% with Objective<=0.125
    Only a few terms are classified as definitively subjective
  Difficult (if not impossible) to accurately assess performance
SentiWordNet: How to use it
  Use score to select features (+/-)
    e.g. Zhang and Zhang (2006) used words in corpus with subjectivity score of 0.5 or greater
  Combine pos/neg/objective scores to calculate document-level score
    e.g. Devitt and Ahmad (2007) conflated polarity scores with a WordNet-based graph representation of documents to create predictive metrics
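Both uses can be sketched with a toy lexicon. The scores in TOY_SCORES are invented for illustration and are not real SentiWordNet values. Subjectivity is taken here as the positive score plus the negative score, and the document score is a plain average of positive minus negative. That average is a deliberately simple stand-in, not the graph-based metric of Devitt and Ahmad (2007).

```python
# Hypothetical (positive, negative) scores per word; objective = 1 - pos - neg.
TOY_SCORES = {
    "good": (0.75, 0.0),
    "dull": (0.0, 0.625),
    "film": (0.0, 0.0),
    "fairly": (0.125, 0.125),
}

def subjectivity(word):
    """Positive plus negative score; 0.0 for words missing from the lexicon."""
    pos, neg = TOY_SCORES.get(word, (0.0, 0.0))
    return pos + neg

def select_features(tokens, threshold=0.5):
    """Keep only tokens whose subjectivity reaches the threshold."""
    return [t for t in tokens if subjectivity(t) >= threshold]

def document_score(tokens):
    """Average of (pos - neg) over tokens found in the lexicon; 0.0 if none are."""
    scored = [TOY_SCORES[t] for t in tokens if t in TOY_SCORES]
    if not scored:
        return 0.0
    return sum(pos - neg for pos, neg in scored) / len(scored)

tokens = "a fairly good film".split()
features = select_features(tokens)  # only "good" passes the 0.5 threshold
score = document_score(tokens)      # positive value suggests positive polarity
```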
References
1. http://www.answers.com/sentiment, 9/22/08
2. Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. In: Proc Conf on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.
3. Esuli A, Sebastiani F. SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. In: Proc of LREC 2006 - 5th Conf on Language Resources and Evaluation, 2006.
4. Zhang E, Zhang Y. UCSC on TREC 2006 Blog Opinion Mining. TREC 2006 Blog Track, Opinion Retrieval Task.
5. Devitt A, Ahmad K. Sentiment Polarity Identification in Financial News: A Cohesion-based Approach. ACL 2007.
6. Pang B, Lee L. A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p.271-es, July 21-26, 2004.