Sentiment Analysis in Spanish
Sentiment Groups as Features of a Classification Model using a Spanish Sentiment Lexicon: a hybrid approach
Universidad de las Américas Puebla
Ernesto Gutierrez Corona
Agenda
• Motivation
• Introduction
• Problem statement
• Related work
• Classification Model
• Results
• Implementation
• Discussion
• Future work
Motivation
• Prominent use of social networks
• Incredibly useful subjective information
• Decision-making process
• Affective computing
• Trending opinion mining
• Marketing and political campaigns
• Opinions are key influencers of our behavior
• Our beliefs and perceptions of reality are conditioned on how others see the world
• For decision making we seek others' opinions
Introduction
Sentiment Analysis, also known as Opinion Mining, involves computational techniques to detect, extract, and evaluate sentiments, emotions, and subjectivity expressed in a text [Liu 2010].
According to Liu [2010], an opinion can be defined as a quintuple (e, a, so, h, t), where:
e: target entity
a: aspect/feature of the entity
so: sentiment orientation (valence)
h: opinion holder
t: time when the opinion is expressed
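Tweet example
"#ArturoVidal muy hombrecito para manejar borracho y a la hora de pedir disculpa un llorón de Mier...."
(Roughly: "#ArturoVidal, man enough to drive drunk, but a crybaby when it is time to apologize ...")
e: ArturoVidal
a: borracho (drunk)
so: negative
h: @mascoco
t: June 17, 12:46hs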
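The five components of an opinion map naturally onto a small record type. The following is only an illustrative sketch: the class name `Opinion` and its field names are assumptions chosen to mirror the symbols e, a, so, h, and t, not part of the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Hypothetical container for the five opinion components."""
    entity: str       # e: target entity
    aspect: str       # a: aspect/feature of the entity
    orientation: str  # so: sentiment orientation (valence)
    holder: str       # h: opinion holder
    time: str         # t: time when the opinion is expressed


# The tweet example expressed with this record type.
tweet_opinion = Opinion(
    entity="ArturoVidal",
    aspect="borracho",
    orientation="negative",
    holder="@mascoco",
    time="June 17, 12:46hs",
)
```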
Problem statement
Sentiment analysis for the Spanish language needs to be addressed in order to take advantage of the rapid growth of subjective information found in social networks.
Related Work (techniques)
Approach categories: lexicon-based, machine learning, hybrid
• Anta et al., 2013: n-grams with Bayesian classifiers and decision trees
• Del-Hoyo et al., 2009: feature vector = TFIDF + sentiment score
• Martinez-Camara et al., 2011: TFIDF and BTO with SVM and NB classifiers
• Moreno-Ortiz et al., 2013: heuristic calculator
• Sidorov et al., 2013: Naïve Bayes, Decision Tree, and Support Vector Machines
• Taboada et al., 2011: syntactic-tree-based calculator
• Vilares et al., 2013: syntactic dependencies and PoS tags to construct the feature vector
Abbreviations
TFIDF: term frequency-inverse document frequency
BTO: bit term occurrence
PoS: part of speech
SVM: Support Vector Machine
NB: Naïve Bayes
Related Work (lexicon)
Construction methods: manual tagging, automatic tagging, translation
• Molina-Gonzalez et al., 2013: domain-dependent machine translation (EN->ES)
• Perez-Rosas et al., 2012: Latent Semantic Analysis
• Redondo et al., 2007: manual translation
• Sidorov et al., 2013: six basic human emotions
Classification Model
Supervised learning approach:
1. Creation of Spanish Corpus
2. Creation of Spanish Lexicon
3. Feature Selection
4. SVM linear models
5. Classification
1. Spanish Corpus
The corpus consists of tweets and reviews, where each comment was:
• Manually tagged as P+, P, NEU, N, or N+
• Automatically tagged via a heuristic calculator as P+, P, NEU, N, or N+
• Selected only if its manual and automatic tags matched
P+: Very positive
P: Positive
NEU: Neutral
N: Negative
N+: Very negative
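The agreement filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_agreed` and the toy comments are assumptions, and the automatic tags are simply given rather than produced by a heuristic calculator.

```python
LABELS = {"P+", "P", "NEU", "N", "N+"}


def select_agreed(comments):
    """Keep only comments whose manual and automatic tags match.

    `comments` is an iterable of (text, manual_tag, automatic_tag) tuples.
    """
    selected = []
    for text, manual_tag, automatic_tag in comments:
        if manual_tag not in LABELS or automatic_tag not in LABELS:
            raise ValueError(f"unknown tag for comment: {text!r}")
        if manual_tag == automatic_tag:
            selected.append((text, manual_tag))
    return selected


# Hypothetical toy data; the third comment is dropped because the tags disagree.
candidates = [
    ("Me encanta este teléfono", "P+", "P+"),
    ("El servicio fue malo", "N", "N"),
    ("La película estuvo bien", "P", "NEU"),
]
corpus = select_agreed(candidates)
```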
2. Spanish Lexicon
The lexicon was obtained through:
• Extraction of the most frequent words from the corpus
• Addition of the most common polarity words from online dictionaries
• Manual validation of the polarity of words
Important: words were tagged according to the author's intention, not the reader's interpretation.
The lexicon currently contains 4583 words, categorized as:
[Table: lexicon word categories]
3. Feature Selection
Sentiment groups are groups of words syntactically related through sentiment orientation:
• Words in a group are at most 2 words apart
• A double or triple negation is contained in a single group
• Sentiment group splitters are punctuation marks and conjunctions
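One possible reading of these grouping rules is sketched below. Everything here is an assumption made for illustration: the tiny lexicon, the negator and conjunction lists, and the interpretation of "at most 2 words apart" as a token-index gap of at most 2 between consecutive sentiment-bearing words.

```python
import re

# Hypothetical toy resources; the real lexicon has thousands of entries.
LEXICON = {"bueno": 1, "excelente": 2, "malo": -1, "mala": -1, "terrible": -2}
NEGATORS = {"no", "nunca", "jamás", "tampoco"}
CONJUNCTIONS = {"y", "e", "o", "u", "pero", "aunque"}
MAX_DISTANCE = 2  # assumed meaning of "at most 2 words apart"


def tokenize(text):
    """Split into lowercase words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def sentiment_groups(text):
    """Group nearby lexicon words and negators between splitters."""
    groups, current, last_index = [], [], None
    for index, token in enumerate(tokenize(text)):
        is_punctuation = re.fullmatch(r"[^\w\s]", token) is not None
        if is_punctuation or token in CONJUNCTIONS:
            # A splitter always closes the current group.
            if current:
                groups.append(current)
            current, last_index = [], None
            continue
        if token in LEXICON or token in NEGATORS:
            # Start a new group when the previous member is too far away.
            if current and index - last_index > MAX_DISTANCE:
                groups.append(current)
                current = []
            current.append(token)
            last_index = index
    if current:
        groups.append(current)
    # Discard groups made only of negators.
    return [g for g in groups if any(t in LEXICON for t in g)]


example = sentiment_groups("La comida no fue mala, pero el servicio fue excelente")
```

In this sketch the negation "no" stays in the same group as "mala" because they are two tokens apart, while the comma and "pero" separate that group from "excelente".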
3. Feature Vector
4. SVM Linear Models
Three models were obtained from the training phase. Comments can be classified into two classes (P, N) or into four classes (P+, P, N, N+) by simply cascading the SVM models.
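The cascade can be pictured with the sketch below. It is an assumption, not the trained system: the slides do not say how the three models are arranged, so this example supposes one polarity model (P vs N) followed by one intensity model per polarity. The weights are hand-set placeholders standing in for trained linear SVMs, and the two-dimensional feature vector (positive evidence, negative evidence) is hypothetical.

```python
def linear_decision(weights, bias, features):
    """Signed score of a linear model: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical (weights, bias) pairs standing in for three trained linear SVMs.
POLARITY = ([1.0, -1.0], 0.0)       # score >= 0 means P, otherwise N
POS_INTENSITY = ([1.0, 0.0], -2.0)  # score >= 0 means P+, otherwise P
NEG_INTENSITY = ([0.0, 1.0], -2.0)  # score >= 0 means N+, otherwise N


def classify(features, four_classes=False):
    """Two-class result from the first model, four-class via the cascade."""
    positive = linear_decision(*POLARITY, features) >= 0
    label = "P" if positive else "N"
    if not four_classes:
        return label
    # Second stage: pick the intensity model that matches the polarity.
    weights, bias = POS_INTENSITY if positive else NEG_INTENSITY
    strong = linear_decision(weights, bias, features) >= 0
    return label + "+" if strong else label
```

With these placeholder weights, `classify([3, 0], four_classes=True)` follows the positive branch and the intensity model then decides between P and P+.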
Results
Our model was validated using 5-, 6-, 8-, and 10-fold cross-validation over balanced corpora (see section 3.5) and was also tested against the TASS 2014 corpus and the SFU Reviews Corpus.
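As a reminder of how k-fold cross-validation partitions a balanced corpus, here is a small standard-library sketch. The function names, the round-robin stratification, and the toy labels are assumptions made for illustration and do not reproduce the evaluation code behind the reported results.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds with similar class proportions."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a fair share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Hypothetical balanced toy corpus: 20 positive and 20 negative comments.
toy_labels = ["P"] * 20 + ["N"] * 20
for k in (5, 6, 8, 10):
    for train, test in cross_validation_splits(toy_labels, k):
        pass  # train a model on `train`, evaluate it on `test`
```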
Implementation
• Our model was also tested on Twitter for real-time sentiment analysis during the 2014 World Cup.
• A front-end layer was added to make the opinion-trend results more intuitive.
Discussion
• Words in the lexicon are tagged by intention rather than interpretation.
• "Politician" and "acne" are examples of words that carry a negative interpretation, but when they appear in a comment the author usually has no negative intention behind them:
“Politician from PRI signed new reform”
• Can objective facts be positive or negative?
“New hospital was built in town”
“New Energy Reform was signed by all parties”
• There is no universal consensus about polarity.
Future Work
• Castillo et al., 2014 are implementing a graph-based model, and it is possible to integrate our corpus into their work.
• It is also necessary to add more features to the model to make it more robust.
• Enhancement of the corpus and lexicon.
References
Molina-González, M.D., Martínez-Cámara, E., Martín-Valdivia, M.-T., Perea-Ortega, J.M., 2013. Semantic orientation for polarity classification in Spanish reviews. Expert Systems with Applications 40, 7250–7257.
Perez-Rosas, V., Banea, C., Mihalcea, R., 2012. Learning Sentiment Lexicons in Spanish. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey.
Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M., 2011. Lexicon-Based Methods for Sentiment Analysis. Comput. Linguist. 37, 267–307.
Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, A., Castro-Sánchez, N., Velásquez, F., Díaz-Rangel, I., Suárez-Guerra, S., Treviño, A., Gordon, J., 2013. Empirical Study of Machine Learning Based Approach for Opinion Mining in Tweets. In: Batyrshin, I., Mendoza, M.G. (Eds.), Advances in Artificial Intelligence, Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 1–14.
del-Hoyo, R., Hupont, I., Lacueva, F.J., Abadía, D., 2009. Hybrid Text Affect Sensing System for Emotional Language Analysis. In: Proceedings of the International Workshop on Affective-Aware Virtual Agents and Social Robots, AFFINE ’09. ACM, New York, NY, USA, pp. 3:1–3:4.
Vilares, D., Alonso, M.Á., Gómez-Rodríguez, C., 2013b. Supervised Polarity Classification of Spanish Tweets Based on Linguistic Knowledge. In: Proceedings of the 2013 ACM Symposium on Document Engineering, DocEng ’13. ACM, New York, NY, USA, pp. 169–172.