SUPERVISORS: DR. VERENA RIESER & PROF. ROB POOLEY
SENTIMENT ANALYSIS OF ARABIC SOCIAL NETWORKS
PRESENTED BY ESHRAG REFAEE
OUTLINE
• The concept of sentiment analysis
• Arabic as a morphologically rich language
• Aims of the research
• Sentiment analysis in English and Arabic literature
• Twitter corpus: collection and annotation
• Empirical work
• Results and evaluation
• Future work
SENTIMENT ANALYSIS
• Definition: analysing and understanding people’s sentiments, evaluations, opinions, attitudes, and emotions from written text.
• Research on SA emerged in the early 2000s (Liu, 2012).
• SA is one of the most active research areas in NLP.
APPLICATIONS
• In addition to its significance as a major sub-field of Natural Language Processing (NLP) research, subjectivity and sentiment analysis (SSA) has several potential applications:
• Commercial applications, e.g. measuring the success of a product
• Social applications
• Political applications
• Economic applications
SENTIMENT ANALYSIS OF SOCIAL NETWORKS
• The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, and micro-blogs.
• A social network like Twitter, with more than 500 million active users (ALEXA, 2012), provides a global arena for users to share views, attitudes, and preferences, and to discuss points of agreement and/or conflict.
• In March 2012, Twitter became available in Arabic (Twitter Blog, 2012).
ABOUT ARABIC
• Arabic is the language of an aggregate population of over 300 million people. It is the first language of the 22 member countries of the Arab League and an official language in three others (Habash, 2010).
• The Arabic language can be classified into three major levels:
• Classical Arabic (CA)
• Modern Standard Arabic (MSA)
• Dialectal Arabic (DA)
• Social networks use DA and MSA side by side (Al-Sabbagh and Girju, 2012).
AIMS OF THIS RESEARCH
• Construct a corpus of Arabic tweets for sentiment analysis.
• Build and test classification models for automatic sentiment analysis.
• Explore distant supervision approaches to build efficient models for the changing Twitter stream.
APPROACH AND METHODOLOGY
Arabic Twitter corpus
• Build and annotate a Twitter corpus for SSA
Machine learning algorithm
• Apply a machine learning scheme: Support Vector Machines (SVM), Naïve Bayes (NB), or Decision Tree (J48)
Build a sentiment classifier
• Learn a statistical classifier that labels a given text as (a) subjective vs. objective, and (b) subjective positive vs. subjective negative
Evaluate and test the models’ ability to generalise
• 10-fold cross-validation
• Independent test set
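The 10-fold cross-validation step can be illustrated with a minimal sketch. It only shows how the data would be partitioned so that every instance is tested exactly once. The function name `k_fold_indices`, the seed, and the toy size of 25 instances are assumptions for illustration, not details from the slides (the experiments themselves used WEKA).

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical corpus of 25 annotated tweets, split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Each of the ten models is trained on nine folds and scored on the held-out fold. The independent test set is kept entirely outside this loop.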
OUR ARABIC TWITTER CORPUS
Refaee, E. and Rieser, V. (2014). An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland.
Corpus freely available from LREC repository.
BUILDING THE TRAINING SET: FEATURE EXTRACTION & FEATURE VECTOR CONSTRUCTION
Pipeline: raw tweets → Arabic Twitter corpus → processing steps → classifier/learner → class of a new document
Processing steps:
• Text cleaning-up
• Sentiment annotation
• Feature extraction
• Pre-processing: build feature vector
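The cleaning and feature-vector steps can be sketched as follows. This is a simplified illustration under stated assumptions: it removes only URLs and @mentions, splits on whitespace, and encodes token presence as 0/1. The function names and the two example tweets are hypothetical, and the morphological, semantic, and stylistic features used in the experiments are not reproduced here.

```python
import re

def clean(tweet):
    """Remove URLs and @mentions, then collapse whitespace."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)
    return " ".join(tweet.split())

def tokenise(tweet):
    """Whitespace tokenisation of the cleaned tweet."""
    return clean(tweet).split()

def build_vocabulary(tweets):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for tweet in tweets:
        for token in tokenise(tweet):
            vocab.setdefault(token, len(vocab))
    return vocab

def to_feature_vector(tweet, vocab):
    """Binary vector: 1 if the vocabulary token occurs in the tweet, else 0."""
    vector = [0] * len(vocab)
    for token in tokenise(tweet):
        if token in vocab:
            vector[vocab[token]] = 1
    return vector

# Hypothetical example tweets (not taken from the corpus).
tweets = ["الجو جميل اليوم @user", "الخدمة سيئة http://example.com"]
vocab = build_vocabulary(tweets)
vectors = [to_feature_vector(t, vocab) for t in tweets]
```

Each tweet becomes one row of the training matrix, and the annotated sentiment label is attached as the class attribute before the rows are passed to the learner.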
EXPERIMENTAL SETTINGS
a. Machine learners
We use the implementations of the following algorithms provided by the WEKA data mining package, version 3.7.9 (Witten and Frank, 2005):
• Support Vector Machines (SVM), trained with Sequential Minimal Optimization (SMO) (Platt, 1999)
• ZeroR (baseline scheme)
SVM aims to identify the optimal hyperplane that linearly separates the data instances with the maximum margin.
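The maximum-margin idea can be written as the standard hard-margin optimisation problem. The notation is an assumption introduced here for illustration: x_i is the feature vector of the i-th training instance, y_i ∈ {−1, +1} is its class label, and (w, b) define the separating hyperplane.

```latex
\[
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n
\]
```

The margin between the two classes has width 2/‖w‖, so minimising ‖w‖ maximises the margin. This is the textbook hard-margin form. Practical implementations usually solve a soft-margin variant that tolerates some misclassified instances, and the slides do not state which settings were used.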
RESULTS AND EVALUATION
2-level classification: Subjective vs. Objective
[Bar chart: baseline vs. SVM for each feature set, scale 0 to 100]

Feature set       Baseline   SVM
Tokens            55.25      94.55
Morph feat.       55.25      95.64
Semantic feat.    55.25      96.02
Stylistic feat.   55.25      96.05
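ZeroR ignores all features and always predicts the most frequent class in the training data, which is consistent with the baseline column being identical for every feature set. The sketch below illustrates this behaviour. The label lists are hypothetical, and it assumes the reported figures are accuracies in percent, which the slides do not state explicitly.

```python
from collections import Counter

def zero_r_fit(labels):
    """Return the majority class of the training labels."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(predicted, gold):
    """Percentage of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)

# Hypothetical label distribution, not the corpus statistics.
train_labels = ["subjective"] * 6 + ["objective"] * 4
majority = zero_r_fit(train_labels)

test_labels = ["subjective", "objective", "subjective", "objective"]
predictions = [majority] * len(test_labels)  # same answer for every instance
baseline_accuracy = accuracy(predictions, test_labels)
```

Any useful classifier should score well above this value, since the baseline reflects only the class distribution.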
RESULTS AND EVALUATION
2-level classification: positive vs. negative
[Bar chart: baseline vs. SVM for each feature set, scale 0 to 100]

Feature set       Baseline   SVM
Tokens            50.16      88.21
Morph feat.       50.16      89.55
Semantic feat.    50.16      91.69
Stylistic feat.   50.16      92.1
CURRENT DIRECTION OF RESEARCH
• Apply semi-supervised learning to automatically annotate the rest of our Twitter corpus.
• Investigate distant supervision approaches to build a large training set for model optimisation.
• Build a high-quality polarity lexicon to be used for automatically identifying the overall sentiment orientation of a given text.
• Explore culture-related features that can detect cultural references in user-generated text.
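As an illustration of how a polarity lexicon could identify the overall sentiment orientation of a text, here is a minimal sketch that counts positive and negative lexicon hits. The two word sets are tiny hypothetical examples, not the planned lexicon, and the function name `lexicon_polarity` is an assumption.

```python
# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"جميل", "سعيد", "ممتاز"}
NEGATIVE = {"سيء", "حزين", "فاشل"}

def lexicon_polarity(text):
    """Label a text by comparing counts of positive and negative lexicon hits."""
    tokens = text.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no hits, or an equal number of each

label = lexicon_polarity("الفيلم جميل و ممتاز")
```

Exact string matching like this would miss inflected and cliticised forms, so a lexicon for a morphologically rich language such as Arabic would likely need stemming or lemmatisation before lookup.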
THANKS
@eshragR