SUPERVISORS DR. VERENA RIESER & PROF. ROB POOLEY SENTIMENT ANALYSIS OF ARABIC SOCIAL NETWORKS PRESENTED BY ESHRAG REFAEE

OUTLINE

• The concept of sentiment analysis
• Arabic as a morphologically rich language
• Aims of the research
• Sentiment analysis in English and Arabic literature
• Twitter corpus: collection and annotation
• Empirical work
• Results and evaluation
• Future work

SENTIMENT ANALYSIS

• Definition: Analysing and understanding people’s sentiments, evaluations, opinions, attitudes, and emotions from written text.

• Research on SA emerged in the early 2000s (Liu, 2012).

• SA is one of the most active research areas in NLP.

APPLICATIONS

• In addition to its significance as a major sub-field of Natural Language Processing (NLP) research, subjectivity and sentiment analysis (SSA) has several potential applications:

• Commercial applications, e.g. measuring the success of a product
• Social applications
• Political applications
• Economic applications

SENTIMENT ANALYSIS OF SOCIAL NETWORKS

• The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, and micro-blogs.

• A social network like Twitter, with more than 500 million active users (ALEXA, 2012), provides a global arena for users to share views, attitudes, and preferences, and to discuss points of agreement and conflict.

• In March 2012, Twitter became available in Arabic (Twitter Blog, 2012).

ABOUT ARABIC

• Arabic is spoken by an aggregate population of over 300 million people. It is the first language of the 22 member countries of the Arab League and an official language in three others (Habash, 2010).

ABOUT ARABIC

• The Arabic language can be classified into three major levels:

• Classical Arabic (CA)
• Modern Standard Arabic (MSA)
• Dialectal Arabic (DA)

• Social networks use DA and MSA side by side (Al-Sabbagh and Girju, 2012).

AIMS OF THIS RESEARCH

• Construct a corpus of Arabic tweets for sentiment analysis.

• Build and test classification models for automatic sentiment analysis.

• Explore distant supervision approaches to build efficient models for the changing Twitter stream.

APPROACH AND METHODOLOGY

Arabic Twitter Corpora

• Build and annotate a Twitter corpus for SSA

Machine Learning Algorithm

• Apply a machine learning scheme:
  • Support Vector Machines (SVM)
  • Naïve Bayes (NB)
  • Decision Tree (J48)

Build a sentiment classifier

• Learn a statistical classifier to label a given text as:
  • subjective vs. objective
  • subjective positive vs. subjective negative

Evaluate and test the models' ability to generalise

• 10-fold cross-validation
• Independent test set
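The 10-fold cross-validation step can be illustrated with a minimal sketch in plain Python. This is not WEKA's implementation (which the experiments use); the helper names `k_fold_indices` and `cross_validate`, and the `train_fn`/`predict_fn` callbacks, are hypothetical and introduced only for illustration. The sketch also uses a plain shuffled split rather than a stratified one.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Split range(n_items) into k disjoint folds of near-equal size."""
    if n_items < k:
        raise ValueError("need at least k items to form k non-empty folds")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at offset i.
    return [indices[i::k] for i in range(k)]


def cross_validate(items, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average accuracy."""
    accuracies = []
    for test_idx in k_fold_indices(len(items), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(items)) if i not in held_out]
        # train_fn and predict_fn are caller-supplied (hypothetical interface).
        model = train_fn([items[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, items[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Every instance is used for testing exactly once, so the averaged score reflects performance on data the model did not see during training. An independent test set complements this by checking the final model on tweets that played no part in any fold.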

OUR ARABIC TWITTER CORPUS

Refaee, E. and Rieser, V. (2014). An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland.

The corpus is freely available from the LREC repository.

BUILDING THE TRAINING SET: FEATURE EXTRACTION & FEATURE VECTOR CONSTRUCTION

Pipeline: raw tweets → Arabic Twitter corpus → processing steps → classifier/learner → class of a new document

Processing steps:

• Text clean-up
• Sentiment annotation
• Feature extraction
• Pre-processing: build the feature vector
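The clean-up and feature-vector steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the corpus: the clean-up rules (dropping URLs, @mentions, and punctuation) and the binary bag-of-words representation are assumptions, and all function names are hypothetical.

```python
import re

# Assumed clean-up rules: remove URLs and @mentions, then punctuation.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
NON_WORD = re.compile(r"[^\w\s]")


def clean_tweet(text):
    """Drop URLs, @mentions, and punctuation, then collapse whitespace."""
    text = URL_OR_MENTION.sub(" ", text)
    text = NON_WORD.sub(" ", text)
    return " ".join(text.split())


def build_vocabulary(tweets):
    """Map each distinct token in the cleaned tweets to a column index."""
    tokens = sorted({tok for t in tweets for tok in clean_tweet(t).split()})
    return {tok: i for i, tok in enumerate(tokens)}


def to_feature_vector(tweet, vocabulary):
    """Binary bag-of-words vector: 1 if the token occurs in the tweet."""
    vector = [0] * len(vocabulary)
    for tok in clean_tweet(tweet).split():
        if tok in vocabulary:
            vector[vocabulary[tok]] = 1
    return vector


# Made-up example tweets, for illustration only.
tweets = ["فيلم جميل جدا http://t.co/x", "@user فيلم سيء"]
vocabulary = build_vocabulary(tweets)
vectors = [to_feature_vector(t, vocabulary) for t in tweets]
```

Python's `\w` is Unicode-aware for `str` patterns, so Arabic letters are kept as word characters. Diacritics are combining marks rather than letters, so text carrying them would need separate normalisation before this step.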

EXPERIMENTAL SETTINGS

a. Machine learners

We use the implementations of the following algorithms provided by the WEKA data mining package, version 3.7.9 (Witten and Frank, 2005):

• Support Vector Machines (SVM), trained with Sequential Minimal Optimization (SMO) (Platt, 1999)
• ZeroR (baseline scheme)

SVM aims to identify the optimal hyperplane that linearly separates data instances with the maximum margin.
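To make the role of the ZeroR baseline concrete: it ignores all features and always predicts the most frequent class in the training data, so its accuracy equals the majority-class share. The sketch below is a plain-Python illustration, not WEKA's code, and the function names and label counts are made up for the example.

```python
from collections import Counter


def zero_r_train(labels):
    """Return the most frequent class label in the training data."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted_label, labels):
    """Share of instances whose label matches the single predicted label."""
    return sum(1 for y in labels if y == predicted_label) / len(labels)


# Illustrative label distribution (not the corpus statistics).
labels = ["subjective"] * 11 + ["objective"] * 9
majority = zero_r_train(labels)
baseline_accuracy = accuracy(majority, labels)
```

Any classifier that uses the features at all should beat this score, which is why it serves as the floor for comparison.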

RESULTS AND EVALUATION

2-level classification: subjective vs. objective

[Bar chart: baseline vs. SVM for each feature set, scale 0 to 100]

Feature set        baseline   SVM
Tokens             55.25      94.55
Morph feat.        55.25      95.64
Semantic feat.     55.25      96.02
Stylistic feat.    55.25      96.05

RESULTS AND EVALUATION

2-level classification: positive vs. negative

[Bar chart: baseline vs. SVM for each feature set, scale 0 to 100]

Feature set        baseline   SVM
Tokens             50.16      88.21
Morph feat.        50.16      89.55
Semantic feat.     50.16      91.69
Stylistic feat.    50.16      92.1
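As a quick reading aid for the two results slides, the gain of SVM over the baseline can be computed per feature set. The values below are copied from the slides; the dictionary layout and the helper name `gains_over_baseline` are hypothetical.

```python
# Scores copied from the two results slides (0 to 100 scale).
RESULTS = {
    "subjective vs. objective": {
        "baseline": 55.25,
        "svm": {"Tokens": 94.55, "Morph feat.": 95.64,
                "Semantic feat.": 96.02, "Stylistic feat.": 96.05},
    },
    "positive vs. negative": {
        "baseline": 50.16,
        "svm": {"Tokens": 88.21, "Morph feat.": 89.55,
                "Semantic feat.": 91.69, "Stylistic feat.": 92.1},
    },
}


def gains_over_baseline(task):
    """Return SVM score minus baseline score, in points, per feature set."""
    base = RESULTS[task]["baseline"]
    return {feat: round(score - base, 2)
            for feat, score in RESULTS[task]["svm"].items()}


for task in RESULTS:
    print(task, gains_over_baseline(task))
```

In both tasks the gain over the baseline is between roughly 38 and 42 points, and the spread between feature sets is wider for positive vs. negative (about 3.9 points) than for subjective vs. objective (1.5 points).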

CURRENT DIRECTION OF RESEARCH

• Apply semi-supervised learning to automatically annotate the rest of our Twitter corpus.

• Investigate distant supervision approaches to build a large training set for model optimisation.

• Build a high-quality polarity lexicon for automatically identifying the overall sentiment orientation of a given text.

• Explore culture-related features that can detect cultural references in user-generated text.
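One simple way a polarity lexicon could be used to estimate the overall sentiment orientation of a text is to sum per-word scores. The sketch below is only an illustration of that idea under stated assumptions: the four lexicon entries and the function name `lexicon_polarity` are hypothetical, and the planned lexicon and scoring scheme may differ.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
POLARITY_LEXICON = {
    "جميل": 1,   # beautiful
    "رائع": 1,   # wonderful
    "سيء": -1,   # bad
    "حزين": -1,  # sad
}


def lexicon_polarity(text, lexicon=POLARITY_LEXICON):
    """Sum the scores of known words and map the total to a label."""
    score = sum(lexicon.get(token, 0) for token in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Whitespace tokenisation is a deliberate simplification here. Because Arabic is morphologically rich, a practical version would likely need stemming or lemmatisation so that inflected forms match lexicon entries.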

THANKS

@eshragR
