Sentiment Analysis of Arabic: A Survey

Sara Mohammed AL-Kharji AND Anfal Abdullah AL-Tuwaim

Supervised by: Dr. Amal Alsaif

Imam Mohammed Ibn Saud Islamic University
College of Computer and Information Sciences
Natural Languages Processing (CS465)
Semester 2, 2013

OUTLINE:

• Introduction.
• Arabic.
• Sentiment Analysis Systems and Methods for Arabic:
  • SAA categories.
  • Automatic Classification.
  • Automatically extracting sentiments from financial texts.
  • Unbalanced Sentiment Classification in an Arabic context.

INTRODUCTION

• Sentiment analysis is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language.

• Most of the systems built for sentiment analysis are tailored for the English language, and very few resources exist for other languages.

ARABIC

• Arabic is the official language of 22 countries and is spoken by more than 300 million people.
• The fastest-growing language on the web.
• Arabic is a Semitic language and consists of many different regional dialects.
• Modern Standard Arabic (MSA).
• Arabic sentential forms are divided into two types: nominal and verbal constructions. In the verbal domain, Arabic has two word-order patterns: Subject-Verb-Object and Verb-Subject-Object.

SENTIMENT ANALYSIS SYSTEMS AND METHODS FOR ARABIC:

• Subjectivity process:
  – Tokenization.
  – Stemming.
  – Stop-word elimination.

• Sentiment process:
  (1) Objective (OBJ).
  (2) Subjective-Positive (S-POS).
  (3) Subjective-Negative (S-NEG).
  (4) Subjective-Neutral (S-NEUT).
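
The three steps listed under the subjectivity process (tokenization, stemming, stop-word elimination) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the tiny stop-word list, the prefix and suffix lists of the toy light stemmer, and the function names are hypothetical and do not come from the surveyed systems, which rely on full Arabic morphological tools.

```python
import re

# Tiny illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هذا"}

# A few common affixes for a toy light stemmer (also an assumption).
PREFIXES = ("وال", "بال", "ال")
SUFFIXES = ("ات", "ون", "ين", "ة")


def tokenize(text):
    """Split text into runs of Arabic letters, dropping punctuation and digits."""
    return re.findall(r"[\u0621-\u064A]+", text)


def light_stem(token):
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [light_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("الخدمة في الفندق ممتازة"))
```

Stop words are removed before stemming here so that the stop-word list can be written in surface form; a real pipeline might order the steps differently.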

1. SAA CATEGORIES:

2. AUTOMATIC CLASSIFICATION:

• Experiments are run on gold-tokenized text from the Penn Arabic TreeBank (PATB).

• Three pre-processing lemmatization configurations are tested, each targeting a different word form: (1) Surface; (2) Lemma; and (3) Stem.

• A two-stage classification approach is adopted:
  – Stage 1: Subjectivity.
  – Stage 2: Sentiment.
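
The control flow of a two-stage approach can be sketched as follows: a subjectivity decision comes first, and the sentiment decision is only made for input judged subjective. This is a sketch under stated assumptions, not the surveyed system: the keyword sets and the rule-based `is_subjective` and `polarity` functions are hypothetical stand-ins for trained classifiers, and English cue words are used only for readability. The label names follow the OBJ / S-POS / S-NEG / S-NEUT scheme.

```python
# Hypothetical cue lexicons standing in for trained classifiers.
SUBJECTIVE_CUES = {"good", "bad", "love", "hate", "think"}
POSITIVE_CUES = {"good", "love"}
NEGATIVE_CUES = {"bad", "hate"}


def is_subjective(tokens):
    """Stage 1 stand-in: subjective if any cue word is present."""
    return any(t in SUBJECTIVE_CUES for t in tokens)


def polarity(tokens):
    """Stage 2 stand-in: compare counts of positive and negative cues."""
    pos = sum(t in POSITIVE_CUES for t in tokens)
    neg = sum(t in NEGATIVE_CUES for t in tokens)
    if pos > neg:
        return "S-POS"
    if neg > pos:
        return "S-NEG"
    return "S-NEUT"


def two_stage_classify(tokens, stage1=is_subjective, stage2=polarity):
    """Run stage 2 only on input that stage 1 marks as subjective."""
    if not stage1(tokens):
        return "OBJ"
    return stage2(tokens)


if __name__ == "__main__":
    print(two_stage_classify(["the", "report", "was", "published"]))
    print(two_stage_classify(["i", "love", "it"]))
```

Passing the two stages as parameters keeps the cascade independent of how each stage is implemented, so the stand-ins could be swapped for real models.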

2. AUTOMATIC CLASSIFICATION: (CONT)

• The TreeBank (PATB) data is divided into 80% for 5-fold cross-validation and 20% for testing.

• Subjectivity results on Stem+Morph+language-independent features.

• Sentiment results on Stem+Morph+language-independent features.
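
The 80% / 20% division with 5-fold cross-validation on the larger part can be sketched in plain Python. The function names, the fixed random seed, and the use of shuffling before the split are assumptions for illustration; the slides do not say how the study assigned items to folds.

```python
import random


def split_dev_test(items, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out `test_fraction` of the items for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def k_folds(items, k=5):
    """Yield (train, held_out) pairs; each item is held out exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


if __name__ == "__main__":
    dev, test = split_dev_test(range(100))
    for train, held_out in k_folds(dev, k=5):
        print(len(train), len(held_out))
```

The 20% test portion is never touched during cross-validation, so it remains an unbiased check of the final configuration.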

3. AUTOMATICALLY EXTRACTING SENTIMENTS FROM FINANCIAL TEXTS:

• Importance of sentiment analysis for the financial market.
• The selected sentiment words comprised movement words, such as rise/fall, and metaphorical words, such as growth/decline.
• Local grammar.
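
A lexicon of sentiment words like the one described here can be applied by simple counting. The sketch below is only an illustration under stated assumptions: the lexicon holds just the four example words from the slide, the polarity assigned to each word is assumed, English tokens are used for readability although the study targets Arabic financial text, and `sentiment_word_counts` is a hypothetical helper name.

```python
import re

# Hypothetical seed lexicon built from the example words on the slide.
# Treating rise/growth as positive and fall/decline as negative is an assumption.
POSITIVE_WORDS = {"rise", "growth"}
NEGATIVE_WORDS = {"fall", "decline"}


def sentiment_word_counts(text):
    """Return (positive, negative) counts of lexicon words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return pos, neg


if __name__ == "__main__":
    print(sentiment_word_counts("Shares rise as growth returns, but bonds fall."))
```

Counting alone ignores context such as negation, which is one motivation for the local-grammar patterns mentioned in the last bullet.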

3. AUTOMATICALLY EXTRACTING SENTIMENTS FROM FINANCIAL TEXTS: (CONT)

RESULT: Movement words and metaphorical words from the Middle East and North Africa Financial Network (MENA-FN) corpus.

3. AUTOMATICALLY EXTRACTING SENTIMENTS FROM FINANCIAL TEXTS: (CONT)

RESULT: Local grammar in Arabic text.

3. AUTOMATICALLY EXTRACTING SENTIMENTS FROM FINANCIAL TEXTS: (CONT)

Prototypes of Ara-SATISFI, the “Arabic Sentiment and Time Series: Financial Analysis System”.

4. UNBALANCED SENTIMENT CLASSIFICATION IN AN ARABIC CONTEXT

• Most studies in SA do not tackle the problem of unbalanced data sets (UD).
• There are generally two approaches to UD:
  - The first approach modifies the classifier.
  - The second approach modifies the data set itself.
• Two common methods modify the data set:
  - The first focuses on under-sampling.
  - The second deals with over-sampling.

4. UNBALANCED SENTIMENT CLASSIFICATION IN AN ARABIC CONTEXT (CONT)

Under-sampling method: four different techniques are proposed:
• Remove Similar (RS).
• Remove Farthest (RF).
• Remove by Clustering (RC).
• Random Removable (RR).
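
Of the four techniques, random removal is the only one that can be sketched without further details from the study. The version below assumes that it means discarding randomly chosen samples from the larger classes until every class matches the size of the smallest one; the function name, the seed parameter, and the balancing target are assumptions, not the authors' specification.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)

    balanced = []
    for label, group in by_label.items():
        # rng.sample picks n_min distinct items without replacement.
        for sample in rng.sample(group, n_min):
            balanced.append((sample, label))
    return balanced


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(12)]
    tags = ["NEGATIVE"] * 9 + ["POSITIVE"] * 3
    print(random_undersample(docs, tags))
```

The other three techniques differ only in how the samples to remove are chosen, so they could reuse the same grouping step with a different selection rule.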

4. UNBALANCED SENTIMENT CLASSIFICATION IN AN ARABIC CONTEXT (CONT)

EXPERIMENTS
1) Preprocessing.
2) Classification and algorithms: the categories to consider are POSITIVE, NEGATIVE, OBJECTIVE and NOT_ARABIC.
3) Validation method: the data set is randomly split into two sets: a training set representing 75% of the data set, and a test set representing 25% of the data set.

4. UNBALANCED SENTIMENT CLASSIFICATION IN AN ARABIC CONTEXT (CONT)

4) Performance measure:

CONFUSION MATRIX

• g-performance:
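
The formula for g-performance did not survive on this slide. A common definition for unbalanced data, assumed here, is the geometric mean of the accuracy on the positive class and the accuracy on the negative class, both read off the confusion matrix. The sketch below follows that assumption for the two-class case; the function names are hypothetical, and the study's exact definition may differ.

```python
import math


def confusion_matrix(y_true, y_pred, positive):
    """Return (tp, fp, tn, fn) for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def g_performance(tp, fp, tn, fn):
    """Geometric mean of per-class accuracies (assumed definition)."""
    acc_pos = tp / (tp + fn) if tp + fn else 0.0
    acc_neg = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(acc_pos * acc_neg)


if __name__ == "__main__":
    y_true = ["P"] * 10 + ["N"] * 50
    y_pred = ["P"] * 8 + ["N"] * 2 + ["N"] * 45 + ["P"] * 5
    print(g_performance(*confusion_matrix(y_true, y_pred, positive="P")))
```

Unlike plain accuracy, this measure drops to zero when a classifier ignores one class entirely, which is why it suits unbalanced data sets.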

4. UNBALANCED SENTIMENT CLASSIFICATION IN AN ARABIC CONTEXT (CONT)

• Two standard classifiers are used: Naïve Bayes (NB) and Support Vector Machines (SVM).

RESULT:

THANK YOU