Subjectivity and Sentiment Analysis of Arabic: Trends and Challenges
Nora Al-Twairesh
Email: [email protected], Web: asa.imamu.edu.sa, Twitter: @asa_iu
Contents
• Introduction
• What is Subjectivity and Sentiment Analysis?
• Why is it Important?
• Sentiment Analysis Applications
• Subjectivity and Sentiment Analysis of Arabic
• The Literature
• Challenges
• Conclusion and Future Research Directions
Introduction
What do other people think?!
Which smartphone? Which laptop? Which hotel? Which policy? Which place?
Introduction
Picture courtesy: http://www.creativeagentsolutions.com/real-estate-virtual-assistant-services/social-media-management/
Picture courtesy: http://www.socialmediaexaminer.com/18-social-media-marketing-tips/
What is Subjectivity and Sentiment Analysis?
• Subjectivity analysis classifies content as objective (facts) or subjective (opinions).
• Sentiment analysis classifies the polarity of text (positive, negative, neutral).
[Diagram: subjectivity analysis labels text as subjective or objective; sentiment analysis then labels subjective text as positive or negative and objective text as neutral.]
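The relation between the two tasks can be sketched as a two-stage pipeline. This is only an illustration of the label scheme on the slide: the function name `two_stage_label` and the two toy classifiers passed to it are hypothetical stand-ins for real subjectivity and polarity models.

```python
from typing import Callable


def two_stage_label(
    text: str,
    is_subjective: Callable[[str], bool],
    polarity: Callable[[str], str],
) -> str:
    """Stage 1 decides objective vs. subjective; stage 2 assigns polarity."""
    if not is_subjective(text):
        return "neutral"      # objective text carries no polarity
    return polarity(text)     # expected to return "positive" or "negative"


# Toy stand-ins for trained classifiers (assumptions, for illustration only).
def toy_is_subjective(text: str) -> bool:
    return any(word in text for word in ("love", "hate"))


def toy_polarity(text: str) -> str:
    return "positive" if "love" in text else "negative"


label = two_stage_label("I love this hotel", toy_is_subjective, toy_polarity)
```

Keeping the two stages separate lets each one be trained and evaluated on its own data.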
What is Subjectivity and Sentiment Analysis?
• Different names: sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis.
• Sentiment analysis is multidisciplinary by nature: it draws on natural language processing, text mining and artificial intelligence.
Why is it Important?
• The proliferation of social media websites has led to the production of vast amounts of unstructured text on the Web.
• Aggregating and evaluating these opinions manually is tedious and, at this scale, nearly impossible.
Why is it Important?
• These opinions are important to organizations (government, business) and to individuals.
• “Sentiment Analysis is now right at the center of the social media research.”, Liu B.
Applications
• Businesses and organizations:
  • Product and service benchmarking.
  • Market intelligence.
  • Businesses spend huge amounts of money to find consumer sentiments and opinions through consultants, surveys, focus groups, etc.
• Individuals: interested in others' opinions when purchasing a product or using a service, or when looking for opinions on political topics.
• Opinion retrieval/search: providing general search for opinions.
Subjectivity and Sentiment Analysis of Arabic
• Arabic is a morphologically rich language.
• Formal written language: Modern Standard Arabic (MSA).
• Everyday spoken language: Informal Arabic, Colloquial Arabic, Dialectal Arabic.
• Earlier research on SSA of Arabic dealt only with MSA, but researchers have recently started addressing Dialectal Arabic (DA).
Research on SSA of Arabic
• Previous survey: Korayem et al. (2012).
• 30 papers published after Korayem et al.'s survey and up to July 2014 were found.
• Search methodology:
  • The search used the keywords 'Arabic subjectivity and sentiment analysis', 'Arabic opinion mining', 'Comparative opinions Arabic', and 'Opinion spam Arabic' in the following databases: Google Scholar, Springer, IEEE Xplore, ACM Digital Library, and ScienceDirect.
[Figure: bar chart of the number of references per year, 2006 to 2014; x-axis: Year, y-axis: Number of References (scale 0 to 12).]
SSA methods
• Supervised learning: corpus-based
• Unsupervised learning: lexicon-based
• Hybrid
Lexicon-Based
• Sentiment lexicons can be built:
  • Manually
  • Automatically
• Words are sometimes given scores for positive, negative and neutral.
• Example: SentiWordNet.
• How to calculate a sentence's sentiment?
• SentiStrength: takes care of negation, intensifiers and diminishers.
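How a sentence score might be computed from a lexicon can be sketched as follows. This is a minimal illustration and not SentiStrength's actual algorithm: the tiny lexicon, the word lists, the weights and the function `score_sentence` are all hypothetical, and tokenization is a plain whitespace split. The sketch assumes the negator precedes the sentiment word and the intensifier or diminisher follows it.

```python
# Toy resources: hypothetical entries and weights, for illustration only.
LEXICON = {"جميل": 2.0, "ممتاز": 3.0, "سيء": -2.0}   # beautiful, excellent, bad
NEGATORS = {"ليس", "غير", "مش"}                      # "مش" is a dialectal negator
INTENSIFIERS = {"جدا": 1.5}                           # "very"
DIMINISHERS = {"قليلا": 0.5}                          # "slightly"


def score_sentence(sentence: str) -> float:
    """Sum the lexicon scores of a sentence, adjusting for simple modifiers."""
    tokens = sentence.split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        # The modifier is looked up in the token right after the sentiment word.
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        score *= INTENSIFIERS.get(following, 1.0)
        score *= DIMINISHERS.get(following, 1.0)
        # A negator right before the sentiment word flips the polarity.
        if i > 0 and tokens[i - 1] in NEGATORS:
            score = -score
        total += score
    return total


very_nice = score_sentence("الفندق جميل جدا")    # "the hotel is very beautiful"
not_nice = score_sentence("الفندق غير جميل")     # "the hotel is not beautiful"
```

With these toy weights, the first sentence scores 3.0 and the second scores -2.0. A real system would also need morphological analysis, since Arabic sentiment words rarely appear in a single surface form.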
Corpus-Based
• Uses machine learning classifiers trained on an annotated corpus.
¹ S. Mohammad, EMNLP 2014 Sentiment Analysis tutorial: http://emnlp2014.org/tutorials/7_notes.pdf
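A corpus-based classifier can be sketched with a small multinomial Naive Bayes model written from scratch. The four training sentences and their labels are made up for illustration, and `train_nb` and `predict_nb` are hypothetical helper names; a real system would train on a large annotated corpus, usually with an established machine learning library.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in text.split():
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, count in label_counts.items():
        logp = math.log(count / n_docs)                 # log prior
        total = sum(word_counts[label].values())
        for token in text.split():
            if token not in vocab:
                continue                                # skip unseen words
            logp += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Toy training corpus (invented for this sketch).
corpus = [
    ("الفيلم رائع جدا", "positive"),       # "the film is really wonderful"
    ("خدمة ممتازة وسريعة", "positive"),    # "excellent and fast service"
    ("الفيلم سيء جدا", "negative"),        # "the film is really bad"
    ("خدمة بطيئة وسيئة", "negative"),      # "slow and bad service"
]
model = train_nb(corpus)
prediction = predict_nb(model, "الفيلم رائع")
```

On this toy corpus the test sentence is labeled positive, because "رائع" appears only in positive training examples.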
SSA of Arabic
• The best-performing machine learning techniques were Support Vector Machines (SVM) and Naive Bayes (NB).
• Stemming led to better accuracy in most studies.
• The usefulness of n-grams varied across studies: in some, unigrams and bigrams were useful; in others, a combination of bigrams and trigrams was.
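The n-gram features mentioned above can be sketched as follows. The helper names `ngrams` and `ngram_features` are hypothetical, and tokenization is a plain whitespace split; the surveyed studies may have used different preprocessing.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, orders=(1, 2)):
    """Collect n-grams of every requested order, e.g. unigrams and bigrams."""
    tokens = text.split()
    features = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features


# "the service is not good": three unigrams and two bigrams
uni_bi = ngram_features("الخدمة ليست جيدة", orders=(1, 2))
# bigrams plus the single trigram
bi_tri = ngram_features("الخدمة ليست جيدة", orders=(2, 3))
```

The bigram "ليست جيدة" ("not good") keeps the negation attached to the adjective, which a unigram model would see only as two unrelated words.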
SSA of Arabic
• The studies spanned different genres (news comments, movie reviews, product reviews, chat turns, Tweets, Facebook posts, forum posts) and different domains.
• The conclusion of most studies is that a different solution is needed for each genre and in each domain.
Challenges
• Use of Dialectal Arabic (DA)
• Dialects differ from MSA phonologically, morphologically and syntactically, and do not have standard orthographies. This makes building morphological analyzers and POS taggers for dialects a big challenge.
• Recent efforts to build these tools for DA suffer from low accuracy and are tailored to specific dialects. The availability of such tools is essential for SSA.
• Concepts have different lexical choices in different DAs, which makes building lexicons that cover multiple dialects very challenging. Ex: تشكيل
• Negation and stop words can also be expressed in different ways, and they vary among DAs.
• It has even been argued that the Arabic dialects can be considered different languages in their own right.
Challenges
• Lack of Corpora and Datasets
• The accuracy of any SSA system depends on the availability of large annotated corpora, which are still a scarce resource for Arabic.
• The datasets available now are very small compared to those available for English, and they usually come from the news domain or movie reviews.
• This scarcity also hinders comparing new SSA systems with previous ones to determine their accuracy.
Challenges
• Lack of Sentiment Lexicons
• No publicly available DA sentiment lexicon exists.
• MSA lexicons are small compared to those built for English.
• A large-scale, multi-genre, multi-dialect Arabic sentiment lexicon was recently proposed by Abdul-Mageed and Diab (2014). However, it covers only two dialects, Egyptian and Levantine, and has not yet been fully applied to SSA tasks.
Challenges
• The need for Named Entity Recognition (NER)
• Although NER is not generally required to detect sentiment, it is needed for Arabic because many Arabic names are derived from adjectives that can be mistaken for sentiment words. For example, the name Jamila (جميلة) means "beautiful".
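Why name recognition matters for a lexicon lookup can be sketched as follows. The function `tag_names` is only a stand-in for a real NER system: it uses a hypothetical rule (a token that follows a title word is a person name), and the one-word lexicon and cue list are invented for the example.

```python
LEXICON = {"جميلة": 1.0}                          # "beautiful" (hypothetical entry)
TITLE_CUES = {"السيدة", "الدكتورة", "الأستاذة"}   # "Mrs.", "Dr.", "Prof." (hypothetical cues)


def tag_names(tokens):
    """Stand-in for NER: flag a token as a name when it follows a title word."""
    return [i > 0 and tokens[i - 1] in TITLE_CUES for i in range(len(tokens))]


def score(text, use_ner=True):
    """Sum lexicon scores, skipping tokens that were tagged as names."""
    tokens = text.split()
    is_name = tag_names(tokens) if use_ner else [False] * len(tokens)
    return sum(
        LEXICON.get(token, 0.0)
        for token, name in zip(tokens, is_name)
        if not name
    )


with_ner = score("قابلت السيدة جميلة")                    # "I met Mrs. Jamila"
without_ner = score("قابلت السيدة جميلة", use_ner=False)
adjective = score("الحديقة جميلة")                        # "the garden is beautiful"
```

Without the name tags, the neutral sentence about Mrs. Jamila receives a positive score of 1.0; with them it scores 0.0, while the genuine adjective use still scores 1.0.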
Challenges
• Handling compound phrases and idioms
• Arabic speakers tend to use popular compound phrases and idioms to express their opinions. Some examples include:
  • (لا يا شيخ): used to express disbelief in what someone says.
  • (حسبي الله ونعم الوكيل): conveys a negative sentiment and carries an implicit prayer to Allah to take revenge.
• These compound phrases and idioms tend to differ across DAs and Arabic cultures.
• Moreover, the phrases and words used to express sentiment are subject to usage trends, with new phrases evolving every day.
Challenges
• Use of Arabizi
• A new trend in social media is the use of Latin characters to represent Arabic words (Arabizi).
• Arabic users of social media also tend to code-switch between Arabic and English in their writing, making it difficult to tell whether a word written in Latin characters is Arabizi or English.
• The literature on Arabic SSA has not dealt with this problem yet.
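A first step toward handling such text is tagging the script of each token, sketched below. The function `token_script` and its labels are hypothetical. Treating a Latin token that contains digits as a possible Arabizi word is only a weak heuristic, based on the common use of digits for Arabic sounds that have no Latin letter; it is an assumption of this sketch, not a method from the surveyed literature.

```python
import re

ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")   # basic Arabic Unicode block
LATIN_CHAR = re.compile(r"[A-Za-z]")
DIGIT = re.compile(r"[0-9]")


def token_script(token: str) -> str:
    """Label a token by the script of its characters."""
    has_arabic = bool(ARABIC_CHAR.search(token))
    has_latin = bool(LATIN_CHAR.search(token))
    if has_arabic and not has_latin:
        return "arabic"
    if has_latin and not has_arabic:
        # Digits inside a Latin token hint at Arabizi, but also match "mp3".
        return "latin_with_digits" if DIGIT.search(token) else "latin"
    return "other"


# Mixed Arabizi, Arabic-script and English tokens in one post.
post = "el film 7elw awi بس النهاية sad"
tags = [(token, token_script(token)) for token in post.split()]
```

In this post, "el", "awi" and "sad" all receive the same "latin" label even though only "sad" is English, which is exactly the ambiguity described above: script alone cannot separate Arabizi from English.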
Challenges
• Sarcasm Detection
• Sarcasm is a form of speech act in which a person says something positive while really meaning something negative, or vice versa.
• Sarcasm is very hard to detect. In English, there are only a few studies on sarcasm detection, using supervised and semi-supervised learning approaches.
• In Arabic SSA, no study was found that addresses sarcasm detection.
Challenges
• Comparative Opinions
• An opinion can be expressed as a comparison between two entities.
• Comparative opinions differ from regular opinions in that they have different semantic meanings and different syntactic forms.
• Mining comparative opinions is considered a challenging task in English.
• Arabic is no exception; however, only one study on mining comparative opinions in Arabic was found, by El-Halees (2012).
Challenges
• Opinion Spam Detection
• As with web pages, opinions can suffer from spam, but opinion spam differs from web spam and therefore needs different detection approaches.
• Arabic opinion spam detection is still under-researched: only two studies were found.
Challenges
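• Co-reference Resolution
• Co-reference resolution is a challenging problem in most NLP applications, and SSA is no exception. An example illustrates the problem:
• "اشتريت كاميرة كانون جديدة والتقطت بها بعض الصور كانت جميلة جدا" ("I bought a new Canon camera and took some pictures with it; it was very beautiful"): it is unclear whether "very beautiful" describes the camera or the pictures.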
Challenges
• Opinion Target and Opinion Holder Extraction
• Although the main task of SSA is to determine the polarity of the sentence, it is also essential to extract the opinion target and for some applications the opinion holder.
Conclusion
• Research on SSA of Arabic is still in its early stages, although it is attracting considerable interest from the research community.
• The challenges identified can all be considered future research directions.
• Thank you.
• [email protected]