1

Identifying Emotions in Tweets related to the Brazilian Stock Market

PhD thesis current status

Fernando J. V. da Silva
Supervisors: Ariadne M. B. R. Carvalho and Norton T. Roman

IC – Institute of Computing


2

Content

− Motivation
− Objectives
− The Plutchik Wheel of Emotions
− Research Methodology
− Current Progress
− Corpora
− Manual Annotation
− Preliminary Analysis of the Corpora

3

Motivation

Small investors use Twitter to discuss their trading operations

#bbas3 depois acho meu post mas ainda aguardo 18,75 [#bbas3 I'll find my post later, but I'm still waiting for 18.75]

Acordo da ALLL3 melou? [Did the ALLL3 deal fall through?]

Aí meu bolso... Bbas3 caiu pra c****** hoje [Ouch, my wallet... Bbas3 fell like (expletive) today]

Trade de venda no gráfico semanal acaba de ser acionado em BBAS3 [A sell trade on the weekly chart has just been triggered in BBAS3]

4

Motivation

Several previous works have found correlations between tweets and stock market indexes (Bollen et al. 2011, Gaskell et al. 2013, Zhang et al. 2011)

Some of them found correlations with the emotions expressed in tweets:

Fewer emotions in tweets indicate increases in the DJIA (Zhang et al. 2011)

Calm moods are good predictors of the DJIA (Bollen et al. 2011)

5

Motivation

The automatic identification of emotions in tweets could help predict the stock market

There is no similar research for the Brazilian stock market

6

Objectives

Identify emotions in Portuguese tweets

Apply the same technique to tweets related to the Brazilian stock market

7

The Plutchik Wheel of Emotions

A psychoevolutionary theory by Robert Plutchik (Plutchik and Kellerman, 1986)

The concept of emotion applies to all evolutionary levels, to animals as well as humans

It defines 8 basic emotions grouped into 4 opposite pairs:
− Joy vs Sadness
− Fear vs Anger
− Trust vs Disgust
− Surprise vs Anticipation

8

Research Methodology

Use machine learning techniques similar to (Suttles and Ide, 2013), with one classifier for each pair of opposite emotions:

− Joy vs Sadness
− Anger vs Fear
− Trust vs Disgust
− Anticipation vs Surprise

If the probability of belonging to any of these classes is too small, the tweet is classified as “Neutral”
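The decision rule above can be sketched as follows. This is only an illustration under assumptions: the function name `label_tweet`, the input format (one probability per emotion, as produced by four hypothetical pair classifiers) and the threshold value are not taken from the slides.

```python
# Opposite pairs from Plutchik's wheel, one (hypothetical) classifier per pair.
PAIRS = (
    ("joy", "sadness"),
    ("anger", "fear"),
    ("trust", "disgust"),
    ("anticipation", "surprise"),
)


def label_tweet(probabilities, threshold=0.6):
    """Return the emotions assigned to a tweet, or ["neutral"].

    `probabilities` maps each emotion name to the probability given by the
    classifier of its pair. The threshold of 0.6 is an arbitrary assumption.
    """
    labels = []
    for pair in PAIRS:
        # Pick the more probable emotion of the pair.
        best = max(pair, key=lambda emotion: probabilities[emotion])
        # Keep it only if the classifier is confident enough.
        if probabilities[best] >= threshold:
            labels.append(best)
    # No pair produced a confident label: the tweet is neutral.
    return labels or ["neutral"]
```

A tweet can therefore receive between one and four emotion labels, and "neutral" appears only when every pair classifier stays below the threshold.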

9

Research Methodology (cont.)

Train the algorithm on a larger corpus of “context-free” tweets

Test on a “specific-context” corpus of stock-market-related tweets

Use an SVM tree kernel, as in (Agarwal et al., 2011), to compare the structure of tweets instead of word frequencies
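To make the idea of comparing structure concrete, here is a minimal sketch of a tree kernel. It counts the tree fragments two trees have in common, in the style of the Collins and Duffy subset-tree kernel. It is a simpler stand-in chosen for illustration and is not necessarily the kernel used in the cited work. The tuple-based tree encoding and the function names are assumptions.

```python
def nodes(tree):
    """Yield every internal node; a node is a tuple (label, child, ...)."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def production(node):
    """Label of a node followed by the labels of its direct children."""
    return (node[0],) + tuple(
        child[0] if isinstance(child, tuple) else child for child in node[1:]
    )


def common_fragments(n1, n2, decay=1.0):
    """Weighted count of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Leaves (plain strings) add nothing beyond the matching production.
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=1.0):
    """Sum the common-fragment counts over all pairs of nodes."""
    return sum(common_fragments(a, b, decay) for a in nodes(t1) for b in nodes(t2))
```

Kernel values computed for every pair of tweets form a matrix that an SVM implementation accepting precomputed kernels could consume. Two tweets with the same structure but different words still get a non-zero similarity, which a pure word-frequency comparison would miss.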

10

Research Methodology (cont.)

Tree representation sample for the tweet “@Fernando this isn't a great day for playing the HARP! :)”, from (Agarwal et al. 2011)

11

Research Methodology (cont.)

Manual annotation

Emotion Identification using Machine Learning with SVM tree kernel

Emotion Identification using Machine Learning and n-grams attributes (Benchmark)

Tweets collection

12

Research Current Progress

Manual annotation (In Progress)

Emotion Identification using Machine Learning with SVM tree kernel (To Do)

Emotion Identification using Machine Learning and n-grams attributes (Benchmark) (To Do)

Tweets collection (Done)

13

Corpora

Specific-Context corpus: 2,402 unique tweets, each containing one of the 73 IBOVESPA stock codes (e.g., petr4 for Petrobras, bbas3 for Banco do Brasil)

− Manually annotated by 2 people
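Selecting tweets of this kind could look like the sketch below. It is an assumption about how such a filter might be written, not the collection code used in the research. The function name `select_stock_tweets` is hypothetical, and the ticker set holds only a few codes for illustration, not the full list of 73.

```python
import re

# Illustrative subset only; the real list has 73 IBOVESPA codes.
TICKERS = {"petr4", "bbas3", "alll3"}


def select_stock_tweets(tweets, tickers=TICKERS):
    """Keep tweets that mention a ticker, dropping exact repeats."""
    seen = set()
    selected = []
    for tweet in tweets:
        # Lower-case and split into word tokens, so "#bbas3" yields "bbas3".
        tokens = set(re.findall(r"\w+", tweet.lower()))
        if not tokens & tickers:
            continue  # no stock code mentioned
        key = tweet.strip().lower()
        if key in seen:
            continue  # repeated tweet
        seen.add(key)
        selected.append(tweet)
    return selected
```

Matching on whole tokens rather than substrings avoids accepting a tweet merely because a ticker happens to appear inside a longer word.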

14

Corpora

Context-Free corpus: 26,407 unique tweets automatically collected from Twitter and automatically annotated according to their hashtags (distant supervision). For example:

− #feliz (happy) → joy tweet
− #triste (sad) → sadness tweet
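The hashtag-based labelling can be sketched as below. The mapping contains only the two hashtags shown on the slide. The function name `distant_label` and the choice to discard tweets with conflicting hashtags are assumptions made for the example.

```python
import re

# Only the two hashtags given as examples; the full mapping is not shown in the slides.
HASHTAG_TO_EMOTION = {"#feliz": "joy", "#triste": "sadness"}


def distant_label(tweet, mapping=HASHTAG_TO_EMOTION):
    """Return the emotion implied by the tweet's hashtags, or None."""
    hashtags = {tag.lower() for tag in re.findall(r"#\w+", tweet)}
    emotions = {mapping[tag] for tag in hashtags if tag in mapping}
    # Keep the tweet only when its hashtags point to exactly one emotion.
    return emotions.pop() if len(emotions) == 1 else None
```

For instance, `distant_label("Hoje o dia foi ótimo #Feliz")` yields `"joy"`, while a tweet carrying both hashtags, or neither, yields `None`.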

15

Manual Annotation Process

Process inspired by (Suttles and Ide 2013): emotions are identified according to Plutchik's wheel of emotions

Each tweet is marked as neutral or with up to 4 emotions, at most one from each pair (joy or sadness, anger or fear, trust or disgust, anticipation or surprise)
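The annotation constraint can be expressed as a small validity check. This is a sketch of the rule as stated above. The function name `is_valid_annotation` and the label strings are assumptions, not part of the annotation tool.

```python
PAIRS = (
    ("joy", "sadness"),
    ("anger", "fear"),
    ("trust", "disgust"),
    ("anticipation", "surprise"),
)


def is_valid_annotation(labels):
    """Check that a set of labels respects the annotation scheme."""
    labels = set(labels)
    # "neutral" is only valid on its own.
    if labels == {"neutral"}:
        return True
    allowed = {emotion for pair in PAIRS for emotion in pair}
    if not labels or not labels <= allowed:
        return False
    # At most one emotion from each opposite pair.
    return all(len(labels & set(pair)) <= 1 for pair in PAIRS)
```

Under this rule `{"joy", "fear"}` is accepted, whereas `{"joy", "sadness"}` and `{"neutral", "joy"}` are rejected.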

16

Manual Annotation Process (cont.)

A simple command-line tool was developed to support the annotation

17

Preliminary Analysis of the Corpora

Using word frequencies to help answer some questions:

− Do tweets really differ in opposite pairs of emotions?

− How similar are tweets with the same emotion in different corpora?

− Can EmoLex (Mohammad, 2013) terms help identify emotions?
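A word-frequency comparison of this kind can be sketched as follows. The tokenisation, the absence of stop-word removal and the use of Jaccard overlap between top-word lists are assumptions for illustration. The slides do not say how the frequencies were compared.

```python
import re
from collections import Counter


def top_words(tweets, k=10):
    """Return the k most frequent word tokens in a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"\w+", tweet.lower()))
    return [word for word, _ in counts.most_common(k)]


def overlap(words_a, words_b):
    """Jaccard overlap between two word lists (0 = disjoint, 1 = identical)."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

A low overlap between the top words of opposite emotions would suggest that they differ, and a high overlap between the same emotion in the two corpora would suggest that a model trained on one corpus may transfer to the other.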

18

Annotations

19

Do tweets differ in pairs of emotions?

[Figure not recovered. Panels: Joy, Sad; Trust, Disgust; Anger, Fear; Anticipation, Surprise. Context-Free corpus]

20

How similar are tweets in different corpora?

[Figure not recovered. Columns: Joy, Sad, Anger, Fear. Rows: Context-Free corpus, Specific-Context corpus]

21

How similar are tweets in different corpora?

[Figure not recovered. Columns: Trust, Disgust, Anticipation, Surprise. Rows: Context-Free corpus, Specific-Context corpus]

22

Can EmoLex terms help with emotion identification?

What is EmoLex?
− Research by (Mohammad, 2013)
− 14,182 unigrams (words) associated with emotions
− Manually created by crowdsourcing
− Available in 20 languages (including Portuguese)
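One way lexicon terms could be turned into attributes is to count, for each emotion, how many words of a tweet the lexicon associates with it. The sketch below assumes this counting scheme. The four lexicon entries are invented placeholders and are not taken from EmoLex. The function name `emotion_counts` is hypothetical.

```python
import re
from collections import Counter

# Toy stand-in for the lexicon: word -> emotions. Entries are invented for
# illustration and do not come from the actual EmoLex resource.
TOY_LEXICON = {
    "feliz": {"joy"},
    "triste": {"sadness"},
    "medo": {"fear"},
    "lucro": {"joy", "anticipation"},
}


def emotion_counts(tweet, lexicon=TOY_LEXICON):
    """Count lexicon hits per emotion for one tweet."""
    counts = Counter()
    for token in re.findall(r"\w+", tweet.lower()):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts
```

The resulting counts could be appended to a tweet's feature vector, one attribute per emotion.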

23

Can EmoLex terms help with emotion identification?

Context-Free corpus

[Figures not recovered (slides 23 to 26)]

27

Conclusions

Tweets annotated with opposite emotions differ in their most frequent words

But tweets with the same emotion do not share their most frequent words across the two corpora

The frequencies of EmoLex terms vary according to emotion and may be useful as attributes

28

Next Steps

Develop a web-based tool for crowdsourced annotation

Conduct machine learning experiments for emotion identification using n-grams as attributes (to be used as a benchmark)

Create tree representations for the tweets

Conduct experiments using tree representations for emotion identification with an SVM tree kernel, as in (Agarwal et al. 2011)

Compare results in the two corpora
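For the n-gram benchmark mentioned in the steps above, attribute extraction could look like this sketch. The tokenisation and the choice of unigrams plus bigrams are assumptions. The planned experiments may use different settings.

```python
import re
from collections import Counter


def ngram_counts(tweet, n_values=(1, 2)):
    """Count word n-grams of the given sizes in one tweet."""
    tokens = re.findall(r"\w+", tweet.lower())
    counts = Counter()
    for n in n_values:
        # Slide a window of n tokens over the tweet.
        for start in range(len(tokens) - n + 1):
            counts[" ".join(tokens[start:start + n])] += 1
    return counts
```

Each distinct n-gram then becomes one attribute, with its count (or simply its presence) as the value fed to the classifier.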

29

Thank you!

Questions?

30

References

(Agarwal et al. 2011) Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca Passonneau. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media, pages 30–38. Association for Computational Linguistics, 2011.

(Bollen et al. 2011) Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8, 2011.

(Gaskell et al. 2013) Paul Gaskell, Frank McGroarty, and Thanassis Tiropanis. An investigation into correlations between financial sentiment and prices in financial markets. In Proceedings of the 5th Annual ACM Web Science Conference, pages 99–108. ACM, 2013.

(Mohammad, 2013) Saif M. Mohammad and Peter D. Turney. Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3):436–465, 2013.

(Suttles and Ide 2013) Jared Suttles and Nancy Ide. Distant supervision for emotion classification with discrete binary values. In Computational Linguistics and Intelligent Text Processing, pages 121–136. Springer, 2013.

(Plutchik and Kellerman 1986) Robert Plutchik and Henry Kellerman. Emotion: Theory, Research and Experience. Academic Press, 1986.

(Zhang et al. 2011) Xue Zhang, Hauke Fuehres, and Peter A. Gloor. Predicting stock market indicators through Twitter “I hope it is not as bad as I fear”. Procedia - Social and Behavioral Sciences, 26:55–62, 2011. The 2nd Collaborative Innovation Networks Conference - COINs2010.

31

Emotions and Behaviours

Stimulus event | Cognitive appraisal | Subjective reaction | Behavioral reaction | Function
Threat | “danger” | Fear | Escape | Safety
Obstacle | “enemy” | Anger | Attack | Destroy obstacle
Gain of valued object | “possess” | Joy | Retain or repeat | Gain resources
Loss of valued object | “abandonment” | Sadness | Cry | Reattach to lost object
Member of one's group | “friendship” | Trust | Groom | Mutual support
Unpalatable object | “poison” | Disgust | Vomit | Eject poison
New territory | “examine” | Anticipation | Map | Knowledge of territory
Unexpected event | “what is it?” | Surprise | Stop | Gain time to orient

32

Analogy to stock market investors

Stimulus event | Cognition | Emotional state | Manifested behavior | Effect
Threat to a watched stock | “risk of loss” | Fear | Escape (sell the stock or do not buy) | Safety
Obstacle (to a stock's returns) | “enemy that causes loss” | Anger | Act against it if possible, otherwise just be outraged | Destroy or get around the obstacle
Making a profit | “possess” | Joy | Retain or repeat | Gain resources
Loss | “loss” | Sadness | Lament | Make up for the loss
Trusted investor or company | “friendship” | Trust | Follow advice or buy the company's shares | Help in trying to make a profit
Bad scenario or a very poorly performing stock | “risk of loss” | Disgust | Keep away | Reduce risk
New scenario (expected price change) | “examine” | Anticipation | Trade according to the forecast | Knowledge of the future scenario
Unexpected event | “what is it?” | Surprise | Stop | Gain time to orient