Sentiment Analysis Sentiment Analysis: Methods and Results Other work

Don't you love it? Sentiment Analysis with Crowd Sourcing

Wouter van Atteveldt et al.

2017-02-20

Don't you love it? Sentiment Analysis with Crowd Sourcing Wouter van Atteveldt et al.

About me

• Wouter van Atteveldt

• Political communication, VU Amsterdam

• MSc Artificial Intelligence (Edinburgh)

• PhD AI & Communication Science (VU Amsterdam)

• Research: Automatic Text Analysis, Data Analysis

• http://vanatteveldt.com

What is sentiment analysis?

• Measure evaluative meaning of (subjective) language

• "This movie sucks" → negative

• Applications, e.g.

  • analysing hotel reviews

  • automatic stock market trading

  • early warning systems (for brands and countries)

• Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.

• Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.

Common approaches in sentiment analysis

• Human annotation

• Dictionaries of positive, negative terms

  • semi-automatic expansion of dictionaries

• Machine learning
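
To make the dictionary approach concrete, here is a minimal sketch. The word lists and the scoring rule (positive minus negative hits, divided by token count) are hypothetical choices for illustration and are not taken from any of the tools discussed in these slides.

```python
import re

# Hypothetical toy word lists; a real sentiment dictionary is far larger.
POSITIVE = {"love", "great", "good", "interesting"}
NEGATIVE = {"sucks", "bad", "terrible", "boring"}


def dictionary_sentiment(text):
    """Return (#positive - #negative) / #tokens, or 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


# The example sentence from the earlier slide gets a negative score.
print(dictionary_sentiment("This movie sucks"))
```

A scorer like this ignores context entirely, so negation and irony are scored as if they were literal.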

Problems with sentiment analysis

• Evaluations are inherently subjective

• Evaluative language is creative and context-sensitive

  • (even more so than factual language)

• Evaluation implies a relation, but most data/tools are undirected

  • Source likes/dislikes target

Problems with sentiment analysis

• "Terrorist Attacks: 250 Innocent Massacred By ISIS"

• "Saddam Hussein was executed by hanging"

• "This car has better mpg than my old Volvo"

• "Preacher who applauded Orlando mass killing asked to relocate"

• "Janeane Garafalo also was an interesting character ."

• "Brexit means brexit"

Sentiment analysis: what do I want

Task definition

Given a piece of text, what do the author and mentioned actors think about all mentioned actors and issues?

• "Preacher who applauded Orlando mass killing asked to relocate"

  • → preacher/+/killing

• "This car has better mpg than my old Volvo"

  • → author/+/thiscar, author/-/volvo

• "Terrorist Attacks: 250 Innocent Massacred By ISIS"

  • → ISIS/-/innocents, author/-/ISIS
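
The source/polarity/target notation used in these examples maps naturally onto a small record type. The sketch below is one possible representation; the `Attitude` class and `parse_triple` helper are hypothetical names introduced here, not part of any existing tool.

```python
from typing import NamedTuple


class Attitude(NamedTuple):
    source: str    # who holds the opinion ("author" or a mentioned actor)
    polarity: int  # +1 for positive, -1 for negative
    target: str    # the actor or issue being evaluated


def parse_triple(notation):
    """Parse the slide notation 'source/+/target' into an Attitude."""
    source, sign, target = notation.strip().split("/")
    if sign not in ("+", "-"):
        raise ValueError(f"unknown polarity: {sign!r}")
    return Attitude(source, 1 if sign == "+" else -1, target)


# One sentence can yield several directed evaluations.
triples = [parse_triple(t) for t in "author/+/thiscar, author/-/volvo".split(",")]
print(triples)
```

Keeping source and target explicit is what distinguishes this task from undirected sentence-level polarity.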

Crowd sourcing

• Use anonymous/untrained people from the Internet to perform a task

• Useful for simple tasks

• Split bigger tasks into smaller steps

• Can be very cheap (cents per unit)

• Quality control using test questions
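
Quality control with test questions can be sketched as follows: coders answer a few items with known gold answers, and only coders who score well enough are kept. This is a minimal illustration, not the mechanism of any particular platform; the function name, data layout, and the 0.7 threshold are all assumptions.

```python
def trusted_coders(test_answers, gold, min_accuracy=0.7):
    """Return coders whose accuracy on the test questions meets the threshold.

    test_answers: {coder_id: {question_id: answer}}
    gold:         {question_id: correct answer}
    """
    trusted = set()
    for coder, answers in test_answers.items():
        scored = [q for q in answers if q in gold]
        if not scored:
            continue  # coder saw no test questions, so cannot be verified
        correct = sum(answers[q] == gold[q] for q in scored)
        if correct / len(scored) >= min_accuracy:
            trusted.add(coder)
    return trusted


gold = {"q1": "neg", "q2": "pos"}
answers = {
    "coder_a": {"q1": "neg", "q2": "pos"},  # 2 of 2 correct
    "coder_b": {"q1": "pos", "q2": "pos"},  # 1 of 2 correct
}
print(trusted_coders(answers, gold))
```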

CrowdFlower

• Platform for easily distributing tasks

• Other platforms exist, e.g. mturk

https://make.crowdflower.com/jobs/933225/editor

Why use Crowd Sourcing for sentiment analysis?

• Task is easy to explain

• Judgment is subjective anyway

• Low cost means multiple codings per unit possible

• Low cost means task/domain dependent analysis viable
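
With multiple codings per unit, the codings have to be combined into one label. A simple option is a majority vote, sketched below; the slides do not state which aggregation rule was used, so treat this as an assumption.

```python
from collections import Counter


def majority_label(codings):
    """Return the most frequent label, or None if the top count is tied."""
    counts = Counter(codings).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no clear majority
    return counts[0][0]


print(majority_label(["pos", "neg", "pos"]))
```

Returning `None` on ties makes disagreement visible, so such units can be sent out for additional codings rather than being resolved arbitrarily.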

Previous work in social science

• Ken Benoit et al., Crowd-sourced text analysis, APSR 2016

• Martin Haselmayer et al., Sentiment analysis of political communication: combining a dictionary approach with crowdcoding, QQ 2016

• Richard Socher et al., Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, EMNLP 2013

Benoit et al

• Compare crowd coding to manual coding

• Policy positions in party manifestoes

• 18k sentences, 4-6 expert codings, 5-20 crowd codings

Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced text analysis: reproducible and agile production of political data. American Political Science Review, 110(02), 278-295.

Benoit et al

Errors when using multiple experts / crowd coders

Haselmayer et al

• Create German sentiment dictionary using codings

• Applied to election coverage statements

• 13k sentences, 130k codings (2k euro)

Haselmayer, M., & Jenny, M. (2016). Sentiment analysis of political communication: combining a dictionary approach with crowdcoding. Quality & Quantity, 1-24.

Socher et al

• Recursive Neural Network of sentiment

• Crowd sourced 215k phrases in 12k sentences

Existing work: summary

• Promising results of crowd sourcing

• 3 use cases / approaches for crowd results

  • Direct use

  • Build dictionary

  • Build statistical model
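
As an illustration of the "build dictionary" use case, the sketch below scores each word by the mean crowd score of the sentences it appears in. This is a deliberately naive rule chosen for illustration; it is not claimed to be the procedure used in any of the cited studies, and the function name and `min_count` cutoff are hypothetical.

```python
from collections import defaultdict


def build_dictionary(coded_sentences, min_count=2):
    """Score each word by the mean crowd score of the sentences it occurs in.

    coded_sentences: iterable of (sentence, mean_crowd_score) pairs.
    Words seen in fewer than min_count sentences are dropped as unreliable.
    """
    scores = defaultdict(list)
    for sentence, score in coded_sentences:
        for word in set(sentence.lower().split()):
            scores[word].append(score)
    return {
        word: sum(values) / len(values)
        for word, values in scores.items()
        if len(values) >= min_count
    }


# Invented toy data, not real crowd codings.
coded = [("great plan", 1.0), ("great result", 0.5), ("terrible plan", -1.0)]
print(build_dictionary(coded))
```

Words that occur in both positive and negative sentences end up near zero, which is the desired behaviour for neutral terms such as "plan" in the toy data.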

Our Objectives

• Build sentiment tools for Dutch, English

• Compare 3 approaches

• Evaluate accuracy, costs, generalizability

ICA Panel proposal: Automatic Sentiment Analysis for Communication Research (Wouter van Atteveldt and Pablo Barbera)

Using crowdsourcing for developing an attributed sentiment analysis tool (Wouter van Atteveldt, Antske Fokkens, Isa Maks, Kevin van Veenen, and Mariken van der Velden)

Method

• Tweets and newspaper sentences about Ukraine treaty

• Coded manually and by CrowdFlower

• Compared to sentiment dictionary and Coosto

  • (Coosto is a social media analytics company that does sentiment analysis)

• (n~200)

Kevin van Veenen (MA thesis 2016), Methodological study of automatic sentiment analysis in political news

Results

Method                          n     Reliability (kappa)   Correlation (rho)
All sentences:
  Expert vs Dictionary          139   .38                   .48
  Expert vs Domain Dictionary   158   .58                   .58
  Expert vs Crowd               190   .86                   .86
Tweets:
  Expert vs Coosto              132   .49                   .56
  Dictionary vs Coosto          104   .63                   .68
  Domain dictionary vs Coosto   109   .43                   .55
  Crowd vs Coosto               132   .52                   .59
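
For readers unfamiliar with the reliability column, the sketch below computes Cohen's kappa for two codings of the same units. The slide only says "kappa", so Cohen's two-coder variant is an assumption here, and the label lists are invented toy data rather than the study's codings.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two equal-length codings."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty codings of equal length")
    n = len(labels_a)
    # Observed agreement: share of units where both codings match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


expert = ["pos", "neg", "neg", "pos"]  # toy data
crowd = ["pos", "neg", "pos", "pos"]   # toy data
print(cohens_kappa(expert, crowd))
```

The rank correlation in the last column could be computed with `scipy.stats.spearmanr`, assuming it is Spearman's rho and that SciPy is available.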

Accuracy by number of coders
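
As a rough intuition for why accuracy can rise with the number of coders, the sketch below computes the probability that a majority vote is correct. It assumes a binary task and independent coders who are each right with the same probability p; this is an idealisation for illustration and does not reproduce the study's data.

```python
from math import comb


def majority_accuracy(p, k):
    """P(majority of k independent coders is right), each right with prob. p.

    Assumes a binary task and odd k, so ties cannot occur.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd to avoid ties")
    needed = k // 2 + 1
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(needed, k + 1))


for k in (1, 3, 5, 15):
    print(k, round(majority_accuracy(0.7, k), 3))
```

Under these assumptions the gain per extra coder shrinks as k grows, which is one reason a small number of codings per unit can already be cost-effective.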

Costs

Task                 N     Cost    Codings / $
Expert coding        190   87.5    2.2
Crowd (15 codings)   450   10.78   42
Crowd (3 codings)    480   4.02    119
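
The last column appears to be N divided by cost. The short check below recomputes it from the first two columns, assuming N counts codings and cost is in dollars.

```python
# (task, number of codings, total cost), copied from the costs table
costs = [
    ("Expert coding", 190, 87.5),
    ("Crowd (15 codings)", 450, 10.78),
    ("Crowd (3 codings)", 480, 4.02),
]


def codings_per_dollar(n_codings, cost):
    """Number of codings obtained per unit of money spent."""
    return n_codings / cost


for task, n, cost in costs:
    print(f"{task}: {codings_per_dollar(n, cost):.1f} codings per dollar")
```

On these figures, crowd coding with 3 codings per unit yields roughly 55 times as many codings per dollar as expert coding (about 119 versus 2.2).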

Conclusion

• Crowd Sourcing cheap and accurate

• To-do:

  • Finish English task

  • Train machine learning / dictionary with crowd data

  • Error analysis / validation

  • Create easy tool for using results

Other work

Some other things I am working on:

• AmCAT Text Analysis Toolkit

• NLPipe Linguistic Processing

• Clause analysis and Source Detection

• Scraping Cantonese

AmCAT

• Amsterdam Content Analysis Toolkit

• Open source web-based text analysis tool

  • Automatic analysis, quantitative manual analysis

  • Multi-user, permissions per project

  • REST API, integration with R/Python

• Set up your own server or use ours

  • (experimental docker support)

• http://wiki.amcat.nl, https://amcat.nl

• http://github.com/amcat/amcat

NLPipe

• Natural Language Pipelining

• Easy docker-installable server for NLP

• Lemmatizing, POS tagging, parsing

• Integrated with R / Python

• Modules for Dutch, English

• http://github.com/vanatteveldt/nlpipe

Automatic Text Analysis Made Easy: Using AmCAT, NLPipe and R to do corpus management, linguistic processing, and automatic text analysis. Wouter van Atteveldt, Kasper Welbers, Antske Fokkens, Nel Ruigrok, Martijn Bastiaan, Christian Stuart (ICA 2017)

Clause Analysis / RSyntax

• Detect source use and clauses

• Who does what to whom

• http://vanatteveldt.com/2016-clause-analysis/ (In press, Political Analysis)

• http://github.com/vanatteveldt/rsyntax

Scraping Cantonese

• With Chris Fei Shen

• HK Discussion forums in Cantonese

• No segmenter exists for Cantonese

  • Cannot do corpus analysis, machine learning etc.

• Scrape discusshk.com, build segmenter from trigrams/bigrams
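
The slides do not describe how the segmenter is built from n-grams. One simple possibility, sketched below purely as an assumption, is to count character n-grams in the scraped text, keep the frequent ones as a vocabulary, and segment by greedy longest match. The toy strings, the frequency cutoff, and the function names are all invented for illustration.

```python
from collections import Counter


def count_ngrams(texts, n):
    """Count character n-grams across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    return counts


def segment(text, vocab, max_len=3):
    """Greedy longest-match segmentation against a set of known n-grams."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


corpus = ["香港好", "我喺香港", "香港人"]  # invented toy strings, not forum data
bigrams = count_ngrams(corpus, 2)
vocab = {gram for gram, count in bigrams.items() if count >= 2}  # hypothetical cutoff
print(segment("我喺香港", vocab))
```

Raw frequency is a crude criterion; association measures such as pointwise mutual information are a common refinement for deciding which n-grams count as words.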

Conclusion

• I like automatic text analysis :)

• All tools, code available on github

• Try it out and let me know!

Slides: http://vanatteveldt.com/cityu_seminar
