Tweet as a Tool for Election Forecast: UK 2015 General Election as an Example

DRAFT ∗

Philipp Burckhardt Raymond Duch Akitaka Matsuo

January 3, 2016

Abstract

In this essay we explore the utility of using Twitter conversations to explain election outcomes. Our efforts are based on a large corpus of tweets collected during the six-month period approaching the 2015 UK election. Our analysis includes the geo-location of tweets, sentiment analysis, and issue/topic modelling. The analysis focuses on England and Scotland. The interpretation of the Twitter landscape of Scotland is straightforward: the Scottish National Party dominated the Twitter conversation on all aspects, including the number of tweets, general sentiment, and the evaluation of key issues. In contrast, the results for England are more nuanced: generally speaking, the Labour Party was the slight favorite; however, the Conservative Party had a slight edge over Labour on certain issues that may have been critical in the ultimate Tory victory. We suspect this poor performance of English tweets in explaining election outcomes is a consequence of the population of Twitter users being unrepresentative of the country’s population. We discuss a strategy to remove the biases by collecting and analysing demographic information on Twitter users.

∗ Paper prepared for presentation at the Third Annual Meeting of the Asian Political Methodology Society in Beijing, January 2016


1 Introduction

The 2015 UK General Election was a nightmare for polling agencies. Before the election, all of them predicted a hung parliament, and the media focused on the coalition negotiations that would likely follow. In the event, the Conservative Party secured a single-party majority and formed the new government.¹ Academics also had a hard time predicting the election results from polling information. A forthcoming special issue of Electoral Studies collects election predictions submitted by scholars before the election. None of them predicted a single-party Conservative majority (Fisher and Lewis-Beck, 2015).

In this essay we explore the utility of using Twitter conversations to explain election outcomes. Our efforts are based on a large corpus of tweets collected during the six-month period approaching the 2015 UK election. Our analysis includes the geo-location of tweets, sentiment analysis, and issue/topic modelling. The analysis focuses on England and Scotland. The interpretation of the Twitter landscape of Scotland is straightforward: the Scottish National Party dominated the Twitter conversation on all aspects, including the number of tweets, general sentiment, and the evaluation of key issues. In contrast, the results for England are more nuanced: generally speaking, the Labour Party was the slight favorite; however, the Conservative Party had a slight edge over Labour on certain issues that may have been critical in the ultimate Tory victory. These results are not consistent with the actual election outcomes; they point to the need for a bias correction methodology, which we are developing and describe in this essay.

¹ For instance, http://fivethirtyeight.com/datalab/what-we-got-wrong-in-our-2015-uk-general-election-model/ and http://www.economist.com/news/britain/21651250-why-opinion-polls-went-wrong-pollderdash


2 Literature

Because of their abundance and accessibility, micro-blogging data, tweets in particular, have been used to measure public sentiment about a range of issues. Examples include using Twitter conversations to gauge the public’s political sentiment (O’Connor et al., 2010) and to predict political affiliations (Conover et al., 2011). Predicting election outcomes is another such area (cf. Gayo-Avello, 2013). Previous attempts to predict election outcomes have had mixed success. Some articles claim that tweets or other social media texts have decent predictive power for election outcomes. Tumasjan et al. (2011) analyze party and party leader mentions in the 2009 German federal election and report that the number of mentions, combined with sentiment analysis, predicts the election result with high accuracy. Curini, Ceron and Iacus (2013), Ceron et al. (2013), and Ceron, Curini and Iacus (2014) analyze several elections in Italy, France, and the US and show that a supervised learning method developed by Hopkins and King (2010) does a good job of explaining fluctuations in party or candidate support in various contexts; they also predict election outcomes.

A host of articles challenge these optimistic findings about the use of social media texts in predicting election outcomes. Jungherr, Jurgens and Schoen (2012) rebut the findings of Tumasjan et al. (2011). They argue that the predictive accuracy reported there is an artifact of how parties were selected for inclusion in the analysis: once all parties are included in the data, the predictive power of tweet data becomes essentially zero. Gayo-Avello (2013), in a comprehensive review of the literature, argues that empirical efforts to predict election outcomes with Twitter have numerous limitations.

There have been some attempts to analyze social media texts in UK politics. Lampos, Preotiuc-Pietro and Cohn (2013) analyze the 2010 General Election and show that, when polling results on voting intention are used as training data for structured supervised learning methods, tweets can predict vote intentions at the Twitter account level. Boutet, Kim and Yoneki (2013) also analyze tweets during election periods; they use the network of tweet-retweet relations to estimate the likely supporters of major parties among Twitter users in the UK.

This project is another attempt to predict election outcomes using Twitter conversations. Our approach, which focuses on volume and sentiment in social media texts, resembles that of Tumasjan et al. (2011). In addition, we put significant emphasis on the analysis of the issues discussed during the campaign period. Our approach is exploratory: we first attempt to track changes in the popularity of parties through the analysis of all tweets mentioning major parties or party leaders. We then shift our focus to the analysis of issues discussed in the tweets. Previous attempts have put relatively little emphasis on the issues that shape election campaigns.

3 Methodology for Downloading Tweets

We collected the corpus of tweets using the Twitter streaming API.² The streaming API allows us to connect to the global stream of Twitter data. A program using the API maintains a continuous connection to the Twitter server, and tweets that satisfy pre-defined conditions are downloaded automatically. We use very simple criteria to select tweets: the names of the six largest parties and the names of their leaders serve as the search terms. Table 1 presents the list of parties and search terms. The download started on December 21, 2014. The last date of tweets we use in the data analysis is May 6, 2015, the day before the General Election.³ In the five-and-a-half-month period from late December 2014, we downloaded 25 million tweets. To save on computation time, we randomly sampled 8 million tweets for our data analysis.

² https://dev.Twitter.com/docs/streaming-apis

³ Because of technical problems, the tweets for three days are incomplete (February 20 and 21, and March 13). These dates are excluded from the final analysis.


Table 1: Search Terms

Party                    Party Name Terms                                     Party Leader Terms
Conservative Party       conservatives, tories, torys, tory                   davidcameron, david cameron, cemeronmustgo, cameronmuststay
Liberal Democrats        lib dem, liberal democrats, libdem, libdems          nick clegg, nickclegg
Labour Party             labour, labours, uklabour                            ed miliband, edmiliband
UKIP                     ukip, uk independence party                          Nigel Farage
Scottish National Party  snp, scottish national party, scottishnationalparty  Nicola Sturgeon, NicolaSturgeon
Green Party              thegreenparty, green party                           Natalie Bennett
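To illustrate how the search terms in Table 1 can be turned into a party label for each tweet, the following is a minimal sketch in Python. It is not the code used for this paper. The whole-word, case-insensitive matching rule, the short party labels, and the decision to count leader names as party mentions are all assumptions made for the example.

```python
import re

# Search terms from Table 1, lower-cased and grouped by party.
# Assumption: leader terms are counted as mentions of the leader's party.
PARTY_TERMS = {
    "Conservatives": ["conservatives", "tories", "torys", "tory",
                      "davidcameron", "david cameron",
                      "cemeronmustgo", "cameronmuststay"],
    "LibDem": ["lib dem", "liberal democrats", "libdem", "libdems",
               "nick clegg", "nickclegg"],
    "Labour": ["labour", "labours", "uklabour", "ed miliband", "edmiliband"],
    "UKIP": ["ukip", "uk independence party", "nigel farage"],
    "SNP": ["snp", "scottish national party", "scottishnationalparty",
            "nicola sturgeon", "nicolasturgeon"],
    "Greens": ["thegreenparty", "green party", "natalie bennett"],
}


def compile_terms(terms):
    """Build one case-insensitive, whole-word regex for a list of terms."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PARTY_PATTERNS = {party: compile_terms(terms)
                  for party, terms in PARTY_TERMS.items()}


def parties_mentioned(text):
    """Return the set of parties whose search terms occur in the tweet text."""
    return {party for party, pattern in PARTY_PATTERNS.items()
            if pattern.search(text)}
```

For example, `parties_mentioned("Ed Miliband attacks the SNP")` returns a set containing "Labour" and "SNP". Whole-word matching keeps "tory" from matching inside "history", but it cannot tell the Labour Party from "labour market"; this is the kind of vagueness in the search terms that the classification steps in the next section try to reduce.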

4 Tweet Classification

We attempt to classify the tweets, about three million of which can be geo-located in the UK (Section 4.1), into several categories based on the geo-location of users, the sentiment expressed in the tweets, and the electoral issues raised. Because the search terms are vague, not all of these tweets actually discuss the UK election, and many were created by Twitter users located outside the UK. In order to use only the relevant tweets, we exclude tweets that cannot be geo-located in the UK. Another crucial issue is the sentiment classification of tweets. For the purpose of election prediction, the sentiment expressed in tweets can be of considerable importance. In this paper, we use a recently developed sentiment classifier. The issue classification is conducted by simple pattern matching.

4.1 Geo-locating Tweets

Because party systems vary across the regions of the UK, identifying the geo-location of tweets is important for this project. Non-UK tweets also need to be removed from the dataset because party names are duplicated across countries; there are Green Parties in many countries, for example. Twitter has a geo-location feature that provides geocodes for each tweet, but it is an opt-in feature that is turned on only with the explicit consent of users. Because of privacy concerns, the proportion of opted-in users in the Twitter community is too low for this feature to serve as a reliable source. By one estimate, only two percent of Twitter users turn on this feature, and the proportion is similar in our data.⁴

To supply the geo-location information, we use the location field of Twitter accounts, a free-text field in which users can provide their location. We identify the location using the geonames.org API, a service that recovers geo-locations from partial addresses.⁵ If the location field of a Twitter account includes the name of a town or city, we invoke the geonames.org API to determine the geo-location. Through the analysis of location names we identified 3,776 different cities or counties in the UK. With these locations, 3,217,147 tweets are given a geo-location in the UK. Table 2 shows the top 20 locations included in the data. In the following analysis we locate Twitter users in English and Scottish counties and run separate analyses for these two regions of the UK.
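The lookup from a free-text location field to a UK place can be sketched as follows. This is only an illustration under stated assumptions: a small hand-made dictionary (`GAZETTEER`, hypothetical) stands in for the geonames.org service, and the normalisation and whole-name matching rules are choices made for the example, not a description of our pipeline.

```python
import re

# Hypothetical mini-gazetteer standing in for geonames.org lookups.
# Keys are lower-cased place names; values are (place, country).
GAZETTEER = {
    "london": ("London", "England"),
    "glasgow": ("Glasgow", "Scotland"),
    "edinburgh": ("Edinburgh", "Scotland"),
    "manchester": ("Manchester", "England"),
    "newcastle upon tyne": ("Newcastle upon Tyne", "England"),
}


def locate(location_field):
    """Return (place, country) for the first known place name found in a
    free-text location field, or None if no known place is mentioned."""
    if not location_field:
        return None
    # Lower-case, replace punctuation with spaces, collapse whitespace.
    text = re.sub(r"[^a-z\s]", " ", location_field.lower())
    text = " ".join(text.split())
    # Try longer names first so multi-word names are matched as a whole.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            return GAZETTEER[name]
    return None
```

A tweet whose account location is "Glasgow, Scotland" is assigned to Glasgow, while "Paris, France" returns `None`, so the tweet would be dropped as not geo-locatable in the UK.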

⁴ http://firstmonday.org/article/view/4366/3654

⁵ http://www.geonames.org/export/web-services.html

Table 2: Geo Locations of Tweets

City Name            Country   Count
London               England   1131036
Scotland             Scotland  300096
Glasgow              Scotland  237046
Edinburgh            Scotland  133112
Manchester           England   102259
Liverpool            England   65090
Bristol              England   50637
Wales                England   49482
Sheffield            England   48824
Birmingham           England   43418
Leeds                England   42915
East Midlands        England   39821
Aberdeen             Scotland  38095
Nottingham           England   37499
Brighton             England   37161
Newcastle upon Tyne  England   36369
East Yorkshire       England   32676
Cambridge            England   29006
Norwich              England   23287
Oxford               England   22596

4.2 Sentiment Analysis

Analyzing the sentiment of text is a developing area in natural language processing. The goal of sentiment analysis is to develop an algorithm that identifies whether a text is subjective or objective and, if it is subjective, whether it is positive or negative. Broadly speaking, there are two approaches to this classification (see the extensive reviews by Pang and Lee, 2008, and Paltoglou, 2014). The first is the machine-learning approach, which uses a classification method such as Support Vector Machines, Naive Bayes, or Maximum Entropy. Computational linguists have proposed a range of machine-learning strategies and compete with each other on classification accuracy (e.g. Wilson et al., 2013). The main data used in this approach are document-term matrices, much like those used in topic modeling. This approach requires a set of training data that is usually created by human coders, although some research makes creative use of the characteristics of online texts (e.g. Pak and Paroubek, 2010). The second is an unsupervised, lexicon (dictionary)-based approach. Its algorithms search through texts for specific terms listed in precompiled dictionaries. Developing such a dictionary requires significant effort. However, this approach works better if the algorithm and dictionary are developed specifically for the domain of the texts. In addition, this approach allows the analysis of sentence structure, which is difficult in the machine-learning approach.

In this essay, we use a lexicon-based algorithm called VADER, developed by Hutto and Gilbert (2014). VADER is a sophisticated text-parsing algorithm that classifies tweets based on sentiment. In their paper, Hutto and Gilbert (2014) compare the effectiveness of their algorithm with that of other sentiment classification algorithms, including LIWC, SentiWordNet, and machine-learning methods using Naive Bayes and Support Vector Machines (SVM). Their classifier is based on a gold-standard lexicon that includes words and other features (e.g. emoticons and word capitalization). The elements of the lexicon are then combined with rules that detect grammatical and syntactical conventions that typically modify sentiment intensity through either emphasis or negation. Their comparisons show that this simple algorithm is more accurate than the other methodologies. The critical advantage of this method, along with other lexicon-based methodologies (e.g. Paltoglou and Thelwall, 2012), is its computational efficiency.

We applied the Hutto and Gilbert (2014) VADER algorithm to our corpus of UK tweets. For the three million tweets in our Twitter corpus, the classifier finishes the sentiment analysis within two minutes. The classification output includes four scores for each tweet: neutral, negative, and positive sentiment, along with a compound measure of sentiment. Figure 1 shows the distribution of the compound measure. Following the advice of Hutto and Gilbert (2014), we categorize tweets with a compound score greater than 0.05 as positive and tweets with a compound score less than -0.05 as negative. As Figure 1 illustrates, the distribution of sentiment in our UK tweets is trimodal. The modal sentiment is clearly neutral. The negative and positive sentiment scores are roughly normally distributed around -0.5 and +0.5, respectively.
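The thresholding rule described above is simple enough to state in a few lines of Python. The sketch below assumes that a compound score in [-1, 1] has already been computed for each tweet by a VADER implementation; the function names and the example scores are hypothetical and serve only to illustrate the cut-offs at 0.05 and -0.05.

```python
from collections import Counter


def label_sentiment(compound, threshold=0.05):
    """Map a compound sentiment score to a label using the cut-offs in the
    text: above +threshold is positive, below -threshold is negative,
    and everything in between is neutral."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    return "neutral"


def sentiment_counts(compound_scores, threshold=0.05):
    """Count how many scores fall into each sentiment category."""
    return Counter(label_sentiment(score, threshold)
                   for score in compound_scores)
```

With the hypothetical scores `[0.6, -0.4, 0.0, 0.05, -0.7]`, the function `sentiment_counts` reports one positive, two negative and two neutral tweets; a score of exactly 0.05 counts as neutral because the rule uses strict inequalities.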

Figure 1: Sentiment Classification Result [histogram; x-axis: Compound Measure, -1.0 to 1.0; y-axis: Count]

4.3 Issues

We are also interested in identifying the issues that dominated the Twitter election conversations. Accordingly, we conduct issue classification using iterations of a simple pattern-matching strategy. First, we selected important issues in the election based on archival searches of news articles covering the election campaign and on manifesto summaries provided by news agencies.⁶ From this analysis we identified six key issues in the election: economy, immigration, tax, welfare, EU, and education. For each issue, we first search through the corpus of tweets with pre-defined search terms. We use the tweets matched to each issue to create a document-term matrix. We then manually determine which terms are frequently used, and therefore highly relevant, for each issue. These relevant terms are added to the original search terms, and we conduct another round of issue searches. We iterated this process several times until we finalized the list of search terms. Table 3 shows the list of search terms along with the number of tweets matched with the terms.
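One round of the iterative term-expansion procedure described above can be sketched as follows. The sketch is illustrative rather than a record of our implementation: it handles single-word terms only, counts in how many matched tweets each other word appears (a document frequency, in place of a full document-term matrix), and leaves the decision about which candidates to add to a human reader. The tokenizer, the stopword list and the example tweets are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a tweet and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_issue(text, terms):
    """True if the tweet contains at least one of the issue's search terms."""
    return not terms.isdisjoint(tokenize(text))


def candidate_terms(tweets, terms, stopwords=frozenset(), top_n=5):
    """Among tweets matching the current terms, count in how many tweets each
    other word appears and return the most frequent ones for manual review."""
    counts = Counter()
    for text in tweets:
        tokens = set(tokenize(text))
        if tokens & terms:
            counts.update(sorted(tokens - terms - stopwords))
    return counts.most_common(top_n)


# Hypothetical toy corpus and seed terms for the "economy" issue.
EXAMPLE_TWEETS = [
    "The deficit and austerity dominate the budget debate",
    "Austerity is hurting the economy",
    "Great match last night",
    "Economy growing but austerity continues",
]
SEED_TERMS = {"economy", "deficit"}
STOPWORDS = {"the", "and", "is", "but"}
```

Here `candidate_terms(EXAMPLE_TWEETS, SEED_TERMS, STOPWORDS, top_n=1)` returns `[("austerity", 3)]`. A reviewer who judges "austerity" relevant would add it to `SEED_TERMS` and run the search again, which is one iteration of the loop.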

Table 3: Issue Search Terms

Topic        Count   Search Terms
Economy      423082  deficit, economy, business, austerity, budget, debt, borrowing, gdp, unemployment, job(s, )
Education    153582  education, tuition, school, university, universities, apprenticeship, childcare, teachers, uni
EU           158130  proeu, no2eu, meps, brexit, mep, EU, MEP
Immigration  162634  immigration, racist, immigrant, migrants, boarder(s, )
Tax          291107  tax, betroom, mansion, dodging, nondom, VAT, IFS
Welfare      416160  welfare, NHS, benefit, wowpetition

⁶ Examples: http://www.bbc.co.uk/news/election/2015/manifesto-guide and http://www.theguardian.com/politics/manifestos-2015


5 Tweet Heterogeneity

The first stage of this Twitter project consisted of obtaining Twitter conversations related to the UK general election and conducting topic modelling and sentiment analysis. As we will demonstrate in the following analyses, the content of Twitter traffic does not necessarily reflect the general preferences and sentiment of the population, so we can think of Twitter traffic as a somewhat biased read of public preferences and sentiment. Accordingly, in order to generate estimates of population parameters (such as support for the UK Conservative Party) from Twitter conversations, we need to understand precisely the nature of this bias. This is the second stage of the project, which we are currently undertaking. It consists of getting a better measure of the heterogeneity in Twitter conversations and has three principal components.

5.1 Identifying Heterogeneous Preferences in the Twitter Sub-population

The first step in our effort to estimate heterogeneity in Twitter conversations relies on survey data that allow us, first, to identify who actively tweets and, second, to correlate this activity with socio-demographic and political sentiment variables. We gain two critical insights from this analysis. First, we learn the socio-demographic profiles of both Twitter users and those who do not tweet, for example their age, income, education, and gender distributions. This in itself will be very valuable for weighting tweets so that they better reflect the preferences of the overall population. Second, we learn how relevant political preferences vary across socio-demographic groups within both the Twitter and non-Twitter sub-populations. These two insights from the analysis of survey data are the basis for constructing propensity score weights. They indicate two potential sources of bias. First, the socio-demographic distribution of Twitter users may differ from that of the overall population; weighting can correct for this distributional bias. Second, within any socio-demographic grouping (let’s say the middle-income, white, low-educated segment), Twitter users may have political preferences distinct from those of non-Twitter users and hence from those of the overall population.

The results from this analysis of large-n survey data will inform the propensity score weights for socio-demographic segments, which should correct for any bias in the overall political preferences and sentiment registered by Twitter users.

5.2 Socio-demographic Segmentation of Tweets

In order to accomplish this weighting task we need information on the socio-demographic composition of the Twitter users in our database; these distributions can then be compared to those of the overall population. The proportion of educated and rich individuals in the Twitter sub-population is likely higher than their proportion in the overall population (or, let’s say, the population of registered voters). We would like to be able to segment the Twitter conversations we collect into at least very rough socio-demographic groupings. These groupings would then be weighted to their appropriate population distribution using the information obtained from the data collection and analysis described in the previous section.
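The idea of weighting groupings to their population distribution can be illustrated with simple cell weighting, in which each group's weight is its population share divided by its share among Twitter users. This is a simplified stand-in for the propensity score weights discussed above, not the weighting scheme itself; the group labels and all numbers are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per group: population share / sample share.

    sample_counts: number of Twitter users observed in each group.
    population_shares: each group's share of the target population
    (the shares are assumed to sum to 1).
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts must contain at least one user")
    weights = {}
    for group, count in sample_counts.items():
        if count == 0:
            raise ValueError(f"no sampled users in group {group!r}")
        weights[group] = population_shares[group] / (count / total)
    return weights


# Hypothetical example: degree holders are over-represented on Twitter.
EXAMPLE_COUNTS = {"degree": 600, "no_degree": 400}
EXAMPLE_POPULATION = {"degree": 0.3, "no_degree": 0.7}
```

In this toy example degree holders make up 60 percent of the sample but 30 percent of the population, so their tweets receive a weight of 0.5, while tweets from users without a degree receive a weight of 1.75.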

Generating socio-demographic profiles of Twitter users has attracted considerable attention from computational linguists and computer scientists. A number of strategies have been implemented in this regard, although they are typically based on extracting demographic information from user profiles, using this information to generate a socio-demographic segmentation of Twitter users, and then identifying distinguishing latent patterns in the tweets of these different socio-demographic groupings. These latent patterns are then used to categorise all Twitter users in our database (i.e., out-of-sample predictions).

We briefly describe recent efforts to estimate the demographic profiles of Twitter users. Most of these efforts use a three-step methodology. First, they collect concrete information about a subset of Twitter users by analysing the texts in various fields of Twitter accounts (e.g. the user description) as well as other identifiable information on the user (e.g. information obtained from other social networks such as LinkedIn). Second, they train a classifier using other user information (e.g. number of followers, number of lifetime tweets, content of tweets). Finally, they classify user accounts that do not provide any concrete information regarding the demographic characteristics of interest. Preotiuc-Pietro, Lampos and Aletras (2015), for example, extract occupation information from Twitter user profiles and conduct text analysis to categorise users into occupational classes. Sloan et al. (2015) approach the identification of occupational groupings in a similar fashion, although they use human coders to validate the machine occupational classifications. Others have focused on Twitter text analysis to identify distinct patterns that isolate the Twitter user’s gender and age (Schwartz et al., 2013). Burger et al. (2011) also propose a method for identifying the gender of Twitter users.

We propose to employ a similar strategy, using information about Twitter users to categorise them into occupational, gender and education categories. In addition to information from Twitter user profiles, we will use information about the Twitter user obtained from their profiles on other social media such as Facebook and Instagram.

As in other efforts such as Preotiuc-Pietro, Lampos and Aletras (2015), our intention is to categorise the Twitter users into distinct socio-demographic categories, for example high-income, highly educated women working in a managerial position. Having accomplished this, we would then attempt to identify unique patterns in their tweets that could be used to categorise “out-of-sample” tweets.

5.3 Geo-location Segmentation of Tweets

We noted earlier that we have developed a strategy for geo-locating Twitter accounts. This information can also be used for socio-demographic segmentation of Twitter users. Mohammady and Culotta (2014) propose a strategy for categorising Twitter users into demographic groupings by matching Twitter accounts, via the county-level geo-coded information available from the Twitter API, with the demographic information of their county. The machine analysis of linguistic patterns then occurs at the county level. These county-level analyses are designed to identify the socio-demographic grouping of each Twitter user.

5.4 Propensity Score Weighting of Tweets

The analyses described above will ultimately associate propensity scores with each of the Twitter accounts in our database: each Twitter user would have a likelihood of falling into each of the socio-demographic cells we ultimately create as part of the heterogeneity measurement strategies described above. For example, a Twitter user might have a .37 likelihood of being a poorly educated male from Northern Ireland and a .07 likelihood of being a highly educated woman from Brighton. We also have a corpus of tweets classified according to partisanship and political sentiment. Using them we can obtain probabilistic measures of the partisanship of each user in our data; for example, a user might have a .1 chance of being a Labour supporter, a .5 chance of being a Conservative supporter, and so on. We can also calculate sentiment towards party leaders and particular issues.

By combining these two measures associated with each Twitter user, a matrix of demographic predictions for a user account and a vector of partisanship predictions, we can calculate the estimated probabilities of voting directions in each of these demographic cells. For example, the prediction might be that a Scottish male in his forties working as an engineer has a twenty percent chance of voting for Labour, a forty percent chance of voting for the SNP, etc. To obtain the overall voting propensity in each geographic region, we will calculate a weighted average of voting directions using the actual distribution of each of these cells in the UK population. This is the strategy we are developing for a fine-grained election forecast. We now turn to our estimation of political sentiment and our efforts to use these data to predict the outcome of the 2015 UK general election.
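The weighted-average step of Section 5.4 can be made concrete with a small sketch. It shows one possible reading of the strategy, which is still under development: within each demographic cell, a party's vote share is the average of users' partisanship probabilities weighted by each user's probability of belonging to that cell, and the regional estimate is the average of the cell shares weighted by the cells' population shares. The data layout, the function names and all numbers are hypothetical.

```python
def cell_vote_shares(users, cells, parties):
    """Estimate each party's vote share within each demographic cell.

    Each user is a dict with "cell_prob" (probability of belonging to each
    cell) and "party_prob" (probability of supporting each party).
    """
    shares = {}
    for cell in cells:
        mass = sum(user["cell_prob"].get(cell, 0.0) for user in users)
        shares[cell] = {}
        for party in parties:
            weighted = sum(user["cell_prob"].get(cell, 0.0)
                           * user["party_prob"].get(party, 0.0)
                           for user in users)
            shares[cell][party] = weighted / mass if mass > 0 else 0.0
    return shares


def population_estimate(shares, population_shares, parties):
    """Average the cell-level shares using each cell's population share."""
    return {party: sum(population_shares[cell] * shares[cell][party]
                       for cell in shares)
            for party in parties}


# Hypothetical example with two users, two cells and two parties.
EXAMPLE_USERS = [
    {"cell_prob": {"A": 0.8, "B": 0.2}, "party_prob": {"Labour": 0.6, "SNP": 0.4}},
    {"cell_prob": {"A": 0.2, "B": 0.8}, "party_prob": {"Labour": 0.1, "SNP": 0.9}},
]
EXAMPLE_POPULATION = {"A": 0.4, "B": 0.6}
```

In this example cell A splits evenly between Labour and the SNP, cell B gives the SNP 0.8, and because cell B accounts for 60 percent of the population the combined estimate is 0.32 for Labour and 0.68 for the SNP.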

6 Preliminary Results

This section presents the results from the first stage of our project, specifically our preliminary analysis of the content of Twitter conversations. Our initial analysis is an exploratory effort to understand whether our Twitter corpus can help us describe, and possibly explain, election outcomes. The analysis is mostly descriptive: we show a series of graphical aggregations of tweets and discuss our interpretation of them.

6.1 Tweet Counts

Figure 2 shows a simple count of tweets. Until the end of March the count is more or less stable, but starting in early April the number of tweets shows a clear upward trend as the election date approaches. In addition, there are several spikes in April corresponding to the election debates by party leaders on April 2, 16, and 30.

Figure 2: Number of Tweets [time series; x-axis: Date, January to May; y-axis: Number of Tweets]

Table 4 provides a breakdown of tweets by the number of parties mentioned. Some tweets do not mention any party name, and some mention multiple party names. In the following plots, we use only tweets that mention exactly one party. Figure 3 shows the number of tweets by party. For Scotland the plot clearly indicates that the SNP dominates all other parties, especially after March 2015. Until February, Labour and the SNP had about the same number of tweets. From March, however, the number of SNP tweets was clearly larger than that of Labour, and this SNP dominance became especially pronounced after the first election debate on April 2.

English tweets present a less clear picture. In April, three parties (Conservatives, Labour and UKIP) are mentioned frequently relative to the other three parties. Although the numbers of tweets for these three parties fluctuate together, there is a Conservative spike on April 14, when the Conservatives published their manifesto. Toward the election date, Labour Party Twitter volume appears to exceed Conservative volume. Twitter volume on its own provides no hint of a Conservative victory.

Table 4: Party Mentions in Each Tweet

Number of Parties Mentioned    Count
0                              1402758
1                              5577683
2                              1051153
3                              153598
4                              27599
5                              10323
6                              3727
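The filtering step described above (counting how many parties each tweet mentions and keeping only single-party tweets) can be sketched as follows. This is a minimal illustration rather than the pipeline used for the paper: the keyword lists in `PARTY_KEYWORDS` and all function names are hypothetical, and the actual party dictionaries are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the dictionaries used for the paper may differ.
PARTY_KEYWORDS = {
    "Conservatives": ["conservative", "conservatives", "tory", "tories"],
    "Labour": ["labour"],
    "LibDem": ["libdem", "libdems", "liberal democrats"],
    "SNP": ["snp"],
    "UKIP": ["ukip"],
    "Greens": ["green party", "greens"],
}

# One case-insensitive, whole-word pattern per party.
PARTY_PATTERNS = {
    party: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for party, words in PARTY_KEYWORDS.items()
}


def parties_mentioned(text):
    """Return the set of parties whose keywords appear in the tweet text."""
    return {party for party, pattern in PARTY_PATTERNS.items() if pattern.search(text)}


def mention_breakdown(tweets):
    """Count tweets by the number of distinct parties they mention (as in Table 4)."""
    return Counter(len(parties_mentioned(t)) for t in tweets)


def single_party_tweets(tweets):
    """Keep only tweets that mention exactly one party, paired with that party."""
    result = []
    for t in tweets:
        found = parties_mentioned(t)
        if len(found) == 1:
            result.append((next(iter(found)), t))
    return result


tweets = [
    "Great speech from the SNP tonight",
    "Labour and the Tories both dodged the question",
    "Watching the debate with a cup of tea",
]
print(mention_breakdown(tweets))
print(single_party_tweets(tweets))
```

Whole-word matching is a design choice made here to avoid counting substrings inside unrelated words; a production dictionary would also need hashtags and account handles.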


Figure 3: Number of Tweets by Party (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Conservatives, LibDem, Labour, SNP, UKIP, Greens)


Our expectation is that fluctuations in Twitter sentiment will better reflect the actual

election outcome. Figure 4 presents a plot of our Twitter sentiment measure for each of

the political parties. It would appear that Labour Party sentiment was actually quite

positive, and certainly more positive than Tory sentiment, in the final month leading up

to the election.

Figure 4: Number of Positive Tweets by Party (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Conservatives, LibDem, Labour, SNP, UKIP, Greens)


The balance of positive versus negative Twitter sentiment should be more informative. We generate a measure of net sentiment, defined as the number of positive tweets minus the number of negative tweets. Figure 5 summarises this net difference for each of the parties. It confirms that the Conservative Party was, on balance, negatively perceived, at least in Twitter conversations. The Labour Party, by contrast, scores positively on our net difference measure, and this positive net score rises as the election approaches.
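As an illustration of the net measure, the sketch below counts positive minus negative tweets per day and party. It assumes each tweet has already been assigned a date, a single party, and a sentiment label; the tuple layout, the label strings, and the function name are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def net_sentiment(classified_tweets):
    """Return {(date, party): positive count minus negative count}.

    classified_tweets is an iterable of (date, party, label) tuples, where
    label is "positive", "negative", or anything else (treated as neutral).
    """
    net = defaultdict(int)
    for date, party, label in classified_tweets:
        if label == "positive":
            net[(date, party)] += 1
        elif label == "negative":
            net[(date, party)] -= 1
        # Neutral tweets leave the balance unchanged.
    return dict(net)


sample = [
    ("2015-04-30", "Labour", "positive"),
    ("2015-04-30", "Labour", "positive"),
    ("2015-04-30", "Labour", "negative"),
    ("2015-04-30", "Conservatives", "negative"),
    ("2015-04-30", "Conservatives", "neutral"),
]
print(net_sentiment(sample))
```

A positive value means positive tweets outnumber negative ones for that party on that day; the measure says nothing about how many neutral tweets there were.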


Figure 5: Net Positive-Negative Tweets by Party (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Conservatives, LibDem, Labour, SNP, UKIP, Greens)


6.2 Party Leader Mentions

We create similar Twitter volume and sentiment plots for the party leaders. Figure 6

presents the Twitter volume plots. Again, as was the case for the party plots, in Scotland,

Twitter volume for the SNP Leader, Nicola Sturgeon, overwhelmed that for the other

party leaders. In England, Conservative Leader David Cameron and Labour Leader Ed

Miliband have similar Twitter volume over the course of the election period. There are

some notable differences between party and party leader mentions: UKIP Leader Nigel

Farage was mentioned much less than the leaders of the two larger parties, and there is no obvious

Conservative manifesto spike for Cameron’s Twitter volume.


Figure 6: Number of Tweets by Party Leader (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Cameron, Clegg, Miliband, Sturgeon, Farage, Bennett)


There is more variation in the Twitter conversation results for the leader series. Figure 7 presents the number of positive tweets for each leader. In the early part of our period there is some evidence that Cameron has a lead in positive tweets over Miliband, but as the election approaches Miliband’s positives exceed those of Cameron.

Figure 7: Number of Positive Tweets by Leaders (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Cameron, Clegg, Miliband, Sturgeon, Farage, Bennett)


Figure 8 presents the difference in positive versus negative tweets regarding each of

the leaders. These graphs suggest that on balance tweets regarding Miliband were more

positive than was the case for Cameron. And certainly as the election approaches we see

a clear net Miliband advantage in Twitter conversations.

Figure 8: Net Positive-Negative Tweets by Leaders (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Cameron, Clegg, Miliband, Sturgeon, Farage, Bennett)


6.3 Cumulative Tweet Counts

In the previous two subsections we assessed the time series of the number of tweets published on each day. These time series reflect day-to-day fluctuations in tweets responding to the political events of the day, such as the election debates held on 2, 16, and 30 April 2015 and the publication of election manifestos. Raw counts with such high fluctuation are suitable for understanding the direction of trends in opinion change, but may not be useful for seeing the overall picture of the election landscape. In this subsection, we instead plot the counts of tweets for parties and leaders accumulated from the start of our tweet collection. Cumulative tweet counts have proven effective for Twitter-based election forecasting in previous studies (e.g. Caldarelli et al., 2014; Burnap et al., 2015).

Figures 9 to 11 plot the ratios of cumulative tweets. To generate the plots, we first calculate the number of all tweets mentioning each party or leader from the start of the download period to the date in the plot, and then calculate the proportion of tweets by dividing the number for each party or leader by the total count of tweets. Figure 9 is the plot of party mentions, Figure 10 is the plot of party leader mentions, and Figure 11 is the plot of party and leader mentions.
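The two-step calculation can be sketched as below. This is only an illustration under one assumption that the text leaves open: the denominator is taken to be the cumulative number of tweets across the listed parties (or leaders), not all tweets in the corpus. The function name and the input layout are hypothetical.

```python
from itertools import accumulate


def cumulative_shares(daily_counts):
    """Return each party's share of cumulative tweets on every day.

    daily_counts maps a party (or leader) to a list of daily tweet counts;
    all lists must have the same length and start on the same day.
    """
    # Step 1: running totals from the start of the collection period.
    cumulative = {party: list(accumulate(counts)) for party, counts in daily_counts.items()}
    n_days = len(next(iter(cumulative.values())))
    # Step 2: divide by the cumulative total over all parties (assumed denominator).
    totals = [sum(cumulative[party][d] for party in cumulative) for d in range(n_days)]
    return {
        party: [cum[d] / totals[d] if totals[d] else 0.0 for d in range(n_days)]
        for party, cum in cumulative.items()
    }


shares = cumulative_shares({"SNP": [3, 1, 4], "Labour": [1, 1, 0]})
print(shares)
```

Because each day's value pools everything collected so far, a single-day spike moves the ratio less and less as the period lengthens, which is why these series are smoother than the daily counts.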

The cumulative shares for the parties and leaders in Scotland are close to the actual election results,⁷ although tweets mentioning UKIP make up a larger proportion than the share of votes UKIP actually received. For England, however, the counts do not correspond to the actual election results: Labour is ahead of the Conservatives, and UKIP and the Greens are overrepresented in the Twitter world.⁸ As our analysis of opinion poll data in Section 7 indicates, the Twitter population in the UK is not a representative sample of the general UK population, and in order to obtain a reliable assessment of opinion change in the UK from tweets, we need to deal with this bias through the strategy we mapped out in Section 5.

⁷ The vote percentage in Scotland for each party is SNP: 50.0, LAB: 24.3, CON: 14.9, LD: 7.5, UKIP: 1.6, GRN: 1.3.

⁸ The vote percentage in England is CON: 41.0, LAB: 31.6, UKIP: 14.1, LD: 8.2, GRN: 4.2.

Figure 9: Ratio of Cumulative Positive Tweets for Parties (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Conservatives, LibDem, Labour, SNP, UKIP, Greens)


Figure 10: Ratio of Cumulative Positive Tweets for Leaders (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Cameron, Clegg, Miliband, Sturgeon, Farage, Bennett)


Figure 11: Ratio of Cumulative Positive Tweets for Parties and Leaders (panels: All Tweets, England, Scotland; x-axis: Jan to May; series: Conservatives, LibDem, Labour, SNP, UKIP, Greens)


6.4 Issues

As election day approached, it is clear that both the Labour Party and its leader were generating more positive Twitter traffic than the Conservatives. While the Conservatives clearly struggled in terms of party and leader image, they did better with respect to the issues that dominated the 2015 General Election. Figure 12 plots the lines from local regression smoothing (LOESS) for each issue. Three issues received much of the Twitter attention: the Economy, Welfare, and Taxes. In England, Welfare and the Economy are the top issues, while in Scotland the Economy appears to be the primary concern after March.
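For readers unfamiliar with LOESS, the sketch below shows the core idea: at each date, fit a weighted straight line to nearby points and take the fitted value at that date. It is a simplified, pure-Python version (local linear fit, tricube weights, no robustness iterations), not the implementation used to draw the figures. The `frac` default and the example series are assumptions made for illustration.

```python
import math


def loess(xs, ys, frac=0.5):
    """Smooth ys over xs with locally weighted linear regression.

    frac is the share of points used in each local fit (hypothetical default).
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Weighted least-squares line evaluated at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted


days = list(range(9))
counts = [0, 0, 0, 0, 10, 0, 0, 0, 0]  # a one-day spike, e.g. a debate night
smoothed = loess(days, counts)
print(smoothed)
```

The smoothed value on the spike day is well below the raw count, which is the property that makes the issue lines easier to compare than the raw daily series.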


Figure 12: Number of Issue Tweets (panels: All Tweets, England, Scotland; x-axis: Jan to May; series by issue: economy, immigration, tax, welfare, EU, education)


Our analysis of Twitter traffic can provide some insight into which parties are favoured

by any one of these major issues that dominated the election conversation. Figures 13

and 15 show the LOESS lines for issue tweets that are associated with each of the UK

parties. We count the number of tweets for each party by combining mentions of both

party names and party leaders. In England, UKIP clearly owned two issues, immigration

and the EU. More than half of the tweets on these issues mentioned UKIP.

Welfare was one of the dominant issues during the campaign. The Conservatives were

favored in terms of the frequency of welfare tweets over the course of the entire election

period. Two other issues dominated public attention: the economy and taxes. Early in

the campaign, the frequency of Conservative and Labour association with these issues

was similar. But as the election day approached we see the Conservative association with

these issues advances considerably over Labour. Clearly in terms of Twitter engagement

on major issues, the Conservatives had an advantage over the Labour Party.

Tweets associating the parties with these issues can be either positive or negative. Figure 14 summarises the frequency of positive issue tweets for the major parties. On the major economic issues, the economy and taxes, the Conservatives lead the Labour Party both in overall frequency and in positive tweets. While the Conservatives lead on the frequency of welfare tweets throughout our sample period, Labour has an advantage on positive welfare tweets as election day approaches.


Figure 13: Number of Issue Tweets by Parties (England) (panels: economy, education, EU, immigration, tax, welfare; x-axis: Jan to May; series: Conservatives, LibDem, Labour, SNP, UKIP, Greens)

Figure 14: Number of Issue Tweets by Parties (England, Positive) (panels: economy, education, EU, immigration, tax, welfare; x-axis: Jan to May; series: Conservatives, LibDem, Labour, SNP, UKIP, Greens)


The analysis of issue tweets in Scotland suggests that the SNP electoral success was

not associated with the set of issues that dominated the electoral conversation in the rest

of the UK. The SNP had a large volume advantage with respect to economic issues. But

with respect to the other issues in our corpus, the SNP were not favored. Clearly, the

issue of Scottish independence played a large role in the SNP success.

Figure 15: Number of Issue Tweets by Parties (Scotland) (panels: economy, education, EU, immigration, tax, welfare; x-axis: Jan to May; series: Conservatives, LibDem, Labour, SNP, UKIP, Greens)


7 Insights from Public Opinion Polling

One of the real attractions of monitoring public opinion using Twitter traffic is that the measure is entirely non-invasive. It avoids many of the problems associated with survey interviewing effects. On the other hand, Twitter users are unlikely to be a representative sample of, in this case, the UK eligible voting population. Nevertheless, in spite of this lack of representativeness, Twitter conversations could in fact do a good job of reflecting public sentiment about politics. This section provides some speculative assessments of the ability of Twitter sentiment to measure accurately public sentiment about the UK 2015 election. To do this we rely on the campaign survey data collected by the 2015 British Election Study (BES).

The 2015 BES Wave 5 conducted a daily online election survey beginning thirty-eight days before election day. A number of the questions measure sentiment that is similar to the Twitter sentiment. This allows us to compare our daily Twitter sentiment series with a number of the BES time series. The BES asked an 11-point thermometer scale question regarding each of the party leaders (it ranged in value from 0 to 10).⁹ Figure 16 compares the BES results for Cameron and Miliband with those from the Twitter analysis presented earlier. The BES series suggests in fact that public evaluations of Cameron are higher than those of Miliband, while the Twitter sentiment series suggests exactly the opposite. An explanation that seems to be confirmed by Figure 17 is that Twitter users have a Labour, or at least Miliband, bias. The BES surveys included a question asking respondents if they were Twitter users. Approximately 25 percent of BES respondents indicated they were Twitter users. Figure 17 presents responses to the leader evaluation question for those identifying themselves as Twitter users. Evaluations of Miliband amongst Twitter users are significantly higher than those of Cameron, suggesting that the Twitter traffic result reported in Figure 8 that favoured Miliband may be related to the partisan composition of Twitter users.

⁹ The actual question wording is “How much do you like or dislike each of the following party leaders?” The leaders’ names are presented in a randomized order.

Figure 16: Leader Evaluations in the 2015 British Election Study (x-axis: Mar 30 to May 04; series: Cameron, Miliband)

Figure 17: Leader Evaluations in the 2015 British Election Study (Twitter Users) (x-axis: Mar 30 to May 04; series: Cameron, Miliband)


The BES also asked respondents to evaluate the major UK political parties using

the same 11-point scale employed for leader evaluations. Figure 18 presents the mean

Conservative and Labour results for the total sample while Figure 19 presents the same

evaluations for Twitter users in the BES sample. First, the Labour Party clearly leads

the Conservatives amongst both the total and Twitter samples from the BES. Second,

our Twitter sentiment scores for the Labour and Conservative parties show the same

pattern with Labour clearly having an advantage over the Conservative Party.

Figure 18: Party Evaluations in the 2015 British Election Study (x-axis: Mar 30 to May 04; series: Conservatives, Labour)


Figure 19: Party Evaluations in the 2015 British Election Study (Twitter Users) (x-axis: Mar 30 to May 04; series: Conservatives, Labour)


7.1 Issues

The BES included an open-ended question on the most important issue facing the country. The exact question wording is “As far as you’re concerned, what is the SINGLE MOST important issue facing the country at the present time?” In order to compare responses to this question with our Twitter-derived topics, we categorize answers to this question using the same dictionaries used for the Twitter topic detection. Column 1 of Appendix Table 5 provides a list of the resulting topics.

To improve the performance of detection, we manually improved the dictionary by looking at the high-frequency terms in the BES verbatim answers for which our original dictionary failed to detect an issue. Table 8 shows the modified dictionary. In addition, we added two more topics: Environment, and Crime and Security. These two topics are not particularly large, but are clearly defined and distinguishable from the other six topics. With this improved dictionary, more than half of the 9,852 texts unmatched by the original dictionary are categorized into one or more issue topics.

Table 5: Detected Issues in BES

                      Number of Matching
Topic Name            Original List    Improved List
Economy               8257             9246
Immigration           5454             5975
Tax                   131              131
Welfare               3770             6319
EU                    248              547
Education             303              303
Environment                            476
Crime and Security                     797
Unmatched             9852             4488
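The dictionary-based categorization of open-ended answers can be sketched as follows. The search terms are a small subset copied from Table 8; treating them as case-insensitive regular expressions matched anywhere in the answer is an assumption suggested by entries such as `imm.gration` and `standard.+living`, and the function names and example answers are hypothetical.

```python
import re

# A subset of the Table 8 search terms; the full lists are in the Appendix.
TOPIC_TERMS = {
    "Economy": ["cost of living", "economy", "austerity", "debt", "standard.+living"],
    "Immigration": ["imm.gration", "im.gration", "immigrant", "migrants"],
    "Welfare": ["welfare", "nhs", "pension", "rich.+poor"],
    "EU": ["brexit", "europe"],
}

# Assumption: each term is a regular expression, matched case-insensitively.
TOPIC_PATTERNS = {
    topic: re.compile("|".join(terms), re.IGNORECASE)
    for topic, terms in TOPIC_TERMS.items()
}


def detect_topics(answer):
    """Return every topic whose dictionary matches the answer (possibly none)."""
    return [topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(answer)]


def match_counts(answers):
    """Count answers per topic; answers matching no topic are 'Unmatched'."""
    counts = {topic: 0 for topic in TOPIC_TERMS}
    counts["Unmatched"] = 0
    for answer in answers:
        topics = detect_topics(answer)
        if not topics:
            counts["Unmatched"] += 1
        for topic in topics:
            counts[topic] += 1
    return counts


answers = [
    "The NHS and the gap between rich and poor",
    "Too much imigration from Europe",
    "Standard of living is falling",
    "Don't know",
]
print(match_counts(answers))
```

Because one answer can match several topics, the per-topic counts need not sum to the number of answers, which is consistent with Table 5 allowing "one or more issue topics" per text.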

There are noticeable differences between the volume of tweets and the frequency of

BES perceived important issues (Figure 20). In particular, there are three issues that are

frequently mentioned in the Twitter traffic but are rarely mentioned as most important


issues in the BES: the EU, education and taxes. The most interesting difference is the

mention of taxes. Less than one percent of BES respondents think that taxes are the

most important issue; but we find that the tax topic represents about eighteen percent

of Twitter topics.

Figure 20: Issues on Twitter and in the British Election Study 2015 Compared (rows: BES (All), BES (non-TW User), BES (TW User), Twitter; x-axis: 0.00 to 1.00; issues: economy, immigration, tax, welfare, EU, education)

Figure 20 clearly suggests the importance of the economy in Twitter traffic. And the

economic issue importance is also confirmed by the BES most important issue question.

Moreover we saw earlier in Figure 13 and Figure 14 that the Conservative Party had a

significant lead in terms of tweets that associated the Conservative Party with the economy

in a favourable light. This would suggest that the economy likely played an important

role in the Conservative Party victory.

In order to explore further how issue perceptions may have affected vote choice, we return to the BES survey data. We run a conditional logit model that includes the six issues as explanatory variables (Table 6). The baseline is the vote for the Conservatives. Note that the economy is the only issue on which the Conservatives have an advantage against all other parties. The respondents who think the economy is the most important issue in the election comprise the largest group compared to the other issues, and are more likely to vote for the Conservatives. Welfare is more or less the opposite: voters who think welfare is the most important issue are less likely to vote for the Conservatives and more likely to support Labour. This result does not change even if we limit the analysis to the Twitter users among BES respondents (Table 7). Although the issue interests of Twitter users are slightly different from those of non-Twitter users, their vote choice is shaped by similar issue priorities. Hence, these multivariate results seem to confirm that the economic concerns of voters contributed significantly to the Conservative victory.

Table 6: Conditional Logit Model of Vote Choice

               Labour      LibDem      UKIP        Greens
Economy        −0.68***    −0.39***    −1.08***    −1.01***
               (0.05)      (0.07)      (0.08)      (0.08)
Immigration    −1.07***    −1.35***    1.07***     −2.31***
               (0.06)      (0.10)      (0.06)      (0.16)
Tax            0.23        0.61        0.63*       0.73*
               (0.29)      (0.38)      (0.37)      (0.38)
Welfare        1.30***     0.76***     0.05        0.96***
               (0.06)      (0.08)      (0.09)      (0.08)
EU             −1.01***    −0.21       1.69***     −0.98***
               (0.18)      (0.22)      (0.12)      (0.32)
Education      0.63***     0.88***     −0.89**     0.76***
               (0.18)      (0.24)      (0.41)      (0.26)
Constant       0.09***     −1.27***    −1.11***    −1.35***
               (0.03)      (0.05)      (0.05)      (0.05)

Log Likelihood    −24185.69
Num. obs.         19235
***p < 0.01, **p < 0.05, *p < 0.1
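To make the Table 6 coefficients more concrete, the sketch below converts them into predicted vote probabilities for a respondent who names a given issue as most important. It assumes the model has the usual multinomial-logit form with the Conservatives as the baseline (utility fixed at zero) and each issue entering as a 0/1 indicator; that functional form, the variable names, and the restriction to two issues are assumptions made for illustration, not details confirmed by the text.

```python
import math

# Coefficients copied from Table 6 (baseline category: Conservatives).
CONSTANT = {"Labour": 0.09, "LibDem": -1.27, "UKIP": -1.11, "Greens": -1.35}
ISSUE_COEF = {
    "Economy": {"Labour": -0.68, "LibDem": -0.39, "UKIP": -1.08, "Greens": -1.01},
    "Welfare": {"Labour": 1.30, "LibDem": 0.76, "UKIP": 0.05, "Greens": 0.96},
}


def vote_probabilities(issue):
    """Predicted vote shares for a respondent whose most important issue is `issue`."""
    # Assumed form: utility = constant + issue coefficient; baseline utility is 0.
    utilities = {"Conservatives": 0.0}
    for party, const in CONSTANT.items():
        utilities[party] = const + ISSUE_COEF[issue][party]
    denom = sum(math.exp(u) for u in utilities.values())
    return {party: math.exp(u) / denom for party, u in utilities.items()}


for issue in ISSUE_COEF:
    probs = vote_probabilities(issue)
    print(issue, {party: round(p, 3) for party, p in probs.items()})
```

Under these assumptions, the most likely choice for an economy-first respondent is the Conservatives, while for a welfare-first respondent it is Labour, which matches the interpretation given in the text.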


Table 7: Conditional Logit Model of Vote Choice (Twitter Users)

               Labour      LibDem      UKIP        Greens
Economy        −0.60***    −0.54***    −1.05***    −1.02***
               (0.08)      (0.13)      (0.17)      (0.13)
Immigration    −0.94***    −1.66***    1.37***     −2.56***
               (0.12)      (0.25)      (0.14)      (0.33)
Tax            −0.05       0.03        0.50        0.36
               (0.47)      (0.68)      (0.69)      (0.58)
Welfare        1.45***     0.74***     0.15        1.11***
               (0.11)      (0.15)      (0.20)      (0.14)
EU             −0.91**     −0.10       1.84***     −0.64
               (0.36)      (0.44)      (0.29)      (0.50)
Education      0.41        0.73**      −1.67       0.23
               (0.27)      (0.34)      (1.03)      (0.37)
Constant       0.35***     −0.99***    −1.37***    −0.77***
               (0.06)      (0.10)      (0.11)      (0.09)

Log Likelihood    −6875.71
Num. obs.         5435
***p < 0.01, **p < 0.05, *p < 0.1


8 Conclusion

In this essay we explore the utility of using Twitter conversations to explain election

outcomes. Our efforts are based on a large corpus of Tweets collected during the six-

month period approaching the 2015 UK elections. Our analysis includes the geo-location

of tweets, sentiment analysis, and issue/topic modelling. The analysis focuses on England

and Scotland. The interpretation of the Twitter landscape of Scotland is straightforward:

the Scottish National Party dominated the Twitter conversation on all aspects including

the number of tweets, general sentiment, and the evaluation of key issues. In contrast,

the results of England are more nuanced: Generally speaking, the Labour Party was the

slight favorite; however, the Conservative Party had a slight edge over Labour on certain

issues that may have been critical in the ultimate Tory victory.

Twitter volume and sentiment tended to favour both the Labour Party and its leader,

Mr. Miliband. This may reflect a Labour partisan bias in the population of Twitter

users. Polling data from BES suggested a Cameron advantage on leader evaluations.

Both BES and Twitter suggest that the electorate has a more positive evaluation of

the Labour Party over the Conservative Party. Our analysis of Twitter issue sentiment

suggests concerns about the economy played to the Conservative advantage – this would

seem to be confirmed also by our analysis of the 2015 BES survey data.

We also discussed a strategy for correcting the biases in the Twitter user population. The idea is to obtain additional information on user demographics, such as age and occupation, and use it to weight the volume of classified tweets that express opinionated mentions of parties and leaders. Such assessments of Twitter users in the UK will not only provide a useful tool for improving our explanations of UK elections through tweets, but also help researchers interested in using UK tweets to understand political or other types of issues.


References

Boutet, Antoine, Hyoungshick Kim and Eiko Yoneki. 2013. “What’s in Twitter, I know what parties are popular and who you are supporting now!” Social Network Analysis and Mining 3(4):1379–1391.

Burger, John D, John Henderson, George Kim and Guido Zarrella. 2011. Discriminating Gender on Twitter. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. pp. 1301–1309.

Burnap, Pete, Rachel Gibson, Luke Sloan, Rosalynd Southern and Matthew Williams. 2015. “140 Characters to Victory?: Using Twitter to Predict the UK 2015 General Election.” Electoral Studies.

Caldarelli, Guido, Alessandro Chessa, Fabio Pammolli, Gabriele Pompa, Michelangelo Puliga, Massimo Riccaboni and Gianni Riotta. 2014. “A Multi-Level Geographical Study of Italian Political Elections from Twitter Data.” PLoS ONE 9(5):e95809.

Ceron, Andrea, Luigi Curini and Stefano M. Iacus. 2014. “Using Sentiment Analysis to Monitor Electoral Campaigns: Method Matters–Evidence From the United States and Italy.” Social Science Computer Review.

Ceron, Andrea, Luigi Curini, Stefano M. Iacus and Giuseppe Porro. 2013. “Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens’ political preferences with an application to Italy and France.” New Media & Society 16(2):340–358.

Conover, M, B Goncalves, J Ratkiewicz, A Flammini and F Menczer. 2011. Predicting the Political Alignment of Twitter Users. In Proceedings of 3rd IEEE Conference on Social Computing (SocialCom).

Curini, Luigi, Andrea Ceron and Stefano M Iacus. 2013. To what extent sentiment analysis of Twitter is able to forecast electoral results? Evidence from France, Italy and the United States. In 7th ECPR General Conference. Number September 2013 in “Paper Presented at 7th ECPR General Conference” pp. 1–28.

Fisher, Stephen D. and Michael S. Lewis-Beck. 2015. “Forecasting the 2015 British general election: The 1992 debacle all over again?” Electoral Studies.

Gayo-Avello, Daniel. 2013. “A Meta-Analysis of State-of-the-Art Electoral Prediction From Twitter Data.” Social Science Computer Review 31(6):649–679.

Hopkins, Daniel J and Gary King. 2010. “A Method of Automated Nonparametric Content Analysis for Social Science.” American Journal of Political Science 54(1):229–247.

Hutto, C J and Eric Gilbert. 2014. VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media. pp. 216–225.


Jungherr, A., P. Jurgens and H. Schoen. 2012. “Why the Pirate Party Won the German Election of 2009 or The Trouble With Predictions: A Response to Tumasjan, A., Sprenger, T. O., Sander, P. G., & Welpe, I. M. “Predicting Elections With Twitter: What 140 Characters Reveal About Political Sentiment”.” Social Science Computer Review 30(2):229–234.

Lampos, Vasileios, Daniel Preotiuc-Pietro and Trevor Cohn. 2013. “A user-centric model of voting intention from Social Media.” Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL) pp. 993–1003.

Mohammady, Ehsan and Aron Culotta. 2014. Using County Demographics to Infer Attributes of Twitter Users. In Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media. pp. 7–16.

O’Connor, Brendan, Ramnath Balasubramanyan, Bryan R. Routledge and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. pp. 112–129.

Pak, Alexander and Patrick Paroubek. 2010. “Twitter as a Corpus for Sentiment Analysis and Opinion Mining.” LREC 10:1320–1326.

Paltoglou, Georgios. 2014. Sentiment analysis in social media. In Online Collective Action. Springer pp. 3–17.

Paltoglou, Georgios and Mike Thelwall. 2012. “Twitter, MySpace, Digg.” ACM Transactions on Intelligent Systems and Technology 3(4):1–19.

Pang, Bo and Lillian Lee. 2008. “Opinion Mining and Sentiment Analysis.” Foundations and Trends® in Information Retrieval 2(1–2):1–135.

Preotiuc-Pietro, Daniel, Vasileios Lampos and Nikolaos Aletras. 2015. An Analysis of the User Occupational Class Through Twitter Content. In Proceedings of the 53rd Annual Meeting of the Association of Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Vol. 125 pp. 1754–1764.

Schwartz, H. Andrew, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah, Michal Kosinski, David Stillwell, Martin E.P. Seligman and Lyle H. Ungar. 2013. “Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach.” PLoS ONE 8.

Sloan, Luke, Jeffrey Morgan, Pete Burnap and Matthew Williams. 2015. “Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data.” PLoS ONE 10(3):e0115545.

Tumasjan, A., T. O. Sprenger, P. G. Sandner and I. M. Welpe. 2011. “Election Forecasts With Twitter: How 140 Characters Reflect the Political Landscape.” Social Science Computer Review 29(4):402–418.


Wilson, Theresa, Zornitsa Kozareva, Preslav Nakov, Sara Rosenthal, Veselin Stoyanov and Alan Ritter. 2013. SemEval-2013 task 2: Sentiment analysis in twitter. In Proceedings of the International Workshop on Semantic Evaluation, SemEval. Vol. 13.


Appendix

Table 8: Topics Wordlist for British Election Study

Topic Name            Search Terms

Economy               cost of living, economy, income, business, austerity, budget, debt, borrowing, gdp, employment, job(s, ), wage(, s), living costs, economic growth, defecit, economic, ecconomy, financial, finance, rising price, living standard, standard.+living, money

Immigration           imm.gration, im.gration, racist, immigrant, migrants, boarder(s, )

Tax                   tax, betroom, mansion, dodging, nondom, vat, ifs

Welfare               welfare, nhs, benefit, wowpetition, health, n.h.s, pension, poverty, rich.+poor, elderly, wealth gap, equality, affordable housing, hous.+(costs, prices), housing

EU                    proeu, no2eu, meps, brexit, mep, europe, EU

Education             education, tuition, school, university, universities, apprenticeship, childcare, teachers, uni

Environment           environment, green, climate, global warming

Crime and Security    security, crime, terrorism, isis, islam, islamism, defence