India’s #TwitterElection 2014 - Stanford Universitysnap.stanford.edu/class/cs224w-2014/projects2014/cs224w...user accounts (The nodes in our network) and up to 5000 fol-lowers for

India’s #TwitterElection 2014

Devin LuStanford University

[email protected]

Arpan ShahStanford University

[email protected]

Anunay KulshresthaStanford University

[email protected]

ABSTRACTEarlier this year, from April 7 to May 12, 540 million voterscast votes in the Indian general elections, the largest demo-cratic exercise in human history. It was hugely influential:the right-wing Bharatiya Janata Party (BJP) won the primeministership and a majority of seats in the Lok Sabha, thelower house of Indian congress. This ended 10 years of ruleby the left-wing Indian National Congress party. A newprime minister, Narendra Modi, swept into office with 31%of the vote (a strong plurality in the highly fractured Indianpolitical landscape). The Indian stock market jumped 5%on the news of his victory.The 2014 elections were also thefirst national elections in India that saw heavy use of socialmedia, particularly Twitter.

We performed an analysis of the Twitter network in In-dia in the months leading up to and including the election.Our subject of analysis are the politicians and the politi-cal parties they belong to. One of the major talking pointsof these elections and their results have been the effectiveuse of social media by the winning party and the winningprime minister Narendra Modi. Some commentators haveeven gone as far as calling it a #TwitterElection. Thisproject aims to evaluate these claims to draw a compara-tive landscape of the effectiveness of the use of social mediaby members of different political affiliations in the publiclandscape in India. By exploring network parameters andvarious data analysis techniques on Twitter Data specific tothe Indian elections, we want to evaluate the social mediastrategies between members of the different major politicalformations in India for the elections. The methodology usedto do this involves looking closely at both the direct roleof the politician/formation as well as their influence and anapproximation of their catalyzing role in the network. Toquantitatively be able to compare these, we looked at net-work parameters such as the Contagion modeling of infor-mation via retweets, Network parameters such as clustering,diameter, pageRank, analysis of strength of followers, senti-ment analysis based clustering of the network

1. DATAThe dataset has been obtained by a combination of using

Twitter’s public API and a dataset specific to Indian Elec-tions obtained by the help of Jure Leskovec and Rok Sosic.

The data obtained represents a substantial portion of thetwitter network from India and contains 15.5 million Unique

user accounts (The nodes in our network) and up to 5000 fol-lowers for each of these accounts (the edges of our network).The number of political tweets in this entire network is 10.6million, with 18,000 of the tweets originating directly frompoliticians and political commentators in India. Each of thetweets in our dataset are tweets pertaining to the electionsand have been filtered on keywords. The entire set of key-words is listed in the Appendix. This dataset comprises asubstantial portion of the ≈ 33 million unique Twitter usersin India.

The data corpus consists of Users, Tweets and Sentiments.However some key concepts used throughout this paper withregards to the dataset need to be explained below. We builda directed graph representing the following relationship ofthe users as edges in the graph. We model the users as nodesand consider an edge to exist from node u to v if v follows u.

For the purpose of the analysis, we often separate usersinto Politicians and Non-Politicans. The set of Politiciansthat were active on twitter during the elections were identi-fied from a twitter source. The entire list of political twitterhandles analyzed are listed in the Appendix. All other nodesin the graph are said to be Non Politicians for the pur-pose of the analysis. In addition, each tweet has been sub-ject to a sentiment analysis by derived from the words thetweet includes. The sentiment is expressed as a numericalvalue ranging from −1 to 1 indicating the degree of positiv-ity/negativity towards the parties addressed or discussed inthat tweet.

One of the key tools for analysis of the strength of the net-work are the tweets themselves. In particular there are someimportant pieces of information in addition to sentiment andkeywords captured by the tweets. We use the Tweet-time todo a time-series analysis of the graph. In addition to this,tweets that are retweets of others have the useful field sig-nifying which user this tweet retweeted.

Finally since the elections in India had defined politicalleanings embodied by the 4 primary formations: NDA, UPA,AAP, OTHER(3rd Front/4th Front). We have split the po-litical handles on twitter into their their associations withone of the above buckets.This split has also been listed in theappendix. We focus most of the attention of our analysis onNDA, UPA and AAP as they represent the three primarilydiscussed contenders in the media leading to the elections.

2. PRELIMINARY ANALYSIS OF GRAPHPROPERTIES

We initially computed the fraction of nodes contained inthe largest weakly connected component (Wcc) of the graph.Unsurprisingly, we found that 99% of all nodes were a partof the largest Wcc. This means that almost everyone whodiscussed political ideas via twitter is connected to a politi-cal leader.

Network Statistics

Nodes 15,181,851Edges 39,548,702Tweets 10,595,729Politicians Tweets 17,815Largest WCC 0.99

Interestingly, the sentiment of the tweets seems to reflectthe outcome of the elections. The NDA had far less nega-tively classified tweets than the UPA overall. However, thetime distribution of these negative tweets was not uniform:the BJP saw a relative burst of negative tweeting in Marchbefore settling into considerably more positive and neutraltweeting in the following months. Combined with the factthat April and May were the highest tweeting months of thecampaign may have played a factor in lending them momen-tum in the final weeks of the campaign.

We then explored the average clustering coefficients of thepoliticians as categorized by their affiliation and found in-teresting results here as well (specifically, the graphs underconsideration here consist of the subgraph of a politicianand all his/her followers). While the social network of theAAP is quite strong, from a contagion/influence perspec-tive, between the two primary formations: NDA and UPA,the UPA’s network is not as well connected. This could im-ply that contagions/ideas may find difficulty in spreadingthrough the followers of the UPA politicians since the im-mediate network around the politicians does not reinforceeach other as strongly.

Eccentricity Centrality (depicted in the graph below) isdefined as the largest shortest-path distance from a givennode to any other node in the graph. We found that the av-erage value of eccentricity centrality for UPA is lower thanthat of NDA and AAP. This is surprising because this meansit was easier for the UPA to reach the masses yet it faredpoorly in the polls.

PageRank is another metric that produced some interest-ing results. As OTH includes political analysts as well asindependent candidates, the high average PageRank for thisgroup is expected. This is explained by the larger influenceof its members. But we find that the average PageRank forthe UPA is higher than that of the NDA and AAP. Whilethis means that members of the UPA have a larger influence,they may have failed to capitalize on it.

3. OVERVIEW OF PAST WORKThere are two basic models for contagion to consider: Lin-

ear Threshold and Independent Cascade1. In the LinearThreshold model, nodes are activated based on whether thesum of edge weights to activated neighbors exceeds a ran-domly drawn activation parameter θv. Both edge weightsand activation parameter are drawn randomly i.i.d fromsome underlying distribution.

In the Independent Cascade model, nodes again have aparticular weight. However, the method of propagation isdifferent: active neighbors cause probability of flipping ac-cording to the edge weight, where edge weights are againdrawn i.i.d from some underlying distribution.

We adapt some of these ideas in formulating our conta-gion model. Since it is likely that people will tweet multipletimes in support or opposition to a particular political party,We present an augmented contagion model that accounts forrepeated stimuli from the same source. This is further ex-plained in section 4.

Direct analysis on the applicability of Twitter to electionshas also been done, most notably by McKelvey, DiGrazia,and Rojas2. They perform a sentiment analysis of Tweetsin a given district to estimate the Democratic or Republicanvoter share and find that that is a good predictor or a party’selectoral success. Our paper builds explores additional met-rics and network attributes that may drive further researchin this area.

4. MATHEMATICAL MODELSFor any node v ∈ G, let OutDeg(v) denote the out-

degree of v. Similarly, let InDeg(v) denote the in-degreeof v. We also define OutNodes(v) = {u|(v, u) ∈ E} andInNodes(v) = {u|(u, v) ∈ E}.

In this paper, we introduce an augmented contagion anal-ysis model which accounts for the impact of repeated stimulifrom adjacent nodes. This is different from the usual con-tagion model in that it considers the number of times aninfected node tries to spread the infection and thus, can beeffectively used to model the flow of information.

We define f : V × V × N→ R as

f(v, u, i) =

{ 1InDeg(u)i

: v ∈ InNodes(u)

0 : v 6∈ InNodes(u)(1)

This function f models the impact that the i-th stimuli(due to v) has on u. Applied to the problem at hand, thisrepresents the impact on u by v’s i-th tweet about a partic-ular keyword k.

Thus, for a fixed keyword k, the total impact on u by itsadjacent nodes is

∑v∈InNodes(u)

nv∑i=1

f(v, u, i) (2)

where nv represents the number of times v tweets aboutkeyword k. Note that only adjacent nodes can impact u inthis contagion model.

(2) simplifies to

∑v∈InNodes(u)

nv∑i=1

1

InDeg(u)i(3)

=∑

v∈InNodes(u)

1

InDeg(u)+

1

InDeg(u)2+ ...+

1

InDeg(u)nv

(4)

=∑

v∈InNodes(u)

InDeg(u)nv − 1

InDeg(u)nv (InDeg(u)− 1)(5)

(To see the derivation of this expression, see the appendix).If we consider only tweets by the nodes in InNodes(u)

about keyword k before node u’s first tweet about k, we cansomewhat gauge the ’threshold’ of node u for tweeting aboutk. Informally, it is a measure of how many tweets (aboutk) u needs to see before u itself starts tweeting about k.This is the motivation behind using the augmented conta-gion model.

5. ALGORITHMSDue to the large scale of our dataset (≈ 7GB of Tweet

data alone), we were forced to develop highly optimized al-gorithms for analyzing the data and generating results.

We made sure that most algorithms we employed had anamortized runtime of O(n). We further used multiprocess-ing on a 16-core Amazon AWS EC2 instance to run thesealgorithms and compute the presented metrics.

Furthermore, we memoized most data - including the so-cial graph and the total number of tweets by a given useretc.

Perhaps one of the most important algorithm we devisedis the one used to compute the impact on every node usingthe augmented contagion model described above.

Data: An array of users (nodes) who tweeted aboutkeyword k in chronological order

initialization;numTweets = {};resultTemp = {};result = {};count = {};for node in array do

if node in count thencount[node] +=1;

elsecount[node]= 1;

endfor follower in OutNodes(node) do

if follower in resultTemp thenresultTemp[follower] += (1.0 / (inDeg **count[node]));

elseresultTemp[follower] = (1.0 / (inDeg **count[node]));

end

endif node in resultTemp then

result[node] = resultTemp[node];else

result[node] = 0end

endAlgorithm 1: Augmented Contagion Model

Another set of analyses involved understanding sentimentwithin tweets and classifying parameters and trends basedon tweets with specific keywords. Two specific objectiveswere outlined:1) Measuring Subgraph parameters after classification basedon sentiment for specific keywords linking tweets to the ma-jor political formations2) Visualizing the graph based on these keywordsThe results are outlined later in the paper.

To enable this analysis in an efficient manner, due to thelarge size of the data, the tweets were all read just once af-ter which readable maps and lists were maintained on diskwhich summarized the data by classifying the tweets basedon keyword[see appendix for set of keywords].

After processing the keyword→ Tweets Mappings, To ob-tain a sentiment level understanding as well, another mapwas created on disk which consisted a map of nested key-word → Tweet Mappings, mapping from Positive/NegativeSentiment.

To gather insight from the data in the context of theseelections, the keywords were further separated based on thepolitical affiliations to do be able to visualize sentiment acrossthe network towards the affiliations. In addition to the origi-nal calculations, The visualization was a non-trivial problemdue to the large number of nodes with sentiment informa-tion. Filtering was done to obtain colorings based on userswith Uniquely positive sentiment tweets towards a partic-ular affiliation to be able to discern concrete support. Theabove limits the nodes, and edges exist if an edge exist in

the real twitter network . The final visualizations involvedconverting to JSON format and using the scalable javascriptrendering library: D3

6. ANALYSIS OF FOLLOWER STRENGTHTo disseminate information and be persuasive through so-

cial media, effective strategies rely on the power of the net-work and other individuals to be active propagators of ideas.The concept of virality stems from this notion. To under-stand how active and effective politicians were at buildinga strong support based we look at the number of politicaltweets written by the followers of political formations as wellas the number of retweets for these followers:

From the graphs we start to see real differences in thestrength of the supporters of the different coalitions. Thefollowers of NDA leaders are almost twice as active at polit-ical tweeting during the election compared to to any othercoalition, including all the others outisde of AAP and UPAcombined.

Since these followers are so active, it would lend a signif-icant edge to the NDA coalition in terms of bringing moreattention to their message and campaign.

The second graph is equally interesting in this scenario. Aretweet is a mechanism by which people within the Twitternetwork can voice their support or agreement to a particu-lar tweet. In the context of political tweets, a retweet is anespecially interesting metric since it not just signals agree-ment but is also a way to further propagate the messagethrough the network. This makes it a very good indicator ofthe strength of the follower. The followers of politicians whoare most highly retweeted are therefore most successful notjust at getting people to like what they say but also spreadtheir message.

We see that the NDA dominates this metric as well: itsretweet count is double that of both the UPA and the AAPand is even doing better than all the other fourth partiescombined.

This corroborates the idea that the NDA led by Modi wereable to build a strong network on social media.

7. ANALYSIS OF SENTIMENT BASED SUP-PORTER SUBGRAPHS

Throughout the election season, the NDA maintained abetter Twitter presence than both the UPA and the AAP.The following figures show the number of users who madea positive tweet about the NDA, UPA, and AAP as a func-tion of month, followed by the average degree of the positivetweeters in over the same time period. Both indicate gener-ally high engagment on Twitter with the NDA compared tothe UPA and AAP.

Both these trends are not surprising, given the conven-tional wisdom and analysis above that the NDA was par-ticularly strong in social media compared to the other twoparties.

However, we did make the following interesting observa-tion: During September to November of 2013, the clusteringcoefficient of the subgraph of Twitter users who made posi-tive tweets about the AAP was markedly higher than that ofthe corresponding subgraph for both the NDA and the UPA(note that this is distinct from the clustering coefficient con-sidered in section 2, which was formed of a subgraph of apolitician and his/her followers). This could be an indicatorof tighter and more concentrated clusters of interest, whichwould be a source of political strength that could be missedby opinion polls alone.

The AAP is a brand new party. It was founded only inNovember 2012 and the 2013 Delhi legislative assembly elec-tions was its first electoral test. It is a small third party com-peting against the two factions that have dominated Indianpolitics for decades. In September and October, the conven-tional wisdom was that the AAP had no chance. Opinionpolls generally supported this view3. In the end, however,conventional wisdom was wrong: The AAP made an unex-pectedly strong showing, winning 28 out of 70 seats, becom-ing the second largest party in the assembly and eventuallythe majority partner in the ruling coalition.

In the national elections the following year, the cluster-ing coefficient for the AAP supporter subgraph had fallen inline with that of the NDA and UPA. Indeed, in the nationalelections it did not perform particularly well.

These results are obviously preliminary and we will haveto wait for more elections to take place for further analy-sis. However, the results from this past election cycle sug-gest clustering coefficient may be a good leading indicatorof electoral performance, even when pundit opinion or pollsare not.

Figure 1: Visualization of a random subsampling of 10,000 politically active nodes. Orange nodes are sup-porters of the NDA, dark blue are supporters of the UPA, and light blue are supporters of the AAP. Edgesare derived from follower relationships.

The last analysis done was to understand whether thereis polarization reflected in the support for the political af-filiations contesting the elections. If nodes with a politicalleaning are clustered together, this would signify such behav-ior. Using the sentiment information, a rendering of nodeswith uniquely positive sentiment towards at least one of thekeywords associated with NDA, UPA or AAP leaning wasdone. This rendering has been shown in the graph above(Figure 1).

The results here are also interesting, while there are clus-ters of signficiant NDA support and clusters of UPA support.

For the most part, the nodes are intermixed with significantfollow relationships present between people with opposingpolitical viewpoints. It is also clear that there are sum sig-nificant hub nodes within the graph. It is noteworthy that incorroboration with other findings, these hubs are primarilyOrange(NDA) supporting.

8. ANALYSIS OF RETWEET HOPS

As discussed earlier, retweets are a useful way to gaugehow many people agree with the views of a particular po-

litical leader. A fair assumption, in this case, is that thosewho retweet a political leader’s tweet are almost certainlyhis/her supporters.

To measure a leader’s clout, apart from looking at thenumber of retweets, it may also be useful to look at the av-erage hop distance between the political leader and peoplewho retweet his/her tweets.

The graph presented above was generated by calculatingthis average hop distance of top leaders from every partyand then aggregating the result.

It is evident that even here, the NDA was able to reacha wider audience as even distant nodes in the social graphretweeted NDA leaders’ tweets.

9. ANALYSIS OF SENTIMENT THRESHOLD

We use the augmented contagion model to estimate theaverage number of tweets a user has to see (i.e. tweets bythe people he/she follows) before he/she tweets somethingpositive or negative about a keyword.

Here the social media edge of the NDA is starkly illus-trated: firstly, the threshold for posting positive sentimentswas much lower for the NDA, at approximately 0.39 vs. 0.87for the UPA and 1.02 for the AAP. Secondly, the advan-tage in resisting negative contagion was even more dramatic:anti-NDA sentiments required an average augmented conta-gion expoure of 3.63 to spread, as opposed to 0.97 and 0.99for UPA and AAP, respectively.

Part of this discrepancy is likely due to the greater effortsof the NDA to saturate social media with their message, thusproviding some ”inoculation”against negative messages. Fur-thermore, the general population of NDA supporters may bemore tech-savvy than the supporters of the other parties,providing a natural bias when calculating these metrics overthe entire Twitter network.

10. CONCLUSIONS AND FURTHER WORKSUpon performing the analyses above, ranging from Net-

work parameters, Supporter Strength analysis, Sentimentanalysis, Retweet analysis, and the augmented contagionanalysis; Our results show that on almost every metric ofevaluation, the NDA outperformed the UPA, AAP as wellas other political parties. The strength of their network aswell as the rigor with which they have pursued their so-cial media strategy seems to have paid off as visible in thegeneral elections of 2014. They were able to convey theircampaign messages both more effectively as well as numer-ously and the nature of the followers they managed to geton board were also beneficial towards that objective. It isalso important to note that on a large number of these pa-rameters, the NDA not just outperformed the AAP and theUPA marginally but substantially so. The performance ofthe AAP and the UPA was almost the same in alot of themetrics where the NDA took significant leads.

In an age where social networks such as Twitter are a veryeffective tool for broadcast based communications, otherparties have not have succeeded at utilizing this resourcesto the extent that the NDA was able to in these elections.With 65% of the population under the age of 35 and grow-ing connectivity and use of the internet, Social Media andSocial Networks will continue to be relevant in political dis-course within India and across the world.

The goal of attempting to quantify an abstract notion asa successful Social Network Political Campaign is of coursequite a complicated task. Quantifying it however, enablesa more objective, holistic understanding of the phenomenathat guide successful communication in such Networks andNetwork Analysis techniques provide tools to enable this.

There is a lot of interesting further work that is possiblein the field of understanding effective use of social networksin a political or rhetorical context. Some such questions un-explored in this paper but within the scope of further workare:1) Attempting to understand the compositions of the mostsuccessful political Tweets in the context of Indian Elections2) Attempting to understand similarities and differences amongstsuch networks and elections across the world3) Understanding which events and time periods within anelection cycle are the most crucial to engage via social media

and social networks

11. CITATIONS1. Kempe, Kleinberg, and Tardos (2003). ”Maximizing

the Spread of Influence through a Social Network.”SIGKDD.

2. McKelvey, DiGrazia, and Rojas (2014). ”Twitter Publics:How Online Political Communities Signaled Electoral Out-comes in the 2010 US House Election.” Information, Com-munication and Society, 17:4, 436-450.

3. ”BJP may return to power in Rajasthan, Hung assem-bly in Delhi: India TV-CVoter projection.” India TV. 18September 2013.

12. APPENDIXBelow is a table of all political handles and their corre-

sponding political affliations. A handle not associated withthe NDA, UPA, or AAP is classisifed as OTH for ”other.”

Political Handle Affliliation

quizderek OTHKalrajMishra NDAM Lekhi NDAPawarSpeaks UPAprithvrj UPAPMOIndia OTHAITCofficial OTHmeerasanyal AAPSudheenKulkarni OTHjayantsinha NDAGulPanag AAPSwamy39 NDATarunvijay NDAKDSingh India OTHSushmaSwaraj NDABJP4India NDANCPspeaks UPAThinkwithGoogle OTHnarendramodi NDAINCIndia UPAasadowaisi OTHncbn NDASandeepDikshit UPA

yadavakhilesh UPAPandaJay OTHdilipkpandey AAPianuragthakur NDAsmritiirani NDAJKNC UPAshaziailmi AAPKapilSibal UPAShahnawazBJP NDAJP LOKSATTA OTHlaluprasadrjd UPAbuzzindelhi OTHPriyaDutt INC UPAAnanthKumar BJP NDABJPRajnathSingh NDAShainaNC NDAdigvijaya 28 UPAsupriya sule UPAashutosh83B AAPabdullah omar UPADrShahFaisal OTHDrKumarVishwas AAPManishTewari UPAAUThackeray NDAKirronKherBJP NDA

Political Handle Affliliation

manoharparrikar NDAmayankgandhi04 AAPPallamrajumm UPAmilinddeora UPAvarungandhi80 NDAVasundharaBJP NDArajeev mp OTHDrJitendraSingh NDAShashiTharoor UPAPrakashJavdekar NDAChennaiConnect OTHSalmanSoz UPAAamAadmiParty AAPSinghRPN UPAnsitharaman NDApriyankac19 UPAMPNaveenJindal UPANKSingh 41 NDAShuklaRajiv UPAChouhanShivraj NDAajaymaken UPAPiyushGoyal NDAharsimrat badal NDAjkpdp OTHArvindKejriwal AAPAapYogendra AAPmkstalin UPAarunjaitley NDAnaqvimukhtar NDAdrharshvardhan NDARamJethmalani5 OTHSushilModi NDA

Keywords captured within Tweet DataListed Below are the keywords that encompass all the tweetsin the data:

”Bharatiya Janata Party”, ”BJP”,”Bahujan Samaj Party”,”BSP”,”LK Advani”,”Mayawati”,”Narendra Modi”,”CongressParty”,”Rahul Gandhi”,”Manmohan Singh”,”Sonia Gandhi”,”MamataBanerjee”,”Trinamool Congress”,”P Chidambaram”,”RajnathSingh”,”Nitish Kumar”,”Janata Dal United”,”JDU”,”NaveenPatnaik”,”Sharad Pawar”,”Laloo Prasad”,”Rashtriya JanataDal”,”Mulayam Singh Yadav”,”Jayalalitha”,”United Progres-sive Alliance”,”UPA”,”Dravida Munnetra Kazhagam”,”DMK”,”NationalDemocractic Alliance”,”Shiv Sena”,”Rashtriya SwayamsevakSangh”,”Arvind Kejriwal”

13. APPENDIX

1

x+

1

x2+ ...+

1

xn=

n∑i=0

1

xi(6)

Recall the formula for an infinite geometric series

∞∑i=1

xi =x

1− x (7)

for x < 1. So∞∑i=1

1

bi=

1b

1− 1b

(8)

So we have∞∑i=1

1

bi=

1b

1− 1b

(9)

Thus,

n∑i=1

1

bi=

∞∑i=1

1

bi−

∞∑i=n+1

1

bi(10)

=

n∑i=1

1

bi=

∞∑i=1

1

bi− 1

bn

∞∑i=1

1

bi(11)

=1b

1− 1b

− 1

bn

1b

1− 1b

(12)

= (1− 1

bn)

1b

1− 1b

(13)

=bn − 1

bn

1b

b−1b

(14)

=bn − 1

bn(b− 1)(15)

Plugging this back into the original equation gives us thedesired result.

Documents

India’s #TwitterElection 2014 - Stanford Universitysnap.stanford.edu/class/cs224w-2014/projects2014/cs224w...user accounts (The nodes in our network) and up to 5000 fol-lowers for