PREDICTING REVENUE WITH SOCIAL
MEDIA DATA: “A COMPARISON BETWEEN TWITTER AND FACEBOOK
FANPAGES”
Word count: 8913
Eline Debakker Student number : 01205061
Supervisor: Prof. dr. Dirk Van den Poel
Co-supervisor: Steven Hoornaert
Master’s Dissertation submitted to obtain the degree of:
Master of Science in Business Engineering
Academic year: 2016 - 2017
1 Confidentiality Agreement
I declare that the content of this Master’s Dissertation may not be consulted and/or reproduced.
Eline Debakker
2 Abstract
Abstract - Dutch (English translation)
This paper compares the activity on Facebook and Twitter fanpages with regard to its relationship to
revenue. For this purpose, the quarterly revenue of several Fortune 500 companies between 2014 and 2016
was used. Various parameters of activity on Facebook and Twitter fanpages are compared together and
separately. The results show that there is a relationship between quarterly revenue and the number of
tweets sent by fans. Based on the results of this study, no relationship between Facebook
fanpage activity and revenue can be demonstrated.
Abstract - English
This paper studies the relationship between Twitter and Facebook fanpage activity and quarterly revenue for
Fortune 500 companies between 2014 and 2016. A comparison is made to study these effects both separately
and together. Results indicate that shocks in the sentiment of fans on Twitter influence a brand’s quarterly
revenue. We found no significant evidence of a relationship between Facebook fanpage activity and quarterly
revenue.
Keywords: Fortune 500, sentiment, WoM, VAR-analysis, social network sites
3 Foreword
In this foreword, I would like to thank a few people for helping me to write this thesis.
I wish to thank Prof. Dr. Dirk Van den Poel for giving me the chance to write this master’s dissertation,
and Mr. Steven Hoornaert for his helpful advice and coaching over the last two years.
Finally, I would like to thank my mother, my sisters, family and friends for supporting me during my
studies.
Contents
1 Confidentiality Agreement I
2 Abstract II
3 Foreword III
1 Introduction 1
2 Literature 2
2.1 Online Brand Communities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Facebook, Twitter and their impact on business . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Data & Methodology 7
3.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1.1 Operating revenue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1.2 Social media data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.2 Volume-related variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 The VAR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4 Results and Discussion 16
4.1 Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2 Facebook & Twitter model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2.1 Tests for stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2.2 Granger-Causality tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.3 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.4 Impulse response functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 Twitter model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.1 Tests for stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.2 Granger-Causality tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.3 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3.4 Impulse Response functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4 Facebook model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4.1 Tests for stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4.2 Granger-Causality tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4.3 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4.4 Impulse Response functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5 Limitations and future research 36
List of Tables
1 Companies included in dataset per sector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Variables and description of panel dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3 Example of Valence shifters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4 Overall descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5 Summary statistics (1/2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6 Summary statistics (2/2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
7 Correlation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8 Augmented ADF-tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
9 Granger causalities tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
10 LM-test for residual autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
11 Descriptive statistics of Twitter variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
12 Correlation analysis Twitter variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
13 Granger-causality tests for Twitter variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
14 Lag selection criteria for Twitter model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
15 LM-test for residual autocorrelation in the Twitter model . . . . . . . . . . . . . . . . . . . . 27
16 Descriptive statistics of Facebook variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
17 Correlation analysis of Facebook variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
18 Augmented ADF-tests for Facebook variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
19 Granger-causality tests for Facebook variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
20 Lag selection criteria for the Facebook model . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
21 LM-test for residual autocorrelation in the Facebook model . . . . . . . . . . . . . . . . . . . 31
List of Figures
1 Quarterly revenues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Estimation output of the VAR(2) model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3 Impulse response functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4 Impulse response values (Revenue and Posts) . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5 Impulse response values (Tweets and Facebook likes) . . . . . . . . . . . . . . . . . . . . . . . 25
6 Estimation output of the Twitter model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7 Impulse response functions of the Twitter model . . . . . . . . . . . . . . . . . . . . . . . . . 29
8 Impulse response values (Revenue and Twitter sentiment) . . . . . . . . . . . . . . . . . . . . 29
9 Estimation output of Facebook VAR(2) model . . . . . . . . . . . . . . . . . . . . . . . . . . 32
10 Impulse response functions of the Facebook model . . . . . . . . . . . . . . . . . . . . . . . . 33
11 Impulse response values (Posts and Facebook sentiment) . . . . . . . . . . . . . . . . . . . . . 34
12 Impulse response values (Likes) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1 Introduction
As customers are becoming more resistant to traditional marketing programs, companies have turned
to social network sites to engage and communicate with their customers (Bagozzi and Dholakia, 2006). The
existence of social network sites (SNS) such as Facebook and Twitter has transformed customers from
passive receivers of content spread by a brand into active co-creators and broadcasters of brand messages.
Brand lovers gather in online brand communities, whether or not these are organized by a company. Online
brand communities are a form of consumer community and are important for driving purchases. In this
respect, content created by brand community members - UGC or user-generated content - is key. UGC is
more influential than content created by the brand. Social media activities influence the consumer buying
process (Saboo et al., 2016). Engagement in social media brand communities significantly increases consumer
purchases (Ping et al., 2012). Negative UGC caused by negative events such as a competitor’s product recall
also has an effect on a company’s sales (Borah and Tellis, 2015). Companies benefit from tracking and
encouraging customer engagement behavior in communities, because it leads not only to more commenting
and liking, but also to more buying (Gummerus et al., 2012).
Brand communities grant companies access to a great number of consumers at low cost, at high speed and
with ease of applicability. They are efficient in customer-to-customer based information exchange and learning
(Zaglia, 2013). Indeed, customers often add value by generating content and even become ardent advocates
for the seller’s products and can influence purchase decisions of others in peer-to-peer interactions (Sashi,
2012).
On Facebook, brand communities are created on the brand’s fanpage. On Twitter, brand communities typically
revolve around news and message sharing. Consumers follow brands or brand advocates on Twitter for social
interaction, brand usage and information seeking. These motivations are significant predictors of
consumer-brand relationship variables such as brand community commitment (Kwon et al., 2014). Policies
on Twitter and Facebook state that only an authorized representative of a brand can create
and administer a brand’s fanpage. However, apart from official fanpages hosted by the company, there is
an abundance of unofficial fanpages hosted by consumers (Facebook; Twitter).
Research has investigated the link between social media and real-life outcomes. In the last decade, a growing
body of research has used social media data for forecasting models, predicting box office revenues for movies
(Asur and Huberman, 2013), quarterly sales of Nike apparel (Boldt et al., 2016) and Apple’s iPhone (Lassen
et al., 2014). Most studies use information from one social platform, thereby ignoring the wider social
ecology of interaction and diffusion between multiple social network sites (Tufekci, 2014). Indeed, online social
media is part of an ecosystem where traditional and social media work together to engage customers and
promote products or services (Hanna et al., 2011). To our knowledge, only three studies used social big data
from more than one platform. Saboo used social media metrics from Myspace, Facebook, Twitter, Last.FM
and YouTube (Saboo et al., 2016), Boldt combined data from Facebook pages with online search volume via
Google Trends and Google Shopping (Boldt et al., 2016), while Oh studied the impact of volume-related
metrics of Twitter, Facebook and YouTube on revenue (Oh et al., 2017).
Despite the acknowledgement of social media as a medium to drive sales and a growing body of research
investigating the matter, there is little understanding of how consumer behavior influences sales across
social media platforms. After all, by using only a single social media platform, the wider social ecology
of interaction and diffusion is overlooked (Tufekci, 2014). This study investigates which social variables are
linked to revenue across multiple platforms, more specifically Facebook and Twitter, and compares the
impact of these metrics on revenue. As 86% of Fortune 500 companies are present on Twitter and 84% on Facebook
(Statista, 2017), we focus our study on the fanpages of these companies. As far as we know, only Apple
(#3 in 2016) and Nike (#91 in 2016) have been studied before in this respect. Research using data from
multiple companies has not been done yet. A general insight into which parameters influence revenue
on which platform will make it easier for companies to monitor their social media activity. In addition,
knowing which parameters influence revenue could help companies to encourage fans to take certain actions
on Facebook and Twitter, which could lead to an increase in revenue. In this way, both companies and fans
will benefit from companies’ efforts to leverage their activity on these social media platforms. This study
hopes to provide guidelines on how to leverage social media that are not specific to one company. In this
respect, our research adds to the existing literature by comparing the impact of social media on business
outcomes of multiple companies across multiple platforms.
The remainder of this paper is structured as follows. The next section reviews related literature
and the relevant social metrics used in our study. Then, we discuss the data and the methodology used to build
multiple models. In section 4, the results are described and conclusions are drawn. Finally, we mention
limitations and give suggestions for future research.
2 Literature
2.1 Online Brand Communities
A brand community is a specialized, non-geographically bound community, based on a structured set of
social relationships among admirers of a brand (Muniz and O’Guinn, 2001). The community consists of a
collective of people with a shared interest in a specific brand, creating a subculture around the brand with
its own values, myths, hierarchy, rituals and vocabulary (Cova and Pace, 2006). Compared to traditional
communities, it takes less time to bond with customers (de Valck et al., 2009), the costs are lower and the
efficiency level of engagement is higher (Kaplan and Haenlein, 2010; Albert et al., 2008). By exchanging
information, experiences, knowledge or simply love for the brand, members can influence other customers’
relationship with and attitude towards the brand (McAlexander et al., 2002).
Over the last few years, brand communities have emerged on social media. On Facebook, people can become
a “fan” of a brand on its fanpage. Characteristics of a fan are self-identification as a fan, emotional
engagement, cultural competence, auxiliary consumption and co-production (Kozinets, 2008). In this regard,
a fanpage, hosted by a company or a consumer, with its active members can be seen as a brand community.
Zaglia showed that fanpages indeed exhibit brand community characteristics: perceived membership, social
identity, higher moral responsibility and fulfillment of members’ need for information (Muniz and O’Guinn, 2001;
Zaglia, 2013).
A fanpage also serves as a platform to communicate concerns and suggestions and to receive social
enhancement (Zaglia, 2013). On Twitter, brand followers do not perceive the brand page as a traditional brand
community (Kim et al., 2014). Brand lovers on Twitter have little motivation to engage with other users
(Kwon et al., 2014). The community is not bound by close ties between users. Unlike Facebook, Twitter
was not designed to support the development of communities. Nevertheless, brand fans would still share a
sense of community with the brand and other brand admirers within an “imagined community” comprised
of sets of interlinked “personal communities” (Tiryakian et al., 2011). Communities in a network on Twitter
can be defined as groups of nodes, representing Twitter users, that are more densely connected to each other
than to nodes outside the group. These communities are often topical or based on a shared interest, such as a brand
(Java et al., 2007).
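The density-based definition of a community can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: the function name, the toy network and the decision rule (internal density greater than external density) are hypothetical and are not taken from Java et al. (2007) or from the data used in this study.

```python
from itertools import combinations

def edge_densities(edges, group, all_nodes):
    """Return (internal_density, external_density) for a candidate community.

    internal_density: share of possible pairs inside the group that are linked.
    external_density: share of possible group-to-outsider pairs that are linked.
    """
    group = set(group)
    outside = set(all_nodes) - group
    links = {frozenset(e) for e in edges}  # undirected links, e.g. mutual follows

    internal_pairs = [frozenset(p) for p in combinations(sorted(group), 2)]
    external_pairs = [frozenset((g, o)) for g in group for o in outside]

    internal = (sum(p in links for p in internal_pairs) / len(internal_pairs)
                if internal_pairs else 0.0)
    external = (sum(p in links for p in external_pairs) / len(external_pairs)
                if external_pairs else 0.0)
    return internal, external

# Hypothetical toy network: a, b and c follow each other; d and e are loosely attached.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]

internal, external = edge_densities(edges, ["a", "b", "c"], nodes)
is_community = internal > external  # denser inside the group than towards outsiders
```

In this toy network every pair inside the group is linked, while only one of the six possible links to outsiders exists, so the group satisfies the definition.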
Marketeers recognized the importance of online communities as a way to enhance brand advocacy and to
understand the tastes and decision-making influences of consumer groups (Kozinets, 2008). For a company,
the added value of brand community members is greater than that of other customers because of
four dynamics (Cothrel, 2000). The buy-sell dynamic provides members with the information
they need to buy products. The referral dynamic indicates that community members, as hyper-affiliates, tend to
spread the word about products and companies they love. The self-select dynamic makes it easier for sellers to send
relevant offers to targets, because members already signal their interest by joining the community. Lastly,
the repeat-visit dynamic states that community members are repeat visitors, who will buy more and more
often, since the seller has more opportunities to reach them. Empirical evidence shows that membership
in a social media brand community has an impact on purchase behavior. Community members visit a store
more frequently than non-members. They also spend more, which shows that attitudinal loyalty to the
brand’s community translates into behavioral loyalty (Ping et al., 2012). More participation in generating
content and interaction is what makes a brand community sustainable and profitable, because the level of
participation influences consumer intention (Ding et al., 2017).
Customers often seek advice on their social networks. This advice is perceived as highly valuable and
is later used to make buying decisions or to learn (Zaglia, 2013; Wu et al., 2010). Bagozzi defined these
motivations as informational and instrumental benefits, distinguishing them from entertainment benefits
such as fun and relaxation (Bagozzi and Dholakia, 2006). Social benefits include recognition and friendship.
Economic benefits occur when members can receive discounts or participate in competitions (Gwinner et al.,
1998). The commitment of members towards the community is influenced by community interactions and
rewards for their activities, which in turn influences loyalty towards a brand (Jang et al., 2008).
2.2 Facebook, Twitter and their impact on business
Besides its impact on people’s lives, social media has also influenced corporations and their marketing
strategies. Companies realized the need to be active on social media and nowadays they spend an increasing
amount of time determining a social media strategy. A company’s presence on social media can affect brand
loyalty and credibility, which in turn influences customer engagement and sales (Edosomwan et al., 2011).
However, consumers are no longer just passive receivers in a company’s marketing program. Instead, they
have become active creators of content. Individuals looking for information about a brand’s product are
exposed to a huge amount of content that is beyond that company’s control. Social media platforms provide
the tools necessary for individuals to spread their own opinions and experiences, thereby influencing others
in the network. A growing number of social media users consult their social community before making
purchase decisions. The consumer decision process is impacted by social referrals from friends and strangers
alike (Barnes, 2014). Companies tap into this dynamic by providing functions on their websites that facilitate
this social sharing. Using Facebook’s Like function or Twitter’s tweets, consumers are encouraged to share
commercial information with their social network. This is called social commerce, a type of e-commerce in
which user ratings, online communities and social advertising are used to facilitate online shopping (Liang
et al., 2011). Using social media big data, companies can gain more insight into the interests of their customers.
There is also an increasing amount of research investigating the relationship between social media
and real-world outcomes. Kalampokis reviewed 52 empirical studies regarding the use of social media for
different business outcomes to create a social media analysis framework (Kalampokis et al., 2013).
Our study focuses on the link between Facebook and Twitter big data and revenue and sales. Prior research
already revealed the existence of a relationship between these platforms and business outcomes. We will
discuss these studies, focusing on the social media metrics that have already been linked to revenue.
While early research used blogs and online review platforms such as those on Amazon.com, more
research nowadays uses data from social network sites, particularly Twitter. Using social network sites as
a way to capture consumer behavior can be justified by the existence of online brand communities, where
like-minded people with similar interests are gathered. Online communities allow users to express their
personal preferences, to share recommendations and to identify trusting members. Consumers are more
likely to believe recommendations from people they trust (Wu et al., 2010). This could be friends and
family, but also fellow community members. Walsh argues that social network sites such as Facebook are
increasingly driving traffic to retail sites, indicating that communities might often be consulted before the
purchase decision is made (Walsh, 2006). The activity of Facebook fanpage members or followers of a Twitter
page can be measured using two types of variables: volume-related variables and sentiment variables. While
the former refers to the volume of chatter and other social media activity, the latter focuses on what is
said (DeMarco, 2013). Both measurements are a way to describe Word-of-Mouth. WoM is the process of
conveying information from person to person (Jansen, 2009). It involves sharing opinions or reactions about
businesses or products. In the online form, this electronic WoM is considered as trusted information, spread
by both friends and strangers in the immediate social network (Barnes, 2014).
This study compares the link between revenue and social media metrics of Twitter and Facebook. The
effect of each of these variables on revenue and vice versa should be studied simultaneously (Duan et al.,
2008; Rui et al., 2013; Ding et al., 2017; Oh et al., 2017). Moreover, most of the companies in
this study are active on both Twitter and Facebook, which means that information can flow between the
two platforms (?). Tufekci and Wilson found that social media users alternate between Facebook, Twitter,
broadcast media, cell-phone conversations, texting, face-to-face contact and other methods to interact and share
information (Tufekci, 2014). Behavior of fans on one platform might thus influence outcomes on the other.
Hypothesis 1: Twitter social media metrics have an influence on Facebook social media metrics.
Hypothesis 2: Facebook social media metrics have an influence on Twitter social media metrics.
The social media metrics studied are Facebook and Twitter sentiment and the volume of fanpage posts,
likes and tweets.
The content of tweets, posts and comments is analyzed to find opinions, thoughts or feelings (i.e.
sentiment), which are then given a score based on whether these feelings are positive or negative. Sentiment
variables reveal the sentiment of customers captured in online reviews, product ratings, blog posts and tweets
(Gruhl et al., 2005; Mishne and Glance, 2006; Chevalier and Mayzlin, 2006; Dhar and Chang, 2009; Onishi
and Manchanda; Chern et al., 2015; Lassen et al., 2014; Archak et al., 2011; Asur and
Huberman, 2013; Chen et al., 2015; Zhu and Zhang, 2010). Harnessing this sentiment with text mining could
significantly improve traditional prediction methods (Zhi-Ping Fan, 2017). Sentiment analysis examines the
words in a text, distinguishing between positive (+1), negative (-1) and neutral (0) words. Based on
these polarized words, each sentence is given a total sentiment score, which can help explain revenue performance.
A weakness of sentiment analysis of social media is the noisy and informal nature of social media text, which causes
predictors to perform poorly (Ding et al., 2017).
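The word-level scoring described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the mini-lexicon and the function names are hypothetical, and the sketch ignores valence shifters such as negation, which a fuller sentiment analysis would take into account.

```python
import re

# Hypothetical mini-lexicon; a real analysis would rely on a published sentiment dictionary.
LEXICON = {"love": 1, "great": 1, "good": 1, "hate": -1, "bad": -1, "broken": -1}

def sentence_score(sentence, lexicon=LEXICON):
    """Sum word polarities: +1 positive, -1 negative, 0 for words not in the lexicon."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(word, 0) for word in words)

def text_scores(text):
    """Split a text on sentence-ending punctuation and score each sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [(s, sentence_score(s)) for s in sentences]

scores = text_scores("I love this phone. The battery is bad and the screen is broken!")
```

Here the first sentence contains one positive word and the second contains two negative words, so the sentence scores are +1 and -2 respectively.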
The effect of social media Word-of-Mouth (WoM) on revenue has been studied several times. A significant
effect of positive and negative tweets on box office revenue has been found, which suggests that it is
important for companies to monitor the sentiment of tweets (Rui et al., 2013). Consumers looking for product
information who come across more positive WoM will evaluate the product more positively. This will influence
their buying decision. If this reasoning is correct, an increase in sentiment (WoM) will indeed lead to higher
revenues. We will test whether this relationship exists for the Fortune 500 companies included in our dataset.
Specifically,
Hypothesis 3: Twitter sentiment has an influence on quarterly revenue.
Hypothesis 4: Facebook sentiment has an influence on quarterly revenue.
Hypothesis 5: Quarterly revenue influences Twitter sentiment.
Hypothesis 6: Quarterly revenue influences Facebook sentiment.
Volume-related data such as the count of daily mentions, number of MySpace friends, tweet rate, number
of tweets and retweets, Facebook posts and likes and other social media metrics have been linked to revenue
(Gruhl et al., 2005; Dhar and Chang, 2009; Lassen et al., 2014; Asur and Huberman, 2013; Saboo et al.,
2016; Boldt et al., 2016). There is a correlation between the amount of attention a topic receives and its
success in the future. Asur & Huberman showed that the volume of tweets can serve as a proxy for users’
attention towards a product (Asur and Huberman, 2013). Boldt used different Facebook variables such as
the number of likes and users, assuming that activity on fanpages can serve as a proxy for collective opinions
and attention (Boldt et al., 2016). Drawing on the AIDA framework, this attention can lead to purchases
(Lassen et al., 2014). Although Facebook is currently the largest social media platform, few studies have
used its datasets as a resource for studying real-world outcomes. Compared to Twitter, privacy settings
on Facebook make a lot of profiles inaccessible to the public internet (more than 50% on Facebook versus
less than 10% on Twitter) (Tufekci, 2014). Our analysis also faced this issue. Brand community markers
are strongest in closed Facebook Groups, which are often private, making their data inaccessible. Fortunately for
our analysis, Facebook fanpages are public and show brand community characteristics, making them an
acceptable substitute. In her paper, Tufekci advocates more multi-platform analysis. Differences in the
structure of Twitter and Facebook data and in the populations using these platforms make it interesting to
compare them in a forecasting model. Nevertheless, a recent study found that Facebook data can be used
for forecasting sales of Nike products (Boldt et al., 2016). Nearly 4 in 10 Facebook users report that their
behavior on Facebook, such as liking, sharing or commenting on an item, has led to them buying it
(Barnes, 2014). Facebook fans spend nearly five times as much on products of their favorite brand as
non-fans (Hollis, 2011). Oh studied the effects of Facebook likes, posts and comments on
revenue (Oh et al., 2017). Indeed, the Facebook like has a positive impact on revenue, and vice versa (Ding
et al., 2017). Similarly, the relationship between revenue and Twitter has been studied using Twitter
variables. Research shows that the volume of tweets has a positive effect on revenue, as does the number of
followers (Rui et al., 2013; Oh et al., 2017). Online Word-of-Mouth on blogs influences revenue, but increases
in revenue also positively influence online WoM volume. This effect is short-lived, which
can be explained by the fast, real-time nature of Twitter. Additionally, Duan showed that the sentiment of
WoM also influences the volume (Duan et al., 2008).
WoM influences revenue in two ways. On the one hand, sentiment influences how a consumer evaluates a
product and ultimately also their purchase decision. On the other hand, the volume of WoM increases
awareness. Similar to the volume of likes, this will increase revenue, as suggested by social impact theory (Latané,
1981). The same theory states that the immediacy and tie strength between individuals determine how
much an individual is influenced. In communities, where ties are strong, this could lead to a significant
increase in purchases. Based on previous research, we expect a significant relationship between these
volume-related metrics and revenue for Fortune 500 companies as well.
Hypothesis 7: The volume of tweets influences quarterly revenue.
Hypothesis 8: Volume-related metrics on Facebook have an impact on quarterly revenue.
Hypothesis 9: Quarterly revenue influences the volume of tweets.
Hypothesis 10: Quarterly revenue influences the volume-related metrics on Facebook.
3 Data & Methodology
3.1 Data
The empirical data used in this study were retrieved from online data sources. Quarterly operating revenue
of Fortune 500 companies was retrieved from the online data source Knoema.com.
Social media data were retrieved from Facebook and Twitter. Posts and comments were retrieved from
Facebook fanpages to capture customer sentiment, as were tweets and replies on Twitter. Fanpages created by
Fortune 500 companies and by their fans were included.
3.1.1 Operating revenue
Quarterly operating revenues were collected from the online data search engine Knoema.com. To verify
correctness, the list of companies was compared with the official list on the Fortune 500 website (Fortune500, 2016).
The quarters are calendar quarters, starting on January 1st 2014 and ending at the end of the third quarter
of 2016, September 30th. Operating revenue is revenue generated from a company’s day-to-day business
activity. It comprises the revenue generated from selling the company’s products or services. In contrast,
the term revenue refers to the proceeds from sales and other sources of income.
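The calendar-quarter window described above can be expressed in a few lines, which is useful when social media activity has to be aligned with quarterly revenue. The sketch below is an assumption-laden illustration: the function names and the example dates are hypothetical, and only the start and end dates of the window are taken from the text.

```python
from collections import Counter
from datetime import date

START, END = date(2014, 1, 1), date(2016, 9, 30)  # study window stated in the text

def quarter_label(d):
    """Map a date to its calendar quarter, e.g. 2014-02-10 -> '2014Q1'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"

def quarterly_counts(timestamps):
    """Count items (posts, tweets, likes) per calendar quarter inside the study window."""
    return Counter(quarter_label(d) for d in timestamps if START <= d <= END)

# Number of calendar quarters covered by the window (2014Q1 through 2016Q3).
n_quarters = ((END.year - START.year) * 4
              + (END.month - 1) // 3 - (START.month - 1) // 3 + 1)

# Hypothetical post dates for one fanpage; the last one falls outside the window.
posts = [date(2014, 1, 15), date(2014, 3, 31), date(2014, 4, 1), date(2016, 10, 2)]
counts = quarterly_counts(posts)
```

Under these dates the window spans eleven quarters, so each company contributes at most eleven quarterly observations.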
3.1.2 Social media data
Both Fortune 500 companies and their fans can host fanpages on Facebook. Social data, namely posts and comments,
were extracted from user- and company-created fanpages. A Facebook fanpage is denoted as “@ + page
name”. Any user is allowed to create a page expressing interest in a brand. Users are asked to make clear
that such a page is not the official fanpage by indicating so in the name, info or bio. Companies have the
right to file a trademark report about any account that violates these policies and have it removed, but since
it is so easy to create new fanpages, it is impossible to keep track of them all. Nowadays, as long as the
page does not speak in the voice of the brand, these user-created fanpages are often allowed to continue their
activities. In order to capture the behavior of fans of the Fortune 500 companies as well as possible, we use
both company-hosted and fan-hosted fanpages.
Different pages generate different levels of interaction, indicated by the number of page likes of a fanpage.
For instance, “@fansofapple”, a fan-hosted fanpage with 868,838 page likes on May 14th 2017, states that
“Fans of Apple is an independent Facebook fan page and is not in ANY way, shape or form affiliated with
Apple, Inc.” Companies can also have fanpages apart from their official page. These pages are specifically
intended for community purposes, such as “@RNspire”, a nurse appreciation page monitored by Cardinal
Health, dedicated to informing, inspiring and recognizing the community.
Figure 1: Quarterly revenues
On Twitter, tweets and replies were scraped from Twitter pages created by fans and companies. Unlike
Facebook, Twitter has no specific construct for a fanpage. A “fanpage”, created with the purpose of informing,
entertaining and conversing with fellow brand lovers, is simply a Twitter account that mentions in its name or bio
that it is a fanpage. Some companies had an abundance of Twitter fanpages. For each
company, the three accounts with the largest number of followers were selected. For example, “@CultOfMac:
a site that follows everything Apple. News, reviews and how-tos” had 783,362 followers and was the
largest Apple-related fanpage on Twitter on May 14th 2017.
With the later analysis in mind, only English-language pages were used. Pages created to spread
coupons or to support the brand’s sports team, as well as hate and charity pages, were excluded.
Company-hosted fanpages created for divisions, internal purposes or specific product lines were also excluded, so that only
pages for people with a shared interest in the brand remained, in accordance with the definition of a brand
community. Fanpages were found for only 178 of the Fortune 500 companies. These pages could be created
by a fan, the company itself or both. However, we were not able to extract data from all pages. We omitted
pages that did not have any posts or activity between 2014 and 2016, as well as pages that were not created as
a Facebook page (but as a Group or Person). Eventually, we were left with Facebook data for 33 companies.
In a similar way, we extracted Twitter data for 52 of the 74 companies for which we found at least one
Twitter fanpage. An overview of the Fortune 500 companies included in the dataset per industry
is provided in Table 1.
| Industry | Companies in dataset with Facebook fanpage(s) | Companies in dataset with Twitter fanpage(s) |
|---|---|---|
| Aerospace & Defense | Boeing | Lockheed Martin, Northrop Grumman, Textron, United Technologies |
| Apparel | Nike | Nike |
| Business Services | - | Aramark, MasterCard, Visa, Western Union |
| Energy | NextEra Energy | Centerpoint Energy, Chevron, Conoco Phillips, Devon Energy, Exxon Mobil |
| Financial Services | - | Berkshire Hathaway |
| Food & Drugstores | Kroger, RiteAid | CVS Caremark, Supervalu, Walgreens, Whole Foods Market |
| Food & Tobacco | Archer Daniels Midland, Dr Pepper Snapple Group | Coca-cola |
| Healthcare | Cardinal Health | Cardinal Health |
| Hotels & Leisure | Marriott International | - |
| Household products | Avon Products, Kimberly-Clark | Procter & Gamble |
| Industrials | Caterpillar, John Deere, Whirlpool |
| Media | Walt Disney | Discovery Communications, Live Nation Entertainment, Walt Disney |
| Motors | Ford Motors | Ford Motors, General Motors, Goodyear Rubber & Tire |
| Retailing | Costco, Home Depot, Netflix, Ross Stores | Netflix |
| Technology | Alphabet Google, Amazon, Apple, Cisco Systems, Oracle, Salesforce | Alphabet, Apple, Cisco Systems, IBM, Intel, Microsoft, Netapp, Oracle, Salesforce |
| Telecommunications | AT&T, Comcast, Windstream | AT&T, Comcast, Windstream |
| Transportation | American Airlines, CSX, JetBlue Airways, Norfolk Southern, Union Pacific | American Airlines, Delta Airlines, JetBlue Airways |

Table 1: Companies included in dataset per sector
We selected a dataset of 14 companies for which we found fanpages on both Facebook and Twitter:
Apple, Ford Motors, AT&T, Kroger, Nike, Walt Disney, JetBlue Airways, Cisco Systems, Oracle, Comcast,
Windstream, Cardinal Health, Salesforce.com and Alphabet Google. The evolution of their quarterly
revenues is shown in Figure 1.
| Variable | Description |
|---|---|
| Date | Year & Quarter |
| Company | Index of the company |
| Revenue | Quarterly revenue per company (million dollars) |
| Posts | Volume of Facebook posts |
| Tweets | Volume of tweets |
| Twsent | Rate of positive tweets |
| Fbsent | Rate of positive posts & comments |
| Likes | Volume of likes |

Table 2: Variables and description of panel dataset
3.2 Variables
An overview of the variables used in the study is given in table 2. The data is structured as an unbalanced
panel. The time variable is Date, ranging from 2014Q1 to 2016Q3, the cross-sectional variable is Company.
3.2.1 Sentiment Analysis
In order to determine the Twitter and Facebook sentiment, a sentiment analysis was performed using the
sentimentr package (Rinker, 2017). Tweets were used for Twitter. For Facebook, we used posts and comments.
Each message was preprocessed using the tm package in the following ways:
• Elimination of URLs, hashtags and mentions of other account names (“tagging someone”)
• Elimination of white space and other special characters
• Elimination of numbers
• Transformation of content to lower case
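The clean-up steps listed above can be illustrated with a minimal Python sketch. It is not the tm pipeline used in this study: the function name `clean_message`, the regular expressions and the choice to drop a hashtag together with its word are assumptions made for illustration, and sentence punctuation is kept on the assumption that a sentence-level scorer still needs it.

```python
import re

def clean_message(text):
    """Apply the clean-up steps to one tweet or post (illustrative rules only)."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[#@]\w+", " ", text)                 # hashtags and account mentions
    text = re.sub(r"\d+", " ", text)                     # numbers
    text = re.sub(r"[^A-Za-z\s.,;:!?']", " ", text)       # other special characters
    text = re.sub(r"\s+", " ", text).strip()             # surplus white space
    return text.lower()                                  # lower case

print(clean_message("Loving my new #iPhone 7!! Thanks @Apple https://t.co/abc123"))
# prints: loving my new !! thanks
```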
Then, each message was given a polarity score. The sentiment function in sentimentr calculates the
polarity of each sentence in a text by combining a standard dictionary look-up for polarized words
with a weighting for valence shifters. The words in each post or tweet were compared with the polarized words
in the sentiment dictionary. We used the Jockers dictionary in the lexicon package. We tried to calculate
sentiment as accurately as possible, taking into account the limited length of tweets and Facebook posts. While
simple dictionary methods often perform well when determining the overall sentiment of a long text, they
are less appropriate for determining the sentiment of a single tweet, since they do not take into account the
many words that can shift or reverse the sentiment of a polarized word. These words are called “valence
shifters” (i.e. negators, amplifiers, de-amplifiers and adversative conjunctions). Table 3 gives an example of
each type, obtained from the @CultOfMac fanpage. It shows that valence shifters
occur in a social media context and influence the word they are associated with (the polarized word). For
example, in ‘Trump tweets’, the co-occurrence of valence shifters with polarized words is 19% for negators,
18% for amplifiers, 4% for de-amplifiers and 4% for adversative conjunctions (Rinker, 2017). We preferred
the sentiment function over standard dictionary look-up methods to make sure these valence shifters are
accounted for.

| Valence shifter | Tweet example |
|---|---|
| Negator | I do not expect, nor desire, software making such edits for me. Not OK |
| Amplifier | I really hope they don’t do that |
| De-amplifier | Hardly seems like news, or relevant to anything |
| Adversative conjunction | I use Bluetooth. However, trying to find a headset with lightning connector. |

Table 3: Examples of valence shifters
To determine the polarity score, each sentence $s_j$ in a tweet or post $t_i = \{s_1, s_2, ..., s_n\}$ is broken into a
bag of words $s_{i,j} = \{w_1, w_2, ..., w_n\}$. For example, $w_{1,2,3}$ is the third word in the second sentence of the first
tweet. All punctuation is removed except for commas and semicolons, which are considered as words within the
sentence (pause words, $cw$). The algorithm first compares each word $w_{i,j,k}$ to a dictionary of polarized words.
Each positive ($w^{+}_{i,j,k}$) or negative ($w^{-}_{i,j,k}$) word is tagged with +1 or -1 respectively. Each polarized word $pw$ forms a polar cluster
($c_{i,j,l} \subseteq s_{i,j}$) together with the words taken from around it, so a cluster can be represented
as:
$$c_{i,j,l} = \{pw_{i,j,k-nb}, ..., pw_{i,j,k}, ..., pw_{i,j,k+na}\}$$
with $nb$ and $na$ being respectively the number of words to be considered before and after the polarized word.
The lower and upper bound of the polarized word cluster are constrained by the pause words. In this way,
the algorithm takes changes of thought into account. The lower bound is constrained to
$\max\{pw_{i,j,k-nb}, 1, \max\{cw_{i,j,k} < pw_{i,j,k}\}\}$ and the upper bound to $\min\{pw_{i,j,k+na}, w_{i,jn}, \min\{cw_{i,j,k} > pw_{i,j,k}\}\}$, with
$w_{i,jn}$ the number of words in sentence $j$. In the context cluster, each word is labeled as neutral,
negator, amplifier or de-amplifier. Based on the polarity_dt argument of the sentiment function, each
polarized word is given a weight $w$, which is then modified by the valence shifters surrounding the word.
• Neutral words ($w^{0}_{i,j,k}$) affect the word count, but have no influence on the polarity score.
• Amplifiers ($w^{a}_{i,j,k}$) increase the polarity of a polarized word by a factor of 1.8, that is $1 + z$ with the default weight $z = 0.8$, which we used.
• De-amplifiers ($w^{d}_{i,j,k}$) reduce the polarity of a polarized word. Amplifiers become de-amplifiers if the
context cluster has an odd number of negators ($w^{n}_{i,j,k}$).
• Negators ($w^{n}_{i,j,k}$) flip the sign of a polarized word and act on amplifiers and de-amplifiers. Since
two negatives make a positive and three negatives make a negative, the negation is determined by
raising -1 to the power of the number of negators ($w^{n}_{i,j,k}$) plus 2.
• Adversative conjunctions before a polarized word up-weight a cluster by
$1 + z_2 * \{|w_{adv.conjunction}|, ..., w^{p}_{i,j,k}\}$, with the weight $z_2$ equal to the default 0.85. If the adversative
conjunction occurs after the polarized word, it down-weights the cluster by
$1 + \{w^{p}_{i,j,k}, ..., |w_{adv.conjunction}| * -1\} * z_2$. The numbers of occurrences before and after the polarized word are multiplied
by 1 and -1 respectively and summed. This value is multiplied by $z_2$ and then 1 is added.
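Finally, the weighted context clusters $c_{i,j,l}$ are summed to $c'_{i,j}$, and this final value is divided by
the square root of the word count to yield a polarity score per sentence:
$$\delta = \frac{c'_{i,j}}{\sqrt{w_{i,jn}}}$$
with:
$$c'_{i,j} = \sum \left( (1 + w_{amp} + w_{deamp}) \cdot w^{p}_{i,j,k} \cdot (-1)^{2 + w_{neg}} \right)$$
$$w_{amp} = \sum \left( w_{neg} \cdot (z \cdot w^{a}_{i,j,k}) \right)$$
$$w_{deamp} = \max(w_{deamp'}, -1)$$
$$w_{deamp'} = \sum \left( z \, (-w_{neg} \cdot w^{a}_{i,j,k} + w^{d}_{i,j,k}) \right)$$
$$w_{b} = 1 + z_2 * w_{b'}$$
$$w_{b'} = \sum \left( |w_{adv.conjunction}|, ..., w^{p}_{i,j,k}, w^{p}_{i,j,k}, ..., |w_{adv.conjunction}| * -1 \right)$$
$$w_{neg} = \left( \sum w^{n}_{i,j,k} \right) \bmod 2$$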
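The weighting rules above can be made concrete with a simplified Python sketch. It is an illustration only and not the sentimentr implementation: the word lists are tiny hypothetical stand-ins for the Jockers dictionary and the valence-shifter table, pause words and adversative conjunctions are ignored, and treating de-amplification as a negative weight bounded at -1 is an assumed reading of the verbal rules.

```python
import math

# Tiny hypothetical word lists, for illustration only (not the Jockers dictionary).
POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}
NEGATORS = {"not", "never", "no"}
AMPLIFIERS = {"really", "very"}
DEAMPLIFIERS = {"hardly", "barely"}

def sentence_polarity(words, n_before=4, n_after=2, z=0.8):
    """Score one tokenised sentence with a simplified valence-shifter rule set."""
    total = 0.0
    for k, word in enumerate(words):
        if word not in POLARITY:
            continue
        # Context cluster: n_before words before and n_after words after the polarized word.
        cluster = words[max(0, k - n_before):k] + words[k + 1:k + 1 + n_after]
        n_neg = sum(w in NEGATORS for w in cluster)
        n_amp = sum(w in AMPLIFIERS for w in cluster)
        n_deamp = sum(w in DEAMPLIFIERS for w in cluster)
        odd_neg = n_neg % 2
        # Amplifiers act as de-amplifiers when the cluster holds an odd number of negators.
        w_amp = (1 - odd_neg) * z * n_amp
        w_deamp = max(-(odd_neg * z * n_amp + z * n_deamp), -1.0)
        # Negators flip the sign through (-1) ** (2 + number of negators).
        total += (1 + w_amp + w_deamp) * POLARITY[word] * (-1) ** (2 + n_neg)
    # Divide by the square root of the word count, as in the sentence-level score.
    return total / math.sqrt(len(words)) if words else 0.0

print(sentence_polarity("this is really good".split()))  # 1.8 / sqrt(4)
print(sentence_polarity("this is not good".split()))     # -1.0 / sqrt(4)
```

In this sketch the amplified sentence scores higher than a plain positive one, and a single negator flips the sign, which is the behavior the rules describe.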
The polarity scores of the sentences in an item (tweet or post) are averaged to obtain a mean polarity score per
item. Per quarter, positive and negative items are counted and a normalized positivity ratio
is calculated:
$$PN\ ratio = \frac{P}{P + N}$$
where $P$ and $N$ are the numbers of positive and negative items in the quarter.
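As a small illustration of the positivity ratio, the following Python sketch counts positive and negative items for one quarter. The function name `pn_ratio` is hypothetical, and excluding items with a score of exactly zero from both counts is an assumption, since the treatment of neutral items is not specified here.

```python
def pn_ratio(item_scores):
    """Share of positive items among all polarized items in one quarter."""
    positives = sum(1 for s in item_scores if s > 0)
    negatives = sum(1 for s in item_scores if s < 0)
    if positives + negatives == 0:
        return None  # no polarized items in this quarter
    return positives / (positives + negatives)

# Hypothetical mean polarity scores of five items in one quarter.
print(pn_ratio([0.4, -0.2, 0.0, 0.9, 0.1]))  # 3 positive, 1 negative -> 0.75
```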
3.2.2 Volume-related variables
Every Facebook and Twitter page has a different level of activity, which we calculated per
quarter. For Facebook, we assume that the total numbers of posts, likes and shares are indicators of the
degree of interaction. For Twitter, the number of tweets on the fanpages is counted. Sometimes a message is sent
multiple times to reach an audience across the world in different time zones. We did not correct
for these kinds of messages, since they are a way to increase interaction.
All the variables in our dataset are aggregated per quarter and per company, resulting in an unbalanced
panel dataset of 65 companies. There are 14 companies for which we have both Twitter and Facebook
variables. We will use this subset for our mixed model estimation.
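The aggregation per company and per quarter can be sketched as follows. This is a minimal Python illustration with made-up records; the helper names `quarter_label` and `quarterly_volume` and the (company, date) record layout are assumptions, not the data format used in the study.

```python
from collections import Counter
from datetime import date

def quarter_label(d):
    """Map a date to a calendar-quarter label such as '2014Q1'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"

def quarterly_volume(messages):
    """Count messages per (company, quarter); messages are (company, date) pairs."""
    return Counter((company, quarter_label(d)) for company, d in messages)

# Hypothetical sample records, for illustration only.
sample = [
    ("Apple", date(2014, 1, 15)),
    ("Apple", date(2014, 3, 31)),
    ("Apple", date(2014, 4, 1)),
    ("Nike", date(2016, 9, 30)),
]
volume = quarterly_volume(sample)
print(volume[("Apple", "2014Q1")])  # 2
```

Company-quarter pairs without any message are simply absent from the counter, which is one way an unbalanced panel arises.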
3.3 The VAR model
The variables included in our unbalanced panel data frame are assumed to influence each other. Not
considering them simultaneously would make our estimates inconsistent. Vector autoregressive (VAR)
models are used for multivariate time series, in particular when there is a need to analyze
the relationships between the investigated variables. In each equation, the dependent
variable is a linear function of its own past values and those of all the other variables. This explains the term
autoregressive (Porter, 2009).
A VAR thus consists of $K$ endogenous variables $y_t = (y_{1t}, y_{2t}, ..., y_{Kt})$ for $k = 1, ..., K$ (Pfaff, 2008). The
VAR(p)-process is defined as:
$$y_t = A_1 y_{t-1} + ... + A_p y_{t-p} + u_t,$$
with $u_t$ a $K$-dimensional process with $E(u_t) = 0$ and $E(u_t u_t^T) = \Sigma_u$. The $u_t$ are the stochastic error terms, called
impulses in the language of VAR. Using our variables and a single lag, a VAR(1)-process can be written as:
$$\begin{aligned}
rev_t &= \alpha_1 + \phi_{11} rev_{t-1} + \phi_{12} posts_{t-1} + \phi_{13} tweets_{t-1} + \phi_{14} twsent_{t-1} + \phi_{15} likes_{t-1} + u_{rev,t} \\
posts_t &= \alpha_2 + \phi_{21} rev_{t-1} + \phi_{22} posts_{t-1} + \phi_{23} tweets_{t-1} + \phi_{24} twsent_{t-1} + \phi_{25} likes_{t-1} + u_{posts,t} \\
tweets_t &= \alpha_3 + \phi_{31} rev_{t-1} + \phi_{32} posts_{t-1} + \phi_{33} tweets_{t-1} + \phi_{34} twsent_{t-1} + \phi_{35} likes_{t-1} + u_{tweets,t} \\
twsent_t &= \alpha_4 + \phi_{41} rev_{t-1} + \phi_{42} posts_{t-1} + \phi_{43} tweets_{t-1} + \phi_{44} twsent_{t-1} + \phi_{45} likes_{t-1} + u_{twsent,t} \\
likes_t &= \alpha_5 + \phi_{51} rev_{t-1} + \phi_{52} posts_{t-1} + \phi_{53} tweets_{t-1} + \phi_{54} twsent_{t-1} + \phi_{55} likes_{t-1} + u_{likes,t}
\end{aligned}$$
Each variable is a linear function of the lag 1 values of all variables in the set. A VAR(1)-process can also
be written as:
$$\xi_t = A \xi_{t-1} + u_t,$$
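which can be written in matrix form as:
$$\begin{pmatrix} rev_t \\ posts_t \\ tweets_t \\ twsent_t \\ likes_t \end{pmatrix}
=
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5 \end{pmatrix}
+
\begin{pmatrix}
\phi_{1,1} & \phi_{1,2} & \phi_{1,3} & \phi_{1,4} & \phi_{1,5} \\
\phi_{2,1} & \phi_{2,2} & \phi_{2,3} & \phi_{2,4} & \phi_{2,5} \\
\phi_{3,1} & \phi_{3,2} & \phi_{3,3} & \phi_{3,4} & \phi_{3,5} \\
\phi_{4,1} & \phi_{4,2} & \phi_{4,3} & \phi_{4,4} & \phi_{4,5} \\
\phi_{5,1} & \phi_{5,2} & \phi_{5,3} & \phi_{5,4} & \phi_{5,5}
\end{pmatrix}
\begin{pmatrix} rev_{t-1} \\ posts_{t-1} \\ tweets_{t-1} \\ twsent_{t-1} \\ likes_{t-1} \end{pmatrix}
+
\begin{pmatrix} u_{1,t} \\ u_{2,t} \\ u_{3,t} \\ u_{4,t} \\ u_{5,t} \end{pmatrix}$$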
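To make the matrix form concrete, the sketch below computes a one-step prediction from a VAR(1) system in plain Python. The two-variable system and all coefficient values are made up for illustration; they are not estimates from this study, and the error term is set to zero.

```python
def var1_step(alpha, phi, y_prev):
    """One-step VAR(1) prediction: alpha + phi * y_prev, with the error term set to zero."""
    return [a + sum(p * y for p, y in zip(row, y_prev)) for a, row in zip(alpha, phi)]

# Hypothetical two-variable system (rev, tweets); all numbers are made up.
alpha = [1.0, 0.5]
phi = [[0.5, 0.1],
       [0.2, 0.3]]
y_prev = [10.0, 4.0]

prediction = var1_step(alpha, phi, y_prev)
print(prediction)  # approximately [6.4, 3.7]
```

Each entry of the prediction combines the lagged values of all variables, which is what makes every variable in the system endogenous.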
There are many benefits to using VAR. First, the method treats all variables as endogenous, which
makes VAR a simple method since one does not have to worry about which variables are endogenous and
which ones are exogenous. Secondly, estimation is simple; the Ordinary Least Squares method can be
applied to each equation separately (Porter, 2009). However, VAR modeling also has some difficulties and strict assumptions.
First, in practice it is often challenging to choose the appropriate lag length. Unless the sample is large,
including more lags consumes many degrees of freedom, with overfitting as a result (Yu,
2009). Secondly, the VAR model assumes that the variables are jointly stationary. This is important because
if a time series is non-stationary, the studied behavior is only valid for the time period under consideration.
Hence, it would not be possible to draw general conclusions for other periods. “A process is stationary if
its mean and variance are constant over time and the value of the covariance between the two time periods
depends only on the distance or gap or lag between the two time periods and not the actual time at which
the covariance is computed” (Porter, 2009). In most practical situations, weak stationarity suffices. The
properties of a weakly stationary stochastic time series are:
$$\text{Mean}: \quad E(Y_t) = \mu$$
$$\text{Variance}: \quad var(Y_t) = E(Y_t - \mu)^2 = \sigma^2$$
$$\text{Covariance}: \quad cov(Y_t, Y_{t+k}) = E[(Y_t - \mu)(Y_{t+k} - \mu)]$$
To test for stationarity, we perform a unit root test known as the augmented Dickey-Fuller (ADF) test. The test
starts from the augmented unit root regression
$$\Delta Y_t = \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \mu_t,$$
where $\mu_t$ is a white noise term and the $\Delta Y_{t-i}$ are the lagged differences of the dependent variable. The
ADF test tests the null hypothesis of a unit root, $H_0: \delta = 0$. The number of lags is determined by the
Schwarz Criterion, up to the maximal number of lags $p_{max}$ proposed by Schwert (Schwert, 1989):
$$p_{max} = 12 * \left(\frac{T + 1}{100}\right)^{0.25}$$
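The maximal lag length can be evaluated directly from the formula above. In the Python sketch below, the function name `schwert_max_lag`, the reading of $T$ as the number of observations and the rounding down to an integer are assumptions.

```python
import math

def schwert_max_lag(n_obs):
    """Maximal ADF lag length from the formula above, rounded down to an integer."""
    return math.floor(12 * ((n_obs + 1) / 100) ** 0.25)

# With 11 quarterly observations, 12 * (12 / 100) ** 0.25 is about 7.06.
print(schwert_max_lag(11))  # 7
```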
When building a VAR model, it is important to take causality into account. The idea behind causality is
that a cause cannot come after an effect. The past and present may cause the future, but the future cannot
cause the past. For causation to occur, variable $X_t$ needs to carry some information about the value $Y_{t+1}$
will take in the immediate future (Granger, 1980). If a variable $X_t$ affects another variable $Y_t$, the former
helps to improve predictions of the latter.
The Granger test for causality tests whether $X_t$ Granger-causes $Y_t$ or the other way around. If the
former is true, then the regression of $Y_t$ on other variables should improve if we add lagged values of $X_t$ to
the estimation function. In that case, $X_t$ is ‘Granger-causal’ for $Y_t$ (Granger, 1980). Granger-causality can
be tested using F-tests. Two regressions of the variable $Y_t$ are made, one with and one without lagged values
of $X_t$. The F-test checks whether the added lagged values of $X_t$ provide extra explanatory power to the regression:
$$F = \frac{(RSS_R - RSS_{UR})/m}{RSS_{UR}/(n - k)},$$
which follows an F distribution with $m$ and $(n - k)$ degrees of freedom. $RSS_R$ and $RSS_{UR}$ are the residual sums of squares of the restricted regression (without lagged values of $X_t$) and the unrestricted regression (with lagged values of $X_t$), $m$ represents the number of lagged terms, $n$ the number of
observations in the sample and $k$ the number of parameters estimated in the regression with lagged terms
of $X_t$. If the computed F-statistic exceeds the critical value, the null hypothesis of no Granger-causality
is rejected, which means that the lagged values of $X_t$ belong in the regression.
The Granger test is highly sensitive to the presence of unit roots and to the lag length of the regression.
Tested variables have to be stationary, and the optimal number of lags can be determined by the Akaike or
Schwarz Information Criteria. The choice of criterion affects the results of the causality tests. Based on this
information, we decide which variables in our model are endogenous and which are exogenous. We use this
information to build the VAR model.
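The F-statistic of the Granger test is a simple function of the two residual sums of squares. The Python sketch below evaluates it for hypothetical values; the function name `granger_f` and all numbers are made up for illustration and are not results from this study.

```python
def granger_f(rss_restricted, rss_unrestricted, m, n, k):
    """F-statistic comparing the regression without (restricted) and with (unrestricted) lags of X."""
    numerator = (rss_restricted - rss_unrestricted) / m
    denominator = rss_unrestricted / (n - k)
    return numerator / denominator

# Hypothetical residual sums of squares, for illustration only.
f_stat = granger_f(rss_restricted=120.0, rss_unrestricted=100.0, m=2, n=50, k=5)
print(f_stat)  # (20 / 2) / (100 / 45), about 4.5
```

The statistic is then compared with the critical value of the F distribution with $m$ and $(n - k)$ degrees of freedom.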
The optimal VAR(p) model is selected based on different criteria. As discussed above, this
procedure is challenging, as the lag length determines whether or not our model will be overfitted. Therefore, we compare
multiple decision criteria in order to find the optimal number of lags. First, the Akaike,
Schwarz and Hannan-Quinn Information Criteria and the Final Prediction Error
each suggest an optimal VAR model. Then, to evaluate the performance of these models, we
test whether they are able to remove all residual autocorrelation. If the latter is present in a dynamic model,
estimates are biased and generally inconsistent, and conclusions drawn from such a model would be
incorrect (Breusch, 1978). To check for this, we use the Breusch-Godfrey LM-test for serial autocorrelation.
The test assumes that the error terms follow the autoregressive AR(p) scheme:
$$\mu_t = \rho_1 \mu_{t-1} + \rho_2 \mu_{t-2} + ... + \rho_p \mu_{t-p} + \varepsilon_t,$$
where $\varepsilon_t$ is a white noise term. The null hypothesis $H_0: \rho_1 = ... = \rho_p = 0$ is rejected if the
test statistic exceeds the critical chi-square value. In that case, at least one $\rho$ in the error term is statistically different from
zero.
Finally, we are interested in the response of a variable to an impulse in another variable in the system.
Impulse response functions (IRFs) are computed for the optimal VAR model and allow us to study the
causality between variables in the system. These functions trace out the response of the dependent variables
in the VAR system to shocks in the error terms for several periods into the future. Impulse
response functions assume all variables to be stationary. The response
of one variable to a unit shock in another variable is often presented visually. This gives us an idea of the dynamic interrelationships
of the variables included in the VAR model. A shock in one variable has no effect on the other variables
if the former does not Granger-cause the set of remaining variables. When variables have different scales,
innovations of one standard deviation are considered rather than unit shocks. Since we have no prior knowledge of the
ordering of effects in our model, we use Generalized Impulse Response Functions (GIRFs). In these GIRFs,
the response of a variable is constructed as an average of what might happen, given the present and past.
Future shocks are averaged out (Koop, 1996). Impulse response analysis is viewed as the centerpiece of this analysis, since it
facilitates interpretation of the individual coefficients in the estimated model and allows comparison.
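For a linear VAR(1), a commonly used closed form of the generalized impulse response to a one-standard-deviation shock in variable $j$ at horizon $h$ is $\Phi^h \Sigma e_j / \sqrt{\sigma_{jj}}$, where $\Sigma$ is the error covariance matrix and $e_j$ selects the shocked variable. The Python sketch below applies this form; it is an assumption that this closed form matches the computation used in this study, and the coefficient and covariance matrices are made up for illustration.

```python
import math

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def generalized_irf(phi, sigma, shock_index, horizons):
    """Responses of all variables to a one-standard-deviation shock in one variable of a VAR(1)."""
    scale = math.sqrt(sigma[shock_index][shock_index])
    # Impact response: the shocked variable's column of sigma divided by its standard deviation.
    response = [row[shock_index] / scale for row in sigma]
    path = [response]
    for _ in range(horizons):
        response = mat_vec(phi, response)  # propagate one more period through phi
        path.append(response)
    return path

# Hypothetical coefficient and error covariance matrices for two variables.
phi = [[0.5, 0.1],
       [0.2, 0.3]]
sigma = [[4.0, 1.0],
         [1.0, 1.0]]
for h, resp in enumerate(generalized_irf(phi, sigma, shock_index=0, horizons=3)):
    print(h, [round(r, 4) for r in resp])
```

Because the impact response uses the full covariance column rather than an orthogonalized shock, the result does not depend on the ordering of the variables.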
| Variable | N | Overall mean | Overall deviation | Min. | Median | Max. |
|---|---|---|---|---|---|---|
| rev | 140 | 19517 | 15579.43 | 1227 | 14760 | 75872 |
| fbsent | 140 | 0.8629 | 0.1231829 | 0.3333 | 0.9025 | 1.0000 |
| twsent | 140 | 0.8445 | 0.1174342 | 0.4933 | 0.8578 | 1.0000 |
| posts | 140 | 5.181 | 1.631202 | 0.000 | 5.179 | 7.887 |
| tweets | 140 | 4.697 | 1.804298 | 2.197 | 4.069 | 9.780 |
| likes | 140 | 7.406 | 2.452273 | 0.000 | 7.718 | 11.555 |

Table 4: Overall descriptive statistics
4 Results and Discussion
4.1 Descriptive Statistics
Previous literature provides evidence of a bilateral relationship between Twitter sentiment and
revenue, through consumer evaluation and the purchase decision. The volume of WOM influences awareness,
which also leads to more revenue. These interconnections were demonstrated for tweets (Rui, 2013; Oh, 2016)
and blog posts (Duan, 2008). There is empirical evidence that sales are influenced by Facebook likes and
the number of unique posts (Ding, 2016; Oh, 2016). Facebook likes are also related to revenue (Ding,
2016). Forecasting research made use of these relationships by predicting sales using Twitter sentiment and
volume (Asur & Huberman, 2005; Lassen, 2014), Facebook likes, shares and posts (Boldt, 2016) and WOM on
blogs (Gruhl, 2005; Dhar & Chang, 2007; Onishi & Manchanda, 2012). Due to the nature of these variables,
a high correlation was expected between the volumes of Facebook posts and shares, so the latter was left out of
our analysis. We used the natural logarithm of Facebook posts, tweets and likes. Descriptive statistics of
the variables used in the analysis are shown in Tables 4, 5 and 6.
Revenue correlates most strongly with the volume of posts, with a Spearman correlation coefficient of
0.42. There is also a positive relationship between revenue and the volume of tweets and likes, indicated by
Spearman correlations of 0.27 and 0.32 respectively. The sentiment variables are negatively related
to revenue, with a correlation of -0.19 for both Twitter sentiment and
Facebook sentiment. The relationship between Facebook posts and their sentiment is less
strong (-0.15) than the relationship between tweets and their sentiment (-0.17). Finally, the number of likes
is related to posts (0.36) and to tweets (0.31). An overview of the correlations is given in Table 7.
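For reference, the Spearman correlation is the Pearson correlation of the rank-transformed series. The Python sketch below implements it with average ranks for ties; the function names and the sample values are hypothetical and unrelated to the figures reported in this study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical revenue and post volumes for four quarters.
print(spearman([10, 20, 30, 40], [1, 3, 2, 4]))  # 0.8
```

The function is undefined for a constant series, since all ranks are then tied and the variance of the ranks is zero.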
4.2 Facebook & Twitter model
4.2.1 Tests for stationarity
Using the augmented Dickey-Fuller (ADF) test, the variables were first tested for stationarity. At the 5% significance level,
the critical value of the t-statistic is −3.00. Based on the t-statistics in Table 8, the null hypothesis
of non-stationarity can be rejected for the revenue variable: revenue does not contain a unit root.
Facebook sentiment is non-stationary, as is Twitter sentiment. With t-statistics of -3.966 for
posts and -4.5759 for tweets, we can also reject the null hypothesis of non-stationarity for posts and tweets.
Finally, the null hypothesis is not rejected for likes, which means this variable is non-stationary.
Since a VAR model assumes stationarity, we try to solve the non-stationarity of Facebook and Twitter
sentiment by transforming these variables. After first differencing, Twitter sentiment no longer contains
a unit root (fd.twsent in Table 8), so the null hypothesis can be rejected for the differenced series. The first difference of likes (fd.likes) is likewise stationary. Neither the first nor the second differences of Facebook
sentiment could solve the non-stationarity problem, so we omit this variable from further analysis.

| Company | Industry | N | Variable | Mean | Standard Dev. | Min. | Median | Max. |
|---|---|---|---|---|---|---|---|---|
| Alphabet Google | Technology | 5 | rev | 20842 | 1440.38 | 18675 | 21329 | 22451 |
| | | | fbsent | 0.8540 | 0.0370657 | 0.8125 | 0.8419 | 0.9000 |
| | | | twsent | 0.8078 | 0.1500823 | 0.6000 | 0.8000 | 1.0000 |
| | | | posts | 5.506 | 0.8778123 | 4.779 | 5.030 | 6.613 |
| | | | tweets | 2.746 | 0.186321 | 2.565 | 2.639 | 2.996 |
| | | | likes | 6.556 | 0.733597 | 5.771 | 6.377 | 7.498 |
| Apple | Technology | 10 | rev | 53712 | 12260.38 | 42123 | 50081 | 75872 |
| | | | fbsent | 0.7073 | 0.0300224 | 0.6651 | 0.7190 | 0.7486 |
| | | | twsent | 0.7357 | 0.04240067 | 0.6644 | 0.7362 | 0.8000 |
| | | | posts | 7.725 | 0.0883445 | 7.550 | 7.732 | 7.887 |
| | | | tweets | 7.590 | 1.966115 | 2.890 | 8.708 | 9.074 |
| | | | likes | 11.05 | 0.2463796 | 10.70 | 11.06 | 11.45 |
| AT&T | Telecom | 10 | rev | 36872 | 4060.21 | 32575 | 36765 | 42119 |
| | | | fbsent | 0.7801 | 0.04797244 | 0.7087 | 0.7880 | 0.8493 |
| | | | twsent | 0.7831 | 0.09635818 | 0.5385 | 0.8038 | 0.8782 |
| | | | posts | 6.634 | 0.9531906 | 4.997 | 6.670 | 7.642 |
| | | | tweets | 5.114 | 1.468487 | 2.773 | 5.359 | 6.866 |
| | | | likes | 7.063 | 0.8225566 | 5.730 | 7.120 | 7.997 |
| Cardinal Health | Healthcare | 11 | rev | 27312 | 3733.668 | 21427 | 27548 | 32039 |
| | | | fbsent | 0.9267 | 0.04155306 | 0.8785 | 0.9194 | 1.0000 |
| | | | twsent | 0.8447 | 0.06290836 | 0.7308 | 0.8398 | 0.9403 |
| | | | posts | 4.689 | 0.6468915 | 2.890 | 4.779 | 5.407 |
| | | | tweets | 5.658 | 0.3242765 | 5.112 | 5.649 | 6.217 |
| | | | likes | 8.792 | 1.901367 | 3.401 | 9.584 | 10.169 |
| Cisco Systems | Technology | 11 | rev | 12133 | 502.4364 | 11155 | 12137 | 12843 |
| | | | fbsent | 0.9564 | 0.01545537 | 0.9318 | 0.9529 | 0.9809 |
| | | | twsent | 0.9311 | 0.07391777 | 0.7778 | 0.9444 | 1.0000 |
| | | | posts | 6.260 | 0.1549959 | 5.964 | 6.267 | 6.500 |
| | | | tweets | 3.844 | 0.9046741 | 3.091 | 3.367 | 5.740 |
| | | | likes | 9.243 | 0.5763212 | 8.494 | 9.181 | 10.664 |
| Comcast | Telecom | 11 | rev | 18424 | 1306.334 | 16791 | 18669 | 21319 |
| | | | fbsent | 0.9410 | 0.04123029 | 0.8333 | 0.9502 | 0.9786 |
| | | | twsent | 0.8393 | 0.08429541 | 0.7333 | 0.8571 | 0.9643 |
| | | | posts | 5.046 | 0.8424461 | 2.773 | 5.094 | 5.958 |
| | | | tweets | 3.902 | 0.5597622 | 2.944 | 4.043 | 4.522 |
| | | | likes | 4.675 | 1.256172 | 1.609 | 5.293 | 5.707 |
| Disney | Media | 11 | rev | 13146 | 982.5374 | 11649 | 13101 | 15244 |
| | | | fbsent | 0.9042 | 0.03809248 | 0.8185 | 0.9118 | 0.9565 |
| | | | twsent | 0.8381 | 0.02702855 | 0.7897 | 0.8501 | 0.8738 |
| | | | posts | 5.259 | 0.4325906 | 4.522 | 5.268 | 5.861 |
| | | | tweets | 7.619 | 0.3594895 | 7.028 | 7.739 | 8.013 |
| | | | likes | 7.996 | 1.541024 | 6.500 | 7.393 | 10.997 |
| Ford Motors | Motors | 11 | rev | 36980 | 1907.777 | 33900 | 37263 | 40251 |
| | | | fbsent | 0.8315 | 0.09043918 | 0.6667 | 0.8235 | 1.0000 |
| | | | twsent | 0.8535 | 0.09903704 | 0.6471 | 0.8750 | 1.0000 |
| | | | posts | 3.360 | 0.6858644 | 2.197 | 3.258 | 4.585 |
| | | | tweets | 2.986 | 0.03176739 | 2.890 | 2.996 | 2.996 |
| | | | likes | 5.922 | 1.857211 | 2.890 | 5.805 | 9.045 |

Table 5: Summary statistics (1/2)

| Company | Industry | N | Variable | Mean | Standard Dev. | Min. | Median | Max. |
|---|---|---|---|---|---|---|---|---|
| JetBlue Airways | Transportation | 6 | rev | 1525 | 119.8015 | 1349 | 1526 | 1687 |
| | | | fbsent | 0.8651 | 0.1633456 | 0.6667 | 0.9286 | 1.0000 |
| | | | twsent | 0.9006 | 0.03387919 | 0.8404 | 0.9150 | 0.9286 |
| | | | posts | 1.524 | 0.8915847 | 0.000 | 1.701 | 2.565 |
| | | | tweets | 4.688 | 1.412051 | 2.890 | 5.276 | 6.050 |
| | | | likes | 2.443 | 1.495048 | 0.000 | 2.593 | 4.248 |
| Kroger | Food & Drugstores | 11 | rev | 27506 | 3989.615 | 23105 | 25539 | 34604 |
| | | | fbsent | 0.9437 | 0.02204527 | 0.8986 | 0.9516 | 0.9695 |
| | | | twsent | 0.9211 | 0.06583025 | 0.8158 | 0.9211 | 1.0000 |
| | | | posts | 6.806 | 0.1378118 | 6.981 | 6.987 | 7.309 |
| | | | tweets | 3.473 | 0.5428782 | 2.197 | 3.714 | 4.007 |
| | | | likes | 8.036 | 0.347308 | 7.287 | 8.038 | 8.592 |
| Nike | Apparel | 11 | rev | 7858 | 578.4626 | 6972 | 7779 | 9061 |
| | | | fbsent | 0.8167 | 0.2140497 | 0.3333 | 0.9167 | 1.0000 |
| | | | twsent | 0.5978 | 0.06491797 | 0.4933 | 0.6156 | 0.7297 |
| | | | posts | 3.415 | 1.102447 | 1.609 | 3.664 | 5.081 |
| | | | tweets | 5.809 | 1.730638 | 4.533 | 5.124 | 9.780 |
| | | | likes | 7.626 | 3.324417 | 1.946 | 8.297 | 11.555 |
| Oracle Systems | Technology | 11 | rev | 9500 | 963.2544 | 8448 | 9307 | 11320 |
| | | | fbsent | 0.9053 | 0.05509861 | 0.8182 | 0.8958 | 0.9901 |
| | | | twsent | 0.9709 | 0.041158 | 0.9000 | 1.0000 | 1.0000 |
| | | | posts | 4.904 | 1.209875 | 2.996 | 4.956 | 6.471 |
| | | | tweets | 2.953 | 0.03938844 | 2.890 | 2.944 | 2.996 |
| | | | likes | 6.315 | 2.280299 | 1.609 | 7.044 | 8.029 |
| Salesforce | Technology | 10 | rev | 9500 | 267.8583 | 8448 | 9307 | 11320 |
| | | | fbsent | 0.9053 | 0.03978489 | 0.8182 | 0.8958 | 0.9901 |
| | | | twsent | 0.9709 | 0.05721545 | 0.9000 | 1.0000 | 1.0000 |
| | | | posts | 4.904 | 0.35777 | 2.996 | 4.956 | 6.471 |
| | | | tweets | 2.953 | 0.1349149 | 2.890 | 2.944 | 2.996 |
| | | | likes | 6.315 | 2.036299 | 1.609 | 7.044 | 8.029 |
| Windstream Holdings | Telecom | 11 | rev | 1425 | 48.33554 | 1345 | 1427 | 1499 |
| | | | fbsent | 0.6664 | 0.1022929 | 0.4900 | 0.6436 | 0.7988 |
| | | | twsent | 0.8777 | 0.087678 | 0.7333 | 0.8966 | 0.9796 |
| | | | posts | 5.923 | 0.1863631 | 5.609 | 5.894 | 6.242 |
| | | | tweets | 4.843 | 1.075647 | 2.890 | 4.984 | 6.221 |
| | | | likes | 8.441 | 1.309018 | 5.768 | 8.656 | 9.697 |

Table 6: Summary statistics (2/2)

| | rev | fbsent | twsent | posts | tweets | likes |
|---|---|---|---|---|---|---|
| rev | 1 | | | | | |
| fbsent | -0.19 | 1 | | | | |
| twsent | -0.19 | 0.23 | 1 | | | |
| posts | 0.42 | -0.15 | 0.02 | 1 | | |
| tweets | 0.24 | -0.26 | -0.32 | 0.14 | 1 | |
| likes | 0.26 | -0.11 | -0.15 | 0.67 | 0.30 | 1 |

Table 7: Correlation Analysis

| Variable | t-statistic | p-value |
|---|---|---|
| revenue | -3.4012 | 0.05 |
| fbsent | -2.0077 | 0.573 |
| twsent | -2.8145 | 0.2375 |
| posts | -3.4664 | 0.0480 |
| tweets | -4.4737 | 0.01 |
| likes | -1.4889 | 0.6245 |
| fd.fbsent | -2.8867 | 0.208 |
| sd.fbsent | -2.9372 | 0.1877 |
| fd.twsent | -4.2727 | 0.01 |
| fd.likes | -3.5305 | 0.04242 |

Table 8: Augmented ADF-tests
4.2.2 Granger-Causality tests
We determine the causality in our model to find out whether lagged values of one variable influence forecasts
of another variable. To do so, we perform a Granger-causality test on our system. Outcomes of this
test depend on the number of lagged terms included in the regressions. The optimal number of lags is
determined based on multiple information criteria. Based on the AIC, we set the number of lags
to 5. Results of the Granger-causality test can be found in Table 9. At the 5% significance level, revenue is Granger-caused by
tweets (p = 0.03) and by posts (p = 0.04). Since lagged revenue also Granger-causes
tweets (p = 0.03), there is a feedback system between revenue and the volume of tweets.
At the 10% significance level, posts Granger-cause tweets (p = 0.08) and lagged likes Granger-cause posts
(p = 0.05). Based on the results of the Granger-causality test, we treat revenue, tweets, posts and
likes as endogenous variables. Twitter sentiment is added as an exogenous variable.
4.2.3 Model selection
Next, we build a VAR model. As already mentioned, the biggest challenge in VAR modelling is the selection
of an optimal lag length. The choice of lag length is based on multiple selection criteria, presented in
table 10. Based on the Akaike (AIC) and Hannan-Quinn (HQ) Information Criteria and the Final Prediction
Error (FPE), the VAR(2) model is preferred. The Schwarz (SC) Information Criterion, however, suggests a model
with only one lag. The Breusch-Godfrey LM-test for serial autocorrelation is performed for both models.
The null hypothesis is that there is no serial correlation of any order. If we reject this
hypothesis, at least one of the coefficients on the lagged residuals is statistically different from zero, and the
model thus still has residual autocorrelation. The results of the Breusch-Godfrey LM-test are presented in Table
10. For the VAR(1) model, the p-values for the first three lag orders are smaller than 0.05, so at the 5% significance level the residuals
of the VAR(1) model still show autocorrelation. Since the p-values of the VAR(2) model are all larger than
0.05, we cannot reject the null hypothesis and thus no residual autocorrelation was found.
In Figure 2, estimates, standard errors and t-statistics of the coefficients in the VAR(2) model are
presented. In a final step, generalized impulse response functions have been calculated to measure the
response of one variable to a shock in another. This will clarify the
interpretation of these coefficients.

| Cause → effect | F-statistic | p-value |
|---|---|---|
| post → rev | 2.52409597 | 0.04476106 |
| tweets → rev | 2.71101234 | 0.03355857 |
| fd.twsent → rev | 0.45049696 | 0.77184166 |
| fd.likes → rev | 0.21975424 | 0.92694076 |
| rev → post | 0.43884204 | 0.78030265 |
| tweets → post | 0.38510417 | 0.81891318 |
| fd.twsent → post | 0.94333677 | 0.44171157 |
| fd.likes → post | 2.42880379 | 0.05180839 |
| rev → tweets | 2.89561340 | 0.02521703 |
| post → tweets | 2.12195233 | 0.08264406 |
| fd.twsent → tweets | 0.44676642 | 0.77455210 |
| fd.likes → tweets | 0.32806176 | 0.85862610 |
| rev → fd.twsent | 0.13970979 | 0.96715188 |
| post → fd.twsent | 0.05497786 | 0.99429516 |
| tweets → fd.twsent | 1.28271511 | 0.28101186 |
| fd.likes → fd.twsent | 0.60631800 | 0.65888664 |
| rev → fd.likes | 0.43031762 | 0.78647642 |
| post → fd.likes | 0.27864007 | 0.89126788 |
| tweets → fd.likes | 0.46362470 | 0.76229025 |
| fd.twsent → fd.likes | 0.35513970 | 0.83999063 |

Table 9: Granger causalities tests

| Lags | VAR(1) LM-Stat | VAR(1) Prob | VAR(2) LM-Stat | VAR(2) Prob |
|---|---|---|---|---|
| 1 | 51.864 | 0.001247 | 17.151 | 0.8762 |
| 2 | 68.882 | 0.03946 | 59.173 | 0.1756 |
| 3 | 99.935 | 0.02878 | 81.692 | 0.2793 |
| 4 | 124.28 | 0.0504 | 103.83 | 0.3767 |
| 5 | 147.97 | 0.07871 | 120.34 | 0.6011 |

Table 10: LM-test for residual autocorrelation

Figure 2: Estimation output of the VAR(2) model

Figure 3: Impulse response functions
4.2.4 Impulse response functions
In Figure 3, graphs of the generalized impulse response functions of the VAR(2) model are presented,
with the 95% confidence interval drawn around each impulse response function. When we examine the
effect of one variable on another, we first have to check whether the effect is significant. If the
95% confidence interval, [mean − 1.96 ∗ SE; mean + 1.96 ∗ SE], does not include 0, the effect is
significant. In other words, if the zero line lies between these bounds, the effect is insignificant. If we
have significant results, we can quantify the impact of a shock in one variable on another. Figures 4 and 5
present the values of the response functions following a shock of one standard error (SE) in another
variable. In Figure 5, we noticed a significant response of tweets to a shock in revenue in period 2. Since
period 1 represents the immediate effect and our dataset is organized as quarterly data, this effect occurs
in the next quarter. Figure 5 shows that the response of the volume of tweets to the shock in revenue
equals 0.21 percent. Based on these findings, we cannot reject H8: an increase in revenue positively affects
the volume of tweets. An increase in revenue makes more people talk about a company and its products, thus
raising more awareness in the next periods. However, the effect of this shock is not persistent and dies
out fairly quickly. Still, if a company does well, this will become visible in the volume of tweets posted by
fans. A shock of one standard error in the volume of tweets causes an effect on the sentiment of tweets
that bottoms out after three quarters, decreasing the positivity rate of tweets by 1.6 percentage
points. Finally, a decrease of one standard error in the volume of Facebook posts causes a decrease in the
volume of likes of 0.47 percent. Fewer posts may be a sign of decreasing interaction on a fanpage, which
may reduce brand awareness and thereby result in fewer likes. The other impulse response functions are not
significant, which keeps us from making useful comparisons between the Twitter and Facebook
metrics. Therefore, we cannot test H1 and H2. The impact of social media metrics on revenue, and vice
versa, might differ for every company, which makes general conclusions impossible.
For example, one company can have a very active Facebook fanpage and little Twitter activity, while
another has multiple Twitter fanpages and little activity on Facebook. When both companies are included in
our sample, the effects of their social media metrics might cancel each other out. To verify this, we will
study the impact of the Facebook and Twitter variables separately.
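The significance rule described above can be illustrated with a minimal sketch. The function names and the response values below are hypothetical and are not taken from the figures of this study; the sketch only assumes the interval [mean − 1.96 ∗ SE; mean + 1.96 ∗ SE] stated in the text.

```python
def confidence_interval(mean, se, z=1.96):
    """Return the (lower, upper) bounds mean -/+ z * se."""
    return mean - z * se, mean + z * se


def is_significant(mean, se, z=1.96):
    """A response is significant when the interval excludes zero."""
    lower, upper = confidence_interval(mean, se, z)
    return lower > 0 or upper < 0


# Hypothetical impulse response values (mean, standard error).
# These numbers are illustrative assumptions, not results of the thesis.
print(is_significant(0.21, 0.08))  # zero lies outside the interval
print(is_significant(0.10, 0.09))  # zero lies inside the interval
```

Applied to every period of an impulse response function, this check marks the quarters in which the zero line falls outside the confidence band.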
Figure 4: Impulse response values (Revenue and Posts)
Figure 5: Impulse response values (Tweets and Facebook likes)
         N     Overall mean   Overall deviation   Minimum   Median   Maximum
rev      451   9.06           1.22                7.021     9.221    11.787
tweets   451   3.82           1.69                0.000     3.823    9.780
twsent   451   0.83           0.17                0.000     0.8667   1.000
Table 11: Descriptive statistics of Twitter variables
         rev     tweets   twsent
rev      1
tweets   0.14    1
twsent   -0.17   -0.03    1
Table 12: Correlation analysis Twitter variables
4.3 Twitter model
The large body of literature using Twitter data indicates that, for some industries, there is
a link with real-world outcomes. We want to test whether this link also exists for Fortune 500 companies. An
unbalanced panel dataset was created for 31 Fortune 500 companies. This dataset includes the 14 companies
used in the mixed Facebook and Twitter model, as well as 17 other Fortune 500 companies for which only
Twitter fanpages were found. Tables 11 and 12 summarize the variables. We transformed the Twitter
variables by taking the natural logarithm. We will verify whether the relationships between the Twitter
variables and revenue found in the previous model also hold when Facebook data are excluded from the model.
4.3.1 Tests for stationarity
Before building the model, we again check for stationarity. The maximum number of lags was determined
using the Schwert criterion (Schwert, 1989) and equals pmax = 17. Comparing the test statistics of revenue,
tweets and Twitter sentiment to a critical value of -3, we found that none of the variables contains a unit
root, since the t-statistics are -4.2244, -5.2214 and -3.8326 respectively. We can therefore use all the
variables in our model.
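The reported lag bounds (17 for the 451 Twitter observations and 15 for the 259 Facebook observations used in section 4.4.1) coincide with the rule of thumb attributed to Schwert (1989), pmax = floor(12 ∗ (T/100)^(1/4)). The sketch below assumes this is the rule that was applied; the function name is hypothetical.

```python
import math


def schwert_pmax(n_obs):
    """Rule-of-thumb maximum ADF lag length: floor(12 * (T / 100) ** 0.25)."""
    return math.floor(12 * (n_obs / 100) ** 0.25)


# Sample sizes taken from Tables 11 and 16.
print(schwert_pmax(451))  # Twitter sample
print(schwert_pmax(259))  # Facebook sample
```

Because the bound grows with the fourth root of the sample size, the smaller Facebook sample receives a slightly lower maximum lag than the Twitter sample.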
4.3.2 Granger-Causality tests
The Granger-causality tests were executed using 4 lags. Table 13 shows the results of the tests. Only one
of the tests is significant: revenue Granger-causes Twitter sentiment (p = 0.002). This means that lagged
values of revenue help to predict Twitter sentiment. We therefore decide to treat Twitter sentiment
and revenue as endogenous variables in our model. Since no significant results were found for the Granger
tests involving the volume of tweets, we include tweets as an exogenous variable in the model.
                  F-statistic   p-value
tweets - rev      0.1629323     0.95702778
twsent - rev      0.1999137     0.93834786
rev - tweets      0.9589787     0.42979518
twsent - tweets   1.0435881     0.38425873
rev - twsent      4.2572951     0.00216686
tweets - twsent   1.7340473     0.14140424
Table 13: Granger-causality tests for Twitter variables
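The classification of variables as endogenous or exogenous can be sketched as a simple filter on the p-values of Table 13. Reading each row label as "first variable Granger-causes second variable" is an assumption that matches the one significant result described in the text; the 5% threshold and the variable names in the code are likewise assumptions made for illustration.

```python
ALPHA = 0.05  # assumed significance level

# (cause, effect): p-value, copied from Table 13.
granger_p = {
    ("tweets", "rev"): 0.95702778,
    ("twsent", "rev"): 0.93834786,
    ("rev", "tweets"): 0.42979518,
    ("twsent", "tweets"): 0.38425873,
    ("rev", "twsent"): 0.00216686,
    ("tweets", "twsent"): 0.14140424,
}

# Keep only the pairs whose test rejects "no Granger-causality".
significant = [pair for pair, p in granger_p.items() if p < ALPHA]

# Variables in at least one significant pair are treated as endogenous.
all_vars = {v for pair in granger_p for v in pair}
endogenous = sorted({v for pair in significant for v in pair})
exogenous = sorted(all_vars - set(endogenous))

print(significant)
print(endogenous, exogenous)
```

With the values of Table 13, only the pair (rev, twsent) passes the filter, so revenue and Twitter sentiment end up endogenous and the volume of tweets exogenous, as in the model described above.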
Criterion   1              2               3              4
AIC(n)      -4.906291681   -4.945157843*   -4.942677340   -4.940606227
HQ(n)       -4.877344634   -4.901737273*   -4.884783247   -4.868238609
SC(n)       -4.832867813   -4.835022042*   -4.795829606   -4.757046558
FPE(n)      0.007399886    0.007117814*    0.007135523    0.007150369
Table 14: Lag selection criteria for Twitter model
VAR(2)
Lags   LM-stat   p-value
1      2.0767    0.7217
2      12.778    0.1197
3      19.013    0.08822
4      23.103    0.111
Table 15: LM-test for residual autocorrelation in the Twitter model
4.3.3 Model selection
Based on the Akaike and Schwarz information criteria, the Hannan-Quinn criterion and the Final Prediction
Error (Table 14), the optimal VAR model is VAR(2). LM-statistics are computed to test for residual
autocorrelation (Table 15). Since all p-values of the VAR(2) model exceed 0.05 (p = 0.7217 at lag 1), we
cannot reject the null hypothesis of no residual autocorrelation. The VAR(2) model has thus removed the
residual autocorrelation. The coefficients of the Twitter VAR(2) model can be found in Figure 6. The graphs
of the generalized impulse response functions, which provide insight into the impact of one variable on
another, can be found in Figure 7; an overview of the impulse values and standard errors is displayed in
Figure 8.
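The lag choice can be reproduced from Table 14 by taking, for each criterion, the lag with the smallest value. The following is a minimal sketch with hypothetical names; it only re-reads the reported criteria and does not re-estimate the model.

```python
# Criterion values for lags 1 to 4, copied from Table 14.
criteria = {
    "AIC": [-4.906291681, -4.945157843, -4.942677340, -4.940606227],
    "HQ": [-4.877344634, -4.901737273, -4.884783247, -4.868238609],
    "SC": [-4.832867813, -4.835022042, -4.795829606, -4.757046558],
    "FPE": [0.007399886, 0.007117814, 0.007135523, 0.007150369],
}


def best_lag(values):
    """Return the 1-based lag with the smallest criterion value."""
    return min(range(len(values)), key=values.__getitem__) + 1


selected = {name: best_lag(values) for name, values in criteria.items()}
print(selected)
```

All four criteria reach their minimum at lag 2, which corresponds to the starred entries in Table 14.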
4.3.4 Impulse Response functions
Significant results were only found for the response of revenue to shocks in Twitter sentiment. A shock of
one standard error causes an increase in revenue of 0.08 percent after two quarters. This supports H3 and is
in line with the existing literature: Twitter sentiment indeed has an influence on quarterly revenue. If
Twitter sentiment increases, more consumers are positive about a company and its products, which ultimately
influences a customer's purchase decision. This effect can be reinforced, since other users will generally
base their purchase decisions on these more positive tweets.
Figure 6: Estimation output of the Twitter model
Figure 7: Impulse response functions of the Twitter model
Figure 8: Impulse response values (Revenue and Twitter sentiment)
         N     Overall mean   Overall deviation   Minimum   Median   Maximum
rev      259   9.16           1.10                6.775     9.337    11.237
posts    259   4.91           1.60                1.099     4.912    8.178
fbsent   259   0.87           0.14                0.000     0.9167   1.0000
likes    259   7.48           2.74                0.000     7.498    12.476
Table 16: Descriptive statistics of Facebook variables
         rev     posts   fbsent   likes
rev      1
posts    0.26    1
fbsent   -0.14   -0.09   1
likes    0.25    0.61    -0.12    1
Table 17: Correlation analysis of Facebook variables
4.4 Facebook model
The purpose and structure of a Facebook fanpage are very different from those of a Twitter page. Although
anyone can access these pages, the audience of a Facebook fanpage can also be very different from that of a
Twitter page. We want to test whether the behavior of Facebook fans can be related to revenue as well.
Therefore, we create a new panel dataset with 33 Fortune 500 companies, 14 of which are also included in the
sample used in the mixed model. For these companies, we have calculated the quarterly volume of posts and
likes, as well as the rate of positive posts and comments. To ease interpretation, we use the natural
logarithm of the variables revenue, posts and likes. An overview of the variables is provided in Table 16.
The correlation matrix (Table 17) shows higher correlations among these variables, most notably between
posts and likes (0.61).
4.4.1 Tests for stationarity
The stationarity of the variables included in the model is checked using the augmented Dickey-Fuller (ADF)
test. Since we have 259 observations, we determined the maximum number of lags to be 15. The critical value
of the t-statistic is -3.00. The t-statistics for the revenue, posts, fbsent and likes variables are
reported in Table 18. We can reject the null hypothesis of non-stationarity for each of these variables.
This means we can include all variables in the model without transformations.
4.4.2 Granger-Causality tests
Since all variables are stationary, no transformations are necessary to build our Facebook dataset.
However, we still have to determine which variables are endogenous. Based on the AIC, we performed
Granger-causality tests with 7 lags. Results can be found in Table 19. Facebook sentiment is Granger-caused
by the volume of posts (p=0.004). Lagged values of the volume of likes also Granger-cause sentiment (p=0.0059).
         t-statistic   p-value
rev      -3.7913       0.01992
posts    -3.6048       0.03316
fbsent   -4.313        0.01
likes    -3.5212       0.0412
Table 18: Augmented ADF-tests for Facebook variables
                  F-statistic   p-value
posts - rev       0.5328881     0.78302447
fbsent - rev      0.1450795     0.98987568
likes - rev       0.4924440     0.81372515
rev - posts       0.8904277     0.50250861
fbsent - posts    1.3332033     0.24297745
likes - posts     1.6414187     0.13641281
rev - fbsent      0.7353785     0.62157675
posts - fbsent    2.4601645     0.02505779
likes - fbsent    2.5335948     0.02136247
rev - likes       0.9617110     0.45185342
posts - likes     0.7590063     0.60283833
fbsent - likes    1.8515196     0.08994718
Table 19: Granger-causality tests for Facebook variables
Criterion   1             2            3             4
AIC(n)      -3.38040425   -3.4408816*  -3.43646323   -3.41986499
HQ(n)       -3.29561957   -3.3052261*  -3.24993692   -3.18246786
SC(n)       -3.16971982*  -3.1037865   -2.97295747   -2.82994856
FPE(n)      0.03403423    0.0320385*   0.03218372    0.03272823
Table 20: Lag selection criteria for the Facebook model
Clearly, there is a relationship between the volume of posts, likes and the Facebook sentiment. In our final
VAR model, we included these variables as endogenous. Revenue is included as an exogenous variable, since
the Granger-tests showed no significant relationships between revenue and the other variables.
4.4.3 Model selection
To determine the optimal VAR model, we first determine candidate lag lengths based on the Akaike Information
Criterion, the Final Prediction Error, the Hannan-Quinn criterion and the Schwarz Criterion. The latter
suggests a VAR(1) model; all the others have optimal scores for the VAR(2) model. The LM-test for residual
autocorrelation was used to verify that no autocorrelation remained in these models. We rejected the
hypothesis of no correlation in the residual terms of the VAR(1) model. The VAR(2) model, on the other hand,
has removed all serial correlation in the error terms. An overview of the coefficients of the estimated
VAR(2) model is given in Figure 9.
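The choice between VAR(1) and VAR(2) can be retraced from the LM-test p-values reported in Table 21. The sketch below assumes a 5% significance level and uses hypothetical names; a model passes only if the null hypothesis of no residual autocorrelation is not rejected at any tested lag.

```python
ALPHA = 0.05  # assumed significance level

# LM-test p-values for lags 1 to 4, copied from Table 21.
lm_p_values = {
    "VAR(1)": [0.0004514, 0.0001295, 0.0001304, 0.0001084],
    "VAR(2)": [0.06372, 0.08409, 0.09704, 0.1008],
}


def no_residual_autocorrelation(p_values, alpha=ALPHA):
    """True when the null of no autocorrelation is never rejected."""
    return all(p >= alpha for p in p_values)


for model, p_values in lm_p_values.items():
    print(model, no_residual_autocorrelation(p_values))
```

Under this reading, VAR(1) fails at every lag while VAR(2) passes at all four, which is consistent with retaining the VAR(2) model.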
4.4.4 Impulse Response functions
Based on this VAR model, there are few significant effects of impulses of one variable on another. The
IRF graphs and values can be found in Figures 10, 11 and 12. Consistent with the studied literature (Oh et
al., 2017), we find that an increase in the volume of likes has a positive impact on the volume of posts and
vice versa. A shock (innovation) of one standard error in the volume of likes causes an immediate increase in
the volume of posts of 61 percent. This effect weakens over time; after 4 quarters it is only 39
percent. Conversely, a shock of one standard error in the volume of posts causes an immediate increase in the
volume of likes of 1.34 percent. After 4 quarters, this effect has diminished to 0.72 percent. We can neither
confirm nor reject H4, H6, H8 and H10, since we could not show that revenue is linked to any of the Facebook
variables. However, both the mixed model and the Facebook model show a link between the volume of
posts and the volume of likes. Prior research did find a link between the volume of likes and revenue.
Further research that includes more companies, or more data on a specific company, could confirm this
relationship for Fortune 500 companies.
       VAR(1)                  VAR(2)
Lags   LM-test   p-value       LM-test   p-value
1      29.927    0.0004514     16.155    0.06372
2      48.439    0.0001295     26.735    0.08409
3      62.308    0.0001304     36.895    0.09704
4      76.083    0.0001084     47.164    0.1008
Table 21: LM-test for residual autocorrelation in the Facebook model
Figure 9: Estimation output of Facebook VAR(2) model
Figure 10: Impulse response functions of the Facebook model
Figure 11: Impulse response values (Posts and Facebook sentiment)
Figure 12: Impulse response values (Likes)
4.5 Conclusion
Based on the three models built in this study (a Facebook model, a Twitter model and a mixed model), we
found a significant influence of quarterly revenue on the volume of tweets. We also found a link between
the volume of tweets and Twitter sentiment, as well as a link between the volume of posts and likes.
The significance of the latter is confirmed when only Facebook data are included in the model. When only
the Twitter sample is used, a significant response of quarterly revenue to shocks in Twitter sentiment is
found. Nevertheless, since this result is not significant in the mixed model, we cannot test whether it is
also significant in the wider social ecology. A possible explanation for the limited significance in the
mixed model is the insufficient amount of data available to us. Still, our results provide interesting
information on the link between revenue and Twitter social media metrics. This could explain why prior
research has predominantly used Twitter as its source of social media data. However, since Facebook is to
this day the largest social media platform, with more than 70% of its users visiting the network daily,
more research into the link between Facebook and business outcomes might be needed (Molla, 2016).
The results of this study show the importance of monitoring WOM on Twitter. When a company
does well, sentiment will be more positive. In addition, companies will try to encourage their followers to
send more positive tweets. This might be easier on company-hosted pages than on fan-hosted pages.
Companies might try to identify the most influential users. The influence of their positive tweets will
lead others to send more positive tweets as well. This mechanism is typical for communities. People are more
inclined to tweet positively due to self-enhancement. Especially when the community agrees with their
opinion, people will be motivated to post similar feelings, which increases sentiment (Oh et al., 2017).
Increased Twitter sentiment will result in increased revenue in the next quarter.
5 Limitations and future research
This master’s dissertation investigated the relationships between Facebook and Twitter social media metrics
and revenue for Fortune 500 companies. Despite the promising design of this study, drawing firm
conclusions was hampered by several factors. Firstly, the difference in the amount of data available on
Facebook and Twitter fanpages created a sample bias which influenced the results of the mixed model. An
extension of the Facebook dataset that includes fanpages organized as a Facebook Group or a
Community could affect the results. Furthermore, a larger mixed sample could make the effect of Twitter
sentiment on revenue found in the Twitter model significant in the mixed model as well. Secondly, due to
the limited availability of data, we could not control for industry in this study, nor for the host of the
fanpage. Controlling for these variables might be interesting, since the target audience might differ per
sector or host. For example, some industries will have fanpages consisting mostly of customers, while others
have more shareholders. These different audiences might require a different type of content, which can
influence the social media metrics. Social media fanpages may also influence other business outcomes, such
as stock price movements. Future research might expose different dynamics of these fanpages.
References
Albert, N., Merunka, D., and Valette-Florence, P. (2008). When consumers love their brands: Exploring the
concept and its dimensions. Journal of Business Research, 61(10):1062–1075.
Archak, N., Ghose, A., and Ipeirotis, P. G. (2011). Deriving the Pricing Power of Product Features by
Mining Consumer Reviews. Management Science, 57(8):1485–1509.
Asur, S. and Huberman, B. A. (2013). Predicting the Future with Social Media. Applied Energy, 112:1536–
1543. arXiv: 1003.5699.
Bagozzi, R. P. and Dholakia, U. M. (2006). Antecedents and purchase consequences of customer participation
in small group brand communities. International Journal of Research in Marketing, 23(1):45–61.
Barnes, N. G. (2014). Social commerce emerges as big brands position themselves to turn “follows”, “likes”
and “pins” into sales. Marketing Management Association Annual Spring Conference Proceedings, Chicago,
IL, pages 8–13.
Boldt, L., Vinayagamoorthy, V., Winder, F., Melanie, S., Ekram, M., Mukkamala, R., Buus Lassen, N.,
Flesch, B., Hussain, A., and Vatrapu, R. (2016). Forecasting Nike’s Sales using Facebook Data, pages
2447–2456. IEEE, United States.
Borah, A. and Tellis, G. J. (2015). Halo (Spillover) Effects in Social Media: Do Product Recalls of One
Brand Hurt or Help Rival Brands? Journal of Marketing Research, 53(2):143–160.
Breusch, T. S. (1978). Testing for Autocorrelation in Dynamic Linear Models. Australian Economic Papers,
17(31):334–355.
Chen, K., Xie, D., and Wang, H. (2015). A novel sales prediction approach: Competing vs supporting
networks. Australian Journal of Business and Management Research, 4(10):8–n/a.
Chern, C.-C., Wei, C.-P., Shen, F.-Y., and Fan, Y.-N. (2015). A Sales Forecasting Model for Consumer
Products Based on the Influence of Online Word-of-mouth. Inf. Syst. E-bus. Manag., 13(3):445–473.
Chevalier, J. A. and Mayzlin, D. (2006). The Effect of Word of Mouth on Sales: Online Book Reviews.
Journal of Marketing Research, 43(3):345–354.
Cova, B. and Pace, S. (2006). Brand community of convenience products: new forms of customer
empowerment – the case “my Nutella The Community”. European Journal of Marketing, 40(9/10):1087–1105.
de Valck, K., van Bruggen, G. H., and Wierenga, B. (2009). Virtual communities: A marketing perspective.
Decision Support Systems, 47(3):185–203.
DeMarco, T. (2013). “Buzz” vs. sentiment: Word of mouth’s most important distinction.
Dhar, V. and Chang, E. A. (2009). Does Chatter Matter? The Impact of User-Generated Content on Music
Sales. Journal of Interactive Marketing, 23(4):300–307.
Ding, C., Cheng, H. K., Duan, Y., and Jin, Y. (2017). The power of the “like” button: The impact of social
media on box office. Decision Support Systems, 94:77–84.
Duan, W., Gu, B., and Whinston, A. B. (2008). The dynamics of online word-of-mouth and product
sales—An empirical investigation of the movie industry. Journal of Retailing, 84(2):233–242.
Edosomwan, S., Prakasan, S. K., Kouame, D., Watson, J., and Seymour, T. (2011). The History of Social
Media and its Impact on Business. Journal of Applied Management and Entrepreneurship,
16(3):79–91.
Kalampokis, E., Tambouris, E., and Tarabanis, K. (2013). Understanding the predictive power of social
media. Internet Research, 23(5):544–559.
Facebook. Facebook Pages Terms.
Fortune500 (2016). Fortune500.
Koop, G., Pesaran, M. H., and Potter, S. M. (1996). Impulse response analysis in nonlinear multivariate
models. Journal of Econometrics, 74:119–147.
Granger, C. (1980). Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control,
2:329–352.
Gruhl, D., Guha, R., Kumar, R., Novak, J., and Tomkins, A. (2005). The Predictive Power of Online
Chatter. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery
in Data Mining, KDD ’05, pages 78–87, New York, NY, USA. ACM.
Gwinner, K. P., Gremler, D. D., and Bitner, M. J. (1998). Relational Benefits in Services Industries: The
Customer’s Perspective. Journal of the Academy of Marketing Science, 26(2):101–114.
Hanna, R., Rohm, A., and Crittenden, V. L. (2011). We’re all connected: The power of the social media
ecosystem. Business Horizons, 54(3):265–273.
Hollis, N. (2011). The value of a social media fan.
Jang, H., Olfman, L., Ko, I., Koh, J., and Kim, K. (2008). The Influence of On-Line Brand Community
Characteristics on Community Commitment and Brand Loyalty. International Journal of Electronic
Commerce, 12(3):57–80.
Jansen, B. J. (2009). The commercial impact of social mediating technologies: Micro-blogging as online
word-of-mouth branding. Extended Abstracts on Human Factors in Computing Systems, pages 3859–3864.
Java, A., Song, X., Finin, T., and Tseng, B. (2007). Why We Twitter: Understanding Microblogging Usage
and Communities. page 56. Springer.
Gummerus, J., Liljander, V., Weman, E., and Pihlstrom, M. (2012). Customer engagement
in a Facebook brand community. Management Research Review, 35(9):857–877.
Cothrel, J. P. (2000). Measuring the success of an online community. Strategy & Leadership, 28(2):17–21.
Kaplan, A. M. and Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of
Social Media. Business Horizons, 53(1):59–68.
Kim, E., Sung, Y., and Kang, H. (2014). Brand followers’ retweeting behavior on Twitter: How brand
relationships influence brand electronic word-of-mouth. Computers in Human Behavior, 37:18–25.
Kozinets, R. (2008). Netnography 2.0. The Handbook of Qualitative Research
Methods in Marketing. Edited by Russell W. Belk. Edward Elgar Publishing,
http://www.academia.edu/1433499/Netnography2.0.TheHandbookofQualitativeResearchMethodsinMarketing.EditedbyRussellW .Belk.
Kwon, E. S., Kim, E., Sung, Y., and Yoo, C. Y. (2014). Brand followers. International Journal of Advertising,
33(4):657–680.
Lassen, N. B., Madsen, R., and Vatrapu, R. (2014). Predicting iPhone Sales from iPhone Tweets. In 2014
IEEE 18th International Enterprise Distributed Object Computing Conference, pages 81–90.
Latané, B. (1981). The psychology of social impact. American Psychologist, 36:343–356.
Liang, T.-P., Ho, Y.-T., Li, Y.-W., and Turban, E. (2011). What drives social commerce: The role of social
support and relationship quality. International Journal of Electronic Commerce, 16(2):69–90.
McAlexander, J. H., Schouten, J. W., and Koenig, H. F. (2002). Building Brand Community. Journal of
Marketing, 66(1):38–54.
Mishne, G. and Glance, N. (2006). Predicting Movie Sales from Blogger Sentiment. Microsoft Research.
Molla, R. (2016). Social studies: Facebook vs. Twitter.
Muniz, A. M., Jr. and O’Guinn, T. (2001). Brand Community. Journal of Consumer Research, 27(4):412–
432.
Oh, C., Roumani, Y., Nwankpa, J. K., and Hu, H.-F. (2017). Beyond likes and tweets: Consumer engagement
behavior and movie box office in social media. Information & Management, 54(1):25–37.
Onishi, H. and Manchanda, P. Marketing Activity, Blogging, and Sales. MSI Web Site.
Pfaff, B. (2008). VAR, SVAR and SVEC models: Implementation within R package vars. Journal of Statistical
Software.
Ping, J. W., Goh, K. Y., Lin, Z., and Goh, A. C. Q. (2012). Does Social Media Brand Community
Membership Translate to Real Sales? A Critical Evaluation of Purchase Behavior by Fans and Non-Fans
of a Facebook Fan Page. ECIS 2012 Proceedings.
Gujarati, D. N. and Porter, D. C. (2009). Basic Econometrics. McGraw-Hill Education - Europe, fifth edition.
Rinker, T. (2017). Why sentimentr.
Rui, H., Liu, Y., and Whinston, A. (2013). Whose and what chatter matters? The effect of tweets on movie
sales. Decision Support Systems, 55(4):863–870.
Saboo, A. R., Kumar, V., and Ramani, G. (2016). Evaluating the impact of social media activities on human
brand sales. International Journal of Research in Marketing, 33(3):524–541.
Sashi, C. (2012). Customer engagement, buyer-seller relationships, and social media. Management Decision,
50(2):253–272.
Schwert, G. (1989). Tests for unit roots: A Monte Carlo investigation. Journal of Business & Economic
Statistics, 7:147–159.
Statista (2017). Most popular social networks used by Fortune 500 companies in 2016.
Tiryakian, E. A., Gruzd, A., Wellman, B., and Takhteyev, Y. (2011). Imagining Twitter as an Imagined
Community. American Behavioral Scientist, 55(10):1294–1318.
Tufekci, Z. (2014). Big Questions for Social Media Big Data: Representativeness, Validity and Other
Methodological Pitfalls. arXiv: 1403.7400.
Twitter. Parody, commentary, and fan account policy.
Walsh, M. (2006). Social networking sites fuel e-commerce traffic.
Wu, J.-J., Chen, Y.-H., and Chung, Y.-S. (2010). Trust factors influencing virtual community members: A
study of transaction communities. Journal of Business Research, 63(9–10):1025–1032.
Yu, C. H. (2009). Illustrating degrees of freedom in terms of sample size and dimensionality.
Zaglia, M. E. (2013). Brand communities embedded in social networks. Journal of Business Research,
66(2):216–223.
Fan, Z.-P., Che, Y.-J., and Chen, Z.-Y. (2017). Product sales forecasting using online reviews and historical
sales data: A method combining the Bass model and sentiment analysis. Journal of Business Research, 74:90–100.
Zhu, F. and Zhang, X. M. (2010). Impact of Online Consumer Reviews on Sales: The Moderating Role of
Product and Consumer Characteristics. Journal of Marketing, 74(2):133–148.