DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2017

Applying investor sentiment to a prediction model of the stock market

AUGUST BERGMAN
SONJA ERICSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION



Degree Project in Computer Science
Date: June 5, 2017
Supervisor: Jeanette Hällgren Kotaleski
Examiner: Örjan Ekeberg
Swedish title: Applicering av tonalitetsanalys för att förutspå rörelser på aktiemarknaden
School of Computer Science and Communication


Abstract

Using data-driven methods to predict the movements of the stock market is a growing field of research. Recently, large amounts of data sourced from online news and social media have been utilized to predict movements in financial markets. With the emergence of social media platforms, data can be gathered and used to quantify the sentiment of the market. This study investigates whether investor sentiment can be used to improve the precision of a prediction model of the stock market, specifically whether the precision of a model which predicts the direction of intraday price changes of certain equities can be enhanced by the addition of investor sentiment. Sentiment data was collected, derived from the classification of large amounts of messages from a social media platform aimed at investors and traders. A model was trained using technical data and subsequently retrained using technical data combined with sentiment data, to compare their performance. The results show that the predictive performance of the model is enhanced slightly by using sentiment data, which indicates that there are potential benefits in using sentiment data to predict the direction of intraday price changes. However, as neither of the models shows significant classification performance, the results of this study should not be viewed as conclusive.


Sammanfattning (Swedish summary)

Using data to predict upcoming price changes in the stock market is receiving growing attention in research. Recently, news and activity on social media have been used to predict movements in financial markets. With the emergence of social media platforms aimed at investors, it has become possible to collect large amounts of data and use it to quantify the collective view of the market. This study investigates whether the precision of a prediction model of the stock market can be improved by applying sentiment analysis to investor platforms in order to find the collective assessment of a financial asset, more specifically whether the precision of a model that predicts the daily price changes of specific stocks can be improved. This was done by collecting data produced by classifying a large number of messages from a social media platform for investors. The results of the study suggest that sentiment analysis improved the classification precision of the model, which indicates that investors’ views of the market can be used to predict price changes for a stock. The precision of the models is, however, not significant enough for the results of the study to be regarded as conclusive.


Contents

1 Introduction
  1.1 Research Question
  1.2 Scope
  1.3 Outline
2 Background
  2.1 The stock market
  2.2 Technological prediction models and support vector machines
  2.3 Sentiment analysis
3 Method
  3.1 Data
  3.2 Evaluating performance
4 Results
  4.1 Performance of each model
5 Discussion
  5.1 Discussion of results
  5.2 Discussion of method
6 Conclusion
7 Future research
Bibliography


Chapter 1

Introduction

Modeling the behavior of the stock market and predicting the future value of financial assets can potentially result in massive investment returns. Many different methodologies are used for the prediction of the stock market, but the advent of large quantities of data sourced from the Internet, together with effective machine learning algorithms, has made prediction using data-driven methods an important field of research.

The efficient-market hypothesis (EMH) states that financial markets are informationally efficient and that, as such, all available information related to a financial asset is reflected in its valuation, making the prediction of future stock prices inherently impossible [6]. The EMH is, however, not without its critics. Behavioral finance theory asserts that since the valuation of financial assets is driven by human agents, the psychological behaviors of these agents have an influence on future market prices. It is suggested that stock market trading decisions are influenced by emotions and that the behaviors of investors thus are shaped by periods of optimism or pessimism regarding the future price of stocks [2]. Being able to model these behaviors would then lead to at least a certain degree of predictability of the stock market. Digital trends and large quantities of data can potentially be utilized to predict movements in financial markets, as some research indicates that such trends can be used as early indicators of new information not yet reflected in the pricing of the concerned financial asset [12] [2] [25] [5] [4] [26] [10].

Several studies indicate that there are potential benefits of using the sentiment of the market to predict changes in the stock market [3] [22] [21] [17] [20] [29] [1] [16]. Sentiment analysis uses natural language processing to determine public opinion, and it has been applied to stock market prediction by extracting the collective opinion regarding certain financial assets. Bollen, Mao, and Zeng [3] implement a prediction model of the Dow Jones Industrial Average (DJIA) combined with indicators of the public mood to predict daily changes in the DJIA closing prices with an accuracy of 87% and with a 6% reduction of the mean average percentage error. Zhang, Fuehres, and Gloor [29] measure collective emotions by analyzing Twitter posts and successfully correlate them to stock market indicators the following day. Ruiz et al. [20] study the correlation between micro-blog activity and stock market events, specifically messages relating to certain companies.

Quantifying and learning from trends that model the market sentiment has been made possible by access to large amounts of data, through the emergence of social media platforms aimed at the trading community. This makes it possible to investigate not only whether investor sentiment can be used as an early indicator of new information, but also to what extent collective information gathering can be used to predict market trends, and whether social media activity precedes decision making.

This study aims to gather sentiment data regarding particular stocks in order to analyze whether investor sentiment can be used to enhance stock market price prediction models.

1.1 Research Question

This report investigates whether investor sentiment can be used to enhance data-driven stock market prediction, and thus seeks to answer the following question: Can investor sentiment data be used to improve the precision of a prediction model of the stock market?

1.2 Scope

The scope of the study is limited by a number of factors, including time and knowledge in the field of machine learning. Hence, a high-level framework and data preprocessing tools are used to construct the prediction model. The results are derived and evaluated using basic statistical methods.


The models will be trained using technical and sentiment data for Apple and Facebook. Historical stock data will be gathered using Google Finance. Investor sentiment data will be retrieved from PsychSignal, a provider of trader sentiment data and analysis that mines social network data from StockTwits, a social media platform for investors and traders. The availability of social media data is a limitation of this study, as these platforms have only been available for a few years and much of the data mining done in this field uses proprietary techniques that are not publicly available. Since the study is done using pre-processed data, it is limited by a lack of insight into the methods used for deriving investor sentiment metrics.

The model architecture used is selected from models evaluated by earlier studies in the field and will be implemented with a high-level framework. The model will only be used to predict the direction of intraday price changes. The study of sound investing strategies and the determination of causality are beyond the scope of this study.

1.3 Outline

The rest of the report is structured as follows. Section two outlines the technical background and earlier research to review relevant theoretical material and define concepts essential for the rest of the report. Section three presents the method used to perform the study. Section four shows the results acquired by the study. Section five outlines an evaluation of the results and an analysis of the model’s performance. Finally, the paper concludes with a summary in section six.


Chapter 2

Background

This section describes the stock market, the different technological prediction models and machine learning algorithms used for stock prediction, and the concept of sentiment analysis.

2.1 The stock market

The stock market is the collective term for the aggregation of stock markets and exchanges in the world where the issuing, buying and selling of stocks occur. A stock represents a share of ownership in a company, and issuing stock is an effective way for corporations to raise money for expansion or other purposes. When a corporation issues stocks, investors are able to buy shares in it and are thus entitled to a share of its assets and earnings.

Stock trading is one of the most common ways for investors to invest money with high liquidity, and many companies exist with the sole purpose of investing money in the stock market. There are different opinions about what affects the price of a stock, but ultimately the price is determined by the demand and supply for it. According to the Stockholm Stock Exchange, the demand and supply for a stock are mainly affected by expectations, company earnings and dividends, interest rates and the well-being of the economy, taxes and subsidies, trends in society, changes in the company, rumors, and speculations. The price of a stock is determined by earnings in the long run, but in the short run there are a number of factors influencing it [14].

The predictability of stock market prices has been debated for a long time and still is today. The EMH constitutes an important part of modern financial theory and maintains that markets are informationally efficient, meaning that the pricing of assets reflects all available information and that, as such, it is impossible to outperform the market. The efficiency of markets is a consequence of competition; assets will always trade at their fair value because information is available to anyone, and any changes will be incorporated in prices without delay. Associated with the EMH is the random-walk hypothesis, which states that the price series of stocks evolves randomly. The intuition behind it is that if information is incorporated in prices without delay, then prices are only affected by news, which is unpredictable by definition [12].

Stock market prediction can be divided into three categories: technical, fundamental and technological analysis. Technical analysis looks at technical data such as past prices, while fundamental analysis uses information about a company’s business. Technological analysis is the use of machine learning and data mining techniques to predict the stock market.

The constraints of the EMH have been shown to be potentially too restrictive, and some prediction of the stock market may be possible [15]. One argument against the EMH is that it has been empirically disproven by the multitude of investors who have consistently succeeded in outperforming the market. Another argument is that people do not perceive information in the same way, and further that there are actors in the market who are driven by predictable behavioral and psychological elements [12] [2] [15].

2.2 Technological prediction models and support vector machines

For a machine learning model to be accurately applied to the stock market prediction problem, it must be able to handle the complex logic the problem represents, and in particular be able to handle noisy data and avoid overfitting. The learning algorithm used has a great impact on a model’s ability to predict future changes.

A variety of machine learning algorithms has been applied to the stock market prediction problem. One commonly used approach is supervised learning, which is used to classify input data by training the learning algorithm using labeled data. Supervised learning can be implemented using a variety of different algorithms.

The support vector machine (SVM) is one specific supervised learning model, commonly used for both classification and regression analysis, which is resilient to overfitting. Using the kernel trick, SVMs can learn classification rules even if the input data is not linearly separable. The selection of kernel is usually done through cross-validation [11]. SVMs have been successfully applied to the forecasting of financial time series in several earlier studies [9] [28] [24].
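The effect of the kernel trick, and the idea of choosing a kernel by cross-validation, can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic data set of two concentric rings, which no straight line can separate; the data, the candidate kernels and the parameter values are illustrative assumptions and are not taken from the study.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic, non-linearly separable data: two concentric rings (illustrative only).
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

# Compare candidate kernels by their mean 5-fold cross-validated accuracy.
scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")
    scores[kernel] = cross_val_score(clf, X, y, cv=5).mean()

# Pick the kernel with the highest cross-validated accuracy.
best_kernel = max(scores, key=scores.get)
print(scores, best_kernel)
```

On data of this shape the linear kernel performs close to chance, while the RBF kernel separates the rings, which is why cross-validation would select it here.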

2.3 Sentiment analysis

Sentiment analysis has become an attractive field of study in computer science [7]. It can be defined as the act of using natural language processing to extract the subjective attitude about a particular topic. The emergence of big data and social media platforms has made it possible to use sentiment analysis to extract the collective opinion about stocks [3].

The internet and the availability of big data have changed the way investors retrieve and react to information. It is natural for a human to resort to ‘the wisdom of the crowd’ when facing large quantities of information and potentially great uncertainty [13]. Some research also indicates that while the stock market may be inherently unpredictable, since it is driven by new rather than old information, social media activity can be used as an early indicator of upcoming events. Data extracted from this activity can thus serve as an early predictor of changes in the price of financial assets, i.e. an early indicator of news that the market may not have reacted to yet. In fact, it may be that not only news but also social media is used by actors in the stock market when making decisions about investing in a stock, as human decision making is preceded by a phase of information gathering [3].

StockTwits is a social media platform aimed at the investing community, where investors can follow communications regarding a specific stock using its corresponding ticker symbol through the use of “cashtags” [27]. The classification of communication according to its relevant stock makes the platform suitable for extracting data that reflects the collective opinion regarding a certain equity. PsychSignal uses a proprietary algorithm to extract quantitative market data. Their technology can be used to identify the stock-specific market sentiment by classifying StockTwits data [18].


Chapter 3

Method

The study started with a literature study in order to investigate the state of the art in technological stock prediction and social media sentiment data extraction, in order to train a model suitable for the problem domain. A phase of data gathering then commenced, in which historical stock data and pre-processed sentiment data were selected and retrieved. A Support Vector Machine (SVM) classifier using a radial basis function (RBF) kernel was then selected through cross-validation and subsequently trained using scikit-learn, a tool for implementing data mining in the Python programming language. The RBF kernel has many advantages, such as the ability to handle non-linear training data and the ease of implementation [24].

3.1 Data

Historical stock data containing intraday trading metrics were extracted using Google Finance. Each day of trading represents one data sample, labeled according to the direction of its intraday price change. This data is presented in table 3.1.

PsychSignal sentiment data was extracted using Quantopian. This data corresponds to messages regarding specific stocks, selected by the company cashtag from the StockTwits platform, which makes it possible to classify sentiment regarding specific stocks. These messages are classified as either bullish or bearish, representing a positive or negative view of the stock and thus contributing to an upward or downward movement in sentiment trends, respectively. This data is presented in table 3.2.


Table 3.1: Technical data

Open      Price of the first trade of the day
High      Highest trading price of the day
Low       Lowest trading price of the day
Close     Price of the last trade of the day
Volume    Number of shares traded during the day

These metrics were then processed for suitable feature extraction. The formulas used for this processing are presented in table 3.3.

The features derived from applying the preprocessing formulas to the input metrics are presented in table 3.4. All feature values were scaled to zero mean and unit variance prior to training, a necessary pre-processing step for SVMs [23].
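Scaling to zero mean and unit variance can be sketched with scikit-learn's StandardScaler. The feature values below are synthetic stand-ins (a price-like column and a ratio-like column), and fitting the scaler on the training portion only is an assumption made for this sketch; the source does not state how the scaling statistics were computed.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features on very different scales: a price (~140) and a ratio (~0.6).
X_train = np.column_stack([rng.normal(140.0, 5.0, 200), rng.normal(0.6, 0.05, 200)])
X_test = np.column_stack([rng.normal(140.0, 5.0, 50), rng.normal(0.6, 0.05, 50)])

# Learn the per-feature mean and standard deviation from the training data only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# Reuse the training statistics on the test data to avoid leaking test information.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Without this step, the price column would dominate the distances computed by an RBF kernel simply because its numeric range is larger.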

The particular stocks that were used as input data were chosen based on the availability of data. The data collected was restricted to the time period within which sufficient amounts of sentiment data were available. This resulted in the gathering of Apple (AAPL) stock and sentiment data for the period 2012-01-01 to 2017-03-02, resulting in 1293 data points, and of Facebook (FB) stock and sentiment data for the period 2014-01-01 to 2017-03-02, resulting in 792 data points. Since there is no trading during weekends or holidays, data points from these dates were removed.

3.2 Evaluating performance

K-fold cross-validation was used to evaluate the model hyperparameters, as the amount of data was restricted to the time period during which the StockTwits platform was popularly used. This validation technique is used when there is not enough available data to set aside a separate partition for validation [8]. The model was then evaluated on test data by measuring its classification accuracy, i.e. its capability of predicting the direction of the price change.
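The evaluation procedure described above can be sketched as follows, assuming scikit-learn. The data is synthetic, and the number of folds, the 80/20 chronological split and the hyperparameter grid are assumptions chosen for illustration; the source does not report these values.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for daily feature vectors and up/down labels.
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Hold out the last 20% of samples as test data (assumed split, kept in order).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Scaling happens inside the pipeline so each fold is scaled with its own statistics.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}

# K-fold cross-validation over the training data selects the hyperparameters.
search = GridSearchCV(pipeline, param_grid, cv=KFold(n_splits=5), scoring="accuracy")
search.fit(X_train, y_train)

# Classification accuracy of the refitted best model on the held-out test data.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

For time-ordered data, a splitter that respects chronology, such as scikit-learn's TimeSeriesSplit, may be a more conservative choice than plain K-fold.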


Table 3.2: Sentiment data

Bull scored messages      Total count of bullish sentiment messages scored by PsychSignal’s algorithm
Bear scored messages      Total count of bearish sentiment messages scored by PsychSignal’s algorithm
Bullish intensity         Score for each message’s language for the strength of the bullishness present in the messages on a 0-4 scale. 0 indicates no bullish sentiment measured, 4 indicates strongest bullish sentiment measured.
Bearish intensity         Score for each message’s language for the strength of the bearishness present in the messages on a 0-4 scale. 0 indicates no bearish sentiment measured, 4 indicates strongest bearish sentiment measured.
Total scanned messages    Number of messages coming through PsychSignal’s feeds and attributable to a symbol regardless of whether the PsychSignal sentiment engine can score them for bullish or bearish intensity

The descriptions are taken as is from the PsychSignal Example Notebook [19].

Table 3.3: Preprocessing formulas

Name                          Definition
Price delta                   $\text{Close price}_t - \text{Close price}_{t-1}$
Price movement                1 if Price delta > 0, else 0
Positivity measure            $\dfrac{\text{Bull scored messages}}{\text{Bear scored messages} + \text{Bull scored messages}}$
Activity measure              $\text{Total scanned messages}_t - \text{SMA}$
Simple moving average, SMA    $\dfrac{1}{5} \sum_{i=1}^{5} \text{Total scanned messages}_{t-i}$
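The preprocessing formulas of table 3.3 can be sketched in NumPy as below. The function names are hypothetical and chosen for this illustration. The closing prices are the Apple values from table 4.1 in chronological order; the message counts are made-up numbers used only to exercise the functions.

```python
import numpy as np

def price_delta(close):
    """Close price at day t minus close price at day t-1."""
    close = np.asarray(close, dtype=float)
    return close[1:] - close[:-1]

def price_movement(delta):
    """Label 1 if the price delta is positive, else 0."""
    return (np.asarray(delta) > 0).astype(int)

def positivity(bull, bear):
    """Share of bullish messages among all scored messages (undefined if none were scored)."""
    bull = np.asarray(bull, dtype=float)
    bear = np.asarray(bear, dtype=float)
    return bull / (bear + bull)

def activity(total, window=5):
    """Total scanned messages at day t minus the mean of the preceding `window` days."""
    total = np.asarray(total, dtype=float)
    sma = np.array([total[t - window:t].mean() for t in range(window, len(total))])
    return total[window:] - sma

# Apple closing prices from table 4.1, 2017-02-16 to 2017-03-02.
close = [135.34, 135.72, 136.7, 137.11, 136.53, 136.66, 136.93, 136.99, 139.79, 138.96]
delta = price_delta(close)
labels = price_movement(delta)

# Hypothetical daily message counts (not from the study).
total_messages = [100, 120, 80, 110, 90, 150]
print(np.round(delta, 2), labels, activity(total_messages), positivity([3], [1]))
```

The computed deltas agree with the Delta column of table 4.1 for the days where the previous close is listed.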


Table 3.4: Features

Open                 Price of the first trade of the day
High                 Highest trading price of the day
Low                  Lowest trading price of the day
Close                Price of the last trade of the day
Log-scaled volume    Logarithm of the volume
Delta                Change in closing price from the day before
Positivity           Ratio of bullish messages to all classified messages
Activity             Activity measure
Bullish intensity    Aggregated bullish intensity in messages
Bearish intensity    Aggregated bearish intensity in messages


Chapter 4

Results

The results from the trained models for both evaluated stocks are presented here. The first model is trained using technical data only, the other with technical and sentiment data combined for the same time period. Excerpts from the technical and sentiment features for a few of the data points are shown in tables 4.1 and 4.2, respectively. Figures 4.1 and 4.2 show how the historical price for the respective stocks has changed during the considered time period.

Table 4.1: Technical data points for the Apple stock

Date          Open      High      Low       Close     Delta    Volume
2017-03-02    140.0     140.28    138.76    138.96    -0.83    26210984
2017-03-01    137.89    140.15    137.6     139.79     2.80    36414585
2017-02-28    137.08    137.44    136.7     136.99     0.06    23482860
2017-02-27    137.14    137.44    136.28    136.93     0.27    20257426
2017-02-24    135.91    136.66    135.28    136.66     0.13    21776585
2017-02-23    137.38    137.48    136.3     136.53    -0.58    20788186
2017-02-22    136.43    137.12    136.11    137.11     0.41    20836932
2017-02-21    136.23    136.75    135.98    136.7      0.98    24507156
2017-02-17    135.1     135.83    135.1     135.72     0.38    22198197
2017-02-16    135.67    135.9     134.84    135.34    -0.17    22584555


Table 4.2: Sentiment data points for the Apple stock

Date          Positivity    Activity    Bullish intensity    Bearish intensity
2017-03-02    0.62            45.80     1.82                 1.87
2017-03-01    0.68           637.20     1.75                 1.73
2017-02-28    0.63          -247.60     1.80                 1.79
2017-02-27    0.69           120.20     1.90                 1.85
2017-02-24    0.62            55.80     1.73                 1.66
2017-02-23    0.61          -146.80     1.95                 1.64
2017-02-22    0.64          -228.00     1.87                 1.82
2017-02-21    0.69          -270.40     1.83                 1.75
2017-02-17    0.62          -438.20     1.71                 1.78
2017-02-16    0.68          -295.20     1.85                 1.49

Figure 4.1: Close price, Apple

Figure 4.2: Close price, Facebook


Table 4.3 and table 4.4 show summary statistics for the sentiment data used. Note that the standard deviation of the number of messages posted each day is quite high and that the number of messages is very low for certain days.

Table 4.3: Sentiment data summary, Apple

                          mean       std       min      max
Total scanned messages    1072.41    966.04    26.00    8387.00
Bull scored messages       257.51    230.27     3.00    1995.00
Bear scored messages       157.33    155.76     0.00    1380.00
Bullish intensity            1.71      0.11     1.16       2.18
Bearish intensity            1.76      0.18     0.00       2.80

Table 4.4: Sentiment data summary, Facebook

                          mean      std       min     max
Total scanned messages    368.12    438.75    8.00    4953.00
Bull scored messages       92.32    107.82    1.00     999.00
Bear scored messages       48.82     65.39    0.00     784.00
Bullish intensity           1.77      0.16    1.07       2.69
Bearish intensity           1.75      0.36    0.00       4.00


4.1 Performance of each model

The results for each model run on each of the two stocks are presented in table 4.5. The table shows the classification accuracy on the test data, as a percentage, for the best performing runs of both models.

Table 4.5: Prediction accuracy (%)

            Model 1 (technical data)    Model 2 (technical and sentiment data)
Apple       52.124                      55.985
Facebook    52.201                      57.233

Figure 4.3: Model comparison


Chapter 5

Discussion

5.1 Discussion of results

The research conducted in this paper sought to explore the question of whether sentiment data could enhance a data-driven prediction model of the stock market. The results stated in table 4.5 are analyzed here.

The models that were trained to predict the direction of intraday price changes for two different stocks using technical data only achieved a classification accuracy slightly higher than random guessing. Training the same models using the same technical data from the same time period along with sentiment data containing information about the mood of the market did indeed increase the accuracy, for both tested stocks. The improvement in accuracy was, however, no more than a few percentage points.

These results were achieved using training data spanning the rather short period of time for which investor sentiment data has been available, which may have greatly limited the performance of the trained models. These restrictions, however, apply to both versions of the trained models, and thus do not inherently limit the enhancing capabilities of applying sentiment data to a prediction model, but rather the performance of any of the models trained. These limitations thus do not necessarily void the results gathered from this study, but modeling the wisdom of the crowd is, of course, made less accurate when the crowd is too small.

The results indicate that the predictive performance of the model is enhanced slightly by using sentiment data. The performance of the trained models is not significantly better than random guessing, and these results thus do not necessarily imply that the technical data used to train the models is sufficiently indicative of changes in direction of stock market prices to give a significant predictive performance. This may have limited the success of the conducted research.
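One way to make the comparison with random guessing concrete is an exact one-sided binomial test, sketched below using only the Python standard library. The study does not report such a test or the size of its test sets, so the function name and the counts used here (112 correct predictions out of 200) are purely hypothetical.

```python
from math import comb

def binomial_p_value(correct: int, total: int, p_chance: float = 0.5) -> float:
    """One-sided exact p-value: P(X >= correct) for X ~ Binomial(total, p_chance)."""
    return sum(
        comb(total, k) * p_chance**k * (1 - p_chance) ** (total - k)
        for k in range(correct, total + 1)
    )

# Hypothetical example: 112 of 200 test-day directions predicted correctly (56%).
p_value = binomial_p_value(112, 200)
print(f"p-value against random guessing: {p_value:.3f}")
```

A small p-value would suggest that the observed accuracy is unlikely under pure guessing; with test sets of a few hundred days, accuracies only a few percentage points above 50% tend to give weak evidence at best.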

The results retrieved by applying sentiment data to the prediction model trained here imply a slight increase in accuracy in predicting the direction of intraday price changes, but they do not constitute a sound investing strategy, as this is outside the scope of this research. These results and their underlying correlations ought to be studied further, using more rigorous methods than those used in this research, to determine an answer to the problem stated.

5.2 Discussion of method

Using large quantities of data mined from the Internet in order to explore the predictability of the stock market is a relatively new area of research, so attempting to disprove the EMH is beyond the scope of this study. There are undoubtedly many complex factors that have affected the efficiency of this study with the methods that were used. Many of these complex relationships are simply not known, which naturally affects how efficiently a particular model of these effects can be used to explore them.

Due to time constraints, the method used might have lacked the necessary complexity to accurately model these relationships, which has affected the effectiveness of the trained model. A fast and easy-to-implement machine learning framework was used to implement the trained models, with limited means of analyzing results. This made it difficult to determine whether the results achieved here were limited by the simplified model, or by the absence of an actual relationship between sentiment data and future stock prices. This dependency between evaluating the model and the phenomenon that the model attempts to capture might have limited the success of this research.

Another limitation of the study is how we decided to model what it actually means for a financial asset to be overvalued or undervalued. This is evident in two ways. First of all, we simply label data by using the relative intraday change in value, which in and of itself does not necessarily determine if a stock actually is overvalued or undervalued. This would have required a more sophisticated way of labeling data, possibly also on a larger timeframe. A system as complex as the stock market justifies a more thorough analysis of what it actually means for a financial asset to be correctly valued. Secondly, the magnitude of change is not taken into account. One possible improvement would have been to define a threshold on the magnitude of change, for example by treating changes smaller than the threshold as the stock being correctly valued.
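The labeling idea discussed above can be illustrated with a minimal sketch. It is not the labeling code used in this study: the function name `label_intraday`, the `threshold` parameter and the example prices are all hypothetical, and the sketch assumes that a relative open-to-close change inside the threshold band is read as "correctly valued".

```python
from typing import List, Tuple


def label_intraday(open_price: float, close_price: float,
                   threshold: float = 0.0) -> int:
    """Label one trading day by its relative open-to-close change.

    Returns +1 for a rise above the threshold, -1 for a fall below
    the negative threshold, and 0 for a change inside the band.
    With threshold=0.0 this reduces to a pure direction label.
    """
    if open_price <= 0:
        raise ValueError("open_price must be positive")
    change = (close_price - open_price) / open_price
    if change > threshold:
        return 1
    if change < -threshold:
        return -1
    return 0


# Hypothetical (open, close) pairs, not data from this study.
days: List[Tuple[float, float]] = [(100.0, 101.5), (100.0, 100.2), (100.0, 98.0)]
labels = [label_intraday(o, c, threshold=0.005) for o, c in days]
```

With a threshold of 0.5 percent, the middle day in the example falls inside the band and receives the neutral label, whereas a pure direction label would have counted it as a rise.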

The learning model that was used to train our classifier was chosen due to its relative ease of implementation using a high-level framework. One disadvantage of the way this model was used is the lack of interpretability of our results. A more thorough statistical analysis of the model's performance would have greatly improved the reliability of our results.
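One simple form such a statistical analysis could take is an exact one-sided binomial test of the classifier's accuracy against random guessing. The sketch below is an assumption about how this might be done, not an analysis performed in this study; the function name `binomial_p_value` and the counts used in the example are hypothetical.

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """Probability of at least `correct` hits out of `total` predictions
    if every prediction were an independent guess with success
    probability `chance` (exact one-sided binomial test)."""
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return sum(
        comb(total, k) * chance ** k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical counts, not results from this study.
p_55_of_100 = binomial_p_value(55, 100)
p_60_of_100 = binomial_p_value(60, 100)
```

For these illustrative counts, 55 correct predictions out of 100 give a p-value of about 0.18, which is not distinguishable from guessing at conventional significance levels, while 60 out of 100 give about 0.03. The test assumes independent predictions, which consecutive trading days may violate.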

Another clear limitation of the study is the relative lack of training data. The prediction model was trained solely on data from the period for which the vendor's sentiment data was available, which greatly reduced the amount of technical data that we were able to use in training the model. The relative lack of data also reduced the extent to which we were able to validate the model's performance. One way to counter this would have been to train the model on substantially more individual stocks and concatenate the results. This could also have had the effect of generalizing the results of the study.
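A minimal sketch of pooling data from several stocks is given below. It assumes that each stock's samples are already in chronological order and that each stock is split by time before pooling, so that no test sample precedes a training sample of the same stock. The names `chronological_split` and `pool_stocks`, the tickers and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, direction label)


def chronological_split(samples: Sequence[Sample],
                        train_fraction: float = 0.8
                        ) -> Tuple[List[Sample], List[Sample]]:
    """Split time-ordered samples into an earlier training part and a
    later test part, without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return list(samples[:cut]), list(samples[cut:])


def pool_stocks(per_stock: Dict[str, Sequence[Sample]],
                train_fraction: float = 0.8
                ) -> Tuple[List[Sample], List[Sample]]:
    """Split every stock chronologically, then concatenate the parts."""
    train: List[Sample] = []
    test: List[Sample] = []
    for ticker in sorted(per_stock):
        stock_train, stock_test = chronological_split(
            per_stock[ticker], train_fraction)
        train.extend(stock_train)
        test.extend(stock_test)
    return train, test


# Hypothetical tickers and samples, not data from this study.
data = {
    "AAA": [([float(i)], i % 2) for i in range(10)],
    "BBB": [([float(i)], (i + 1) % 2) for i in range(5)],
}
train_set, test_set = pool_stocks(data)
```

Splitting per stock before concatenating is a design choice of this sketch; a stricter variant would use one common cut-off date for all stocks.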

In choosing pre-processed sentiment data, we compromised on the amount of analysis of sentiment data that might be necessary to more accurately define what constitutes investor sentiment. This was, however, a necessary step due to time constraints. This data contains many points of interesting analysis, such as the underlying motivations of people speaking positively or negatively about certain stocks. The way this data has been interpreted undoubtedly has a direct impact on the model's performance but was, again, necessary under the constraints of this study.

Another limitation of the study is that there possibly exists a degree of ambiguity as to what the data actually models, meaning that the way we have chosen to interpret sentiment data might not be an accurate model of what the data in reality represents. One could just as well have interpreted a negative attitude towards a stock as a sign of a degree of risk associated with it rather than interpreting it as a sign of the stock being overvalued. We have also chosen to treat a bullish sentiment in the same way as a bearish sentiment towards a stock, which may not necessarily reflect the intent of the market.

Chapter 6

Conclusion

In this study, a predictive model of the stock market using historical technical data was compared to one enhanced with social media sentiment data, in order to determine whether investor sentiment can be used to enhance a prediction model of the stock market. An SVM classifier was trained on technical data for two different stocks in the technology sector using the Python framework scikit-learn. The accuracy of the trained models was then evaluated, and the models were subsequently retrained with sentiment data derived by PsychSignal technology from the StockTwits investor social media platform, in which messages are classified as expressing a bullish (positive) or a bearish (negative) sentiment towards a certain stock. The training data was labeled by the direction of its intraday price change.
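The comparison summarized above can be sketched with scikit-learn as follows. This is not the code used in this study: the features are synthetic random numbers, the labels depend on the synthetic sentiment column by construction, and the RBF kernel, the feature scaling and the 80/20 chronological split are assumptions made only for illustration. The helper name `fit_and_score` is hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_and_score(X_train, y_train, X_test, y_test) -> float:
    """Train a scaled SVM classifier and return its test accuracy."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X_train, y_train)
    return float(accuracy_score(y_test, model.predict(X_test)))


# Synthetic stand-ins for technical and sentiment features (assumption).
rng = np.random.default_rng(0)
n = 400
technical = rng.normal(size=(n, 4))
sentiment = rng.normal(size=(n, 2))
noise = rng.normal(scale=0.5, size=n)
y = (technical[:, 0] + sentiment[:, 0] + noise > 0).astype(int)

cut = int(n * 0.8)  # earlier 80 % for training, later 20 % for testing

# Model 1: technical features only.
acc_technical = fit_and_score(technical[:cut], y[:cut],
                              technical[cut:], y[cut:])

# Model 2: technical features combined with sentiment features.
combined = np.hstack([technical, sentiment])
acc_combined = fit_and_score(combined[:cut], y[:cut],
                             combined[cut:], y[cut:])
```

Because the synthetic labels are built to depend on the sentiment column, any difference between `acc_technical` and `acc_combined` in this sketch says nothing about real market data; it only shows the mechanics of training once without and once with the additional features.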

An evaluation of the performance of the trained models shows an increased classification accuracy of intraday price directions when applying sentiment data. However, due to the relatively poor performance of the trained models and the limitations in the methods used in this study, further research is needed to confirm that this holds true. The results of this study should thus not be viewed as conclusive.

Chapter 7

Future research

Future research may improve on this study in several ways. A more sophisticated model of stock prediction using technical data, combined with a more rigorous statistical analysis, would improve the certainty of the results. One way to do this is to compare the predictive performance of several model architectures and to increase the amount of training data by studying many different individual stocks. One might also need to introduce a lower threshold for how many classified messages are required to constitute an adequate measure of sentiment.
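The lower threshold on the number of classified messages could, for instance, be applied as in the following sketch. The net sentiment score, the function name `net_sentiment` and the default of ten messages are assumptions made for illustration; they are not definitions taken from this study or from the data vendor.

```python
from typing import Optional


def net_sentiment(bullish: int, bearish: int,
                  min_messages: int = 10) -> Optional[float]:
    """Return a score in [-1, 1] from bullish and bearish message counts,
    or None when too few messages were classified to trust the score."""
    if bullish < 0 or bearish < 0:
        raise ValueError("message counts must be non-negative")
    total = bullish + bearish
    if total == 0 or total < min_messages:
        return None  # treat the day as having no usable sentiment
    return (bullish - bearish) / total


# Hypothetical message counts, not data from this study.
busy_day = net_sentiment(30, 10)
quiet_day = net_sentiment(2, 1)
```

Days for which the function returns `None` could either be dropped from the training set or given a neutral value; which option is preferable is left open here.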

This study was conducted by evaluating intraday changes in price direction. This may be improved by evaluating predictions over longer time horizons, to more accurately model the reality of stock trading.
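As a rough illustration of labeling over a longer horizon, the sketch below marks each day by whether the closing price a fixed number of trading days later is higher. The function name `horizon_labels`, the `horizon` parameter and the example prices are hypothetical and not part of this study.

```python
from typing import List, Sequence


def horizon_labels(closes: Sequence[float], horizon: int = 5) -> List[int]:
    """Label day i with 1 if the close `horizon` days later is higher
    than the close on day i, otherwise 0. The last `horizon` days have
    no future price and therefore receive no label."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return [
        1 if closes[i + horizon] > closes[i] else 0
        for i in range(len(closes) - horizon)
    ]


# Hypothetical closing prices, not data from this study.
closes = [10.0, 11.0, 9.0, 12.0, 8.0, 13.0]
two_day = horizon_labels(closes, horizon=2)
one_day = horizon_labels(closes, horizon=1)
```

Note that labels with a horizon longer than one day overlap in time, so consecutive samples are not independent, which should be taken into account when validating a model trained on them.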

Bibliography

[1] Malcolm Baker and Jeffrey Wurgler. Investor Sentiment and theCross-Section of Stock Returns. Working Paper 10449. National Bu-reau of Economic Research, Apr. 2004. DOI: 10.3386/w10449.URL: http://www.nber.org/papers/w10449.

[2] J. Bollen and H. Mao. “Twitter Mood as a Stock Market Predic-tor”. In: Computer 44.10 (Oct. 2011), pp. 91–94. ISSN: 0018-9162.DOI: 10.1109/MC.2011.323.

[3] Johan Bollen, Huina Mao, and Xiaojun Zeng. “Twitter mood pre-dicts the stock market”. In: Journal of Computational Science 2.1(2011), pp. 1–8. ISSN: 1877-7503. DOI: http://doi.org/10.1016/j.jocs.2010.12.007. URL: http://www.sciencedirect.com/science/Article/pii/S187775031100007X.

[4] Werner F. M. De Bondt and Richard Thaler. “Does the Stock Mar-ket Overreact?” In: The Journal of Finance 40.3 (1985), pp. 793–805.ISSN: 00221082, 15406261. URL: http://www.jstor.org/stable/2327804.

[5] Stefano DellaVigna. Psychology and Economics: Evidence from theField. Working Paper 13420. National Bureau of Economic Re-search, Sept. 2007. DOI: 10.3386/w13420. URL: http://www.nber.org/papers/w13420.

[6] Eugene F Fama. “Efficient capital markets: A review of theoryand empirical work”. In: The journal of Finance 25.2 (1970), pp. 383–417.

[7] Ronen Feldman. “Techniques and Applications for SentimentAnalysis”. In: Commun. ACM 56.4 (Apr. 2013), pp. 82–89. ISSN:0001-0782. DOI: 10.1145/2436256.2436274. URL: http://doi.acm.org.focus.lib.kth.se/10.1145/2436256.

2436274.

22

Page 30: Applying investor sentiment to a prediction model …1107755/...analysis, which can be used to mine social network data from Stock-Twits, a social media platform for investors and

BIBLIOGRAPHY 23

[8] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The el-ements of statistical learning: data mining, inference and prediction.2nd ed. Springer, 2009, p. 241. URL: http : / / www - stat .stanford.edu/~tibs/ElemStatLearn/.

[9] Wei Huang, Yoshiteru Nakamori, and Shou-Yang Wang. “Fore-casting stock market movement direction with support vectormachine”. In: Computers Operations Research 32.10 (2005). Appli-cations of Neural Networks, pp. 2513–2522. ISSN: 0305-0548. DOI:http://doi.org/10.1016/j.cor.2004.03.016. URL:http://www.sciencedirect.com/science/Article/

pii/S0305054804000681.

[10] Daniel Kahneman and Amos Tversky. “Prospect Theory: An Anal-ysis of Decision under Risk”. In: Econometrica 47.2 (1979), pp. 263–291. ISSN: 00129682, 14680262. URL: http://www.jstor.org/stable/1914185.

[11] Yong Liu, Shali Jiang, and Shizhong Liao. “Efficient Approxima-tion of Cross-validation for Kernel Methods Using Bouligand In-fluence Function”. In: Proceedings of the 31st International Confer-ence on International Conference on Machine Learning - Volume 32.ICML’14. Beijing, China: JMLR.org, 2014, pp. I-324–I-332. URL:http://dl.acm.org/citation.cfm?id=3044805.

3044843.

[12] Burton G. Malkiel. “The Efficient Market Hypothesis and Its Crit-ics”. In: The Journal of Economic Perspectives 17.1 (2003), pp. 59–82. ISSN: 08953309. URL: http://www.jstor.org/stable/3216840.

[13] Michela Nardo, Marco Petracco-Giudici, and Minás Naltsidis.“WALKING DOWN WALL STREET WITH A TABLET: A SUR-VEY OF STOCK MARKET PREDICTIONS USING THE WEB”.In: Journal of Economic Surveys 30.2 (2016), pp. 356–369. ISSN: 1467-6419. DOI: 10.1111/joes.12102. URL: http://dx.doi.org/10.1111/joes.12102.

[14] Nasdaq Nordic. Vad bestämmer priset på aktier? http://www.

nasdaqomxnordic.com/utbildning/aktier/vadbestammerprisetpaaktier.Accessed: 2017-04-23.

Page 31: Applying investor sentiment to a prediction model …1107755/...analysis, which can be used to mine social network data from Stock-Twits, a social media platform for investors and

24 BIBLIOGRAPHY

[15] Alya Al Nasseri, Allan Tucker, and Sergio de Cesare. “Quanti-fying StockTwits semantic terms’ trading behavior in financialmarkets: An effective application of decision tree algorithms”.In: Expert Systems with Applications 42.23 (2015), pp. 9192–9210.ISSN: 0957-4174. DOI: http://doi.org/10.1016/j.eswa.2015.08.008. URL: http://www.sciencedirect.com/science/Article/pii/S0957417415005473.

[16] Chong Oh and Olivia Sheng. “Investigating predictive power ofstock micro blog sentiment in forecasting future stock price di-rectional movement”. In: (2011).

[17] Tobias Preis, Helen Susannah Moat, and H. Eugene Stanley. “Quan-tifying Trading Behavior in Financial Markets Using Google Trends”.In: Scientific Reports 3.1684 (Apr. 2013). URL: http://dx.doi.org/10.1038/srep01684.

[18] PsychSignal. About Us. https://psychsignal.com. Accessed:2017-04-23.

[19] Psychsignal Example Notebook. StockTwits Trader Mood. https://www.quantopian.com/data/psychsignal/stocktwits.Accessed: 2017-05-11.

[20] Eduardo J. Ruiz et al. “Correlating Financial Time Series withMicro-blogging Activity”. In: Proceedings of the Fifth ACM Inter-national Conference on Web Search and Data Mining. WSDM ’12.Seattle, Washington, USA: ACM, 2012, pp. 513–522. ISBN: 978-1-4503-0747-5. DOI: 10.1145/2124295.2124358. URL: http://doi.acm.org.focus.lib.kth.se/10.1145/2124295.

2124358.

[21] Serguei Saavedra et al. “Synchronicity, instant messaging, andperformance among financial traders”. In: Proceedings of the Na-tional Academy of Sciences of the United States of America 108.13(2011), pp. 5296–5301. ISSN: 00278424. URL: http://www.jstor.org/stable/41125693.

[22] Robert P. Schumaker and Hsinchun Chen. “Textual Analysis ofStock Market Prediction Using Breaking Financial News: TheAZFin Text System”. In: ACM Trans. Inf. Syst. 27.2 (Mar. 2009),12:1–12:19. ISSN: 1046-8188. DOI: 10.1145/1462198.1462204.URL: http://doi.acm.org.focus.lib.kth.se/10.1145/1462198.1462204.

Page 32: Applying investor sentiment to a prediction model …1107755/...analysis, which can be used to mine social network data from Stock-Twits, a social media platform for investors and

BIBLIOGRAPHY 25

[23] Scikit-learn. Preprocessing data. Accessed: 2017-04-23.

[24] Alaa F. Sheta, Sara Elsir M. Ahmed, and Hossam Faris. “A Com-parison between Regression, Artificial Neural Networks and Sup-port Vector Machines for Predicting Stock Market Index”. In:International Journal of Advanced Research in Artificial Intelligence(IJARAI) 4.7 (2015). DOI: http://dx.doi.org/10.14569/IJARAI . 2015 . 040710 # sthash . hZ48g3h8 . dpuf. URL:http://thesai.org/Publications/ViewPaper?Volume=

4&Issue=7&Code=IJARAI&SerialNo=10.

[25] Andrei Shleifer. Inefficient Markets: An Introduction to BehavioralFinance. 2000. DOI: 10.1093/0198292279.001.0001.

[26] Andrei Shleifer and Lawrence H. Summers. “The Noise TraderApproach to Finance”. In: The Journal of Economic Perspectives 4.2(1990), pp. 19–33. ISSN: 08953309. URL: http://www.jstor.org/stable/1942888.

[27] StockTwits. About StockTwits. https://stocktwits.com/about. Accessed: 2017-04-23.

[28] Francis E.H Tay and Lijuan Cao. “Application of support vec-tor machines in financial time series forecasting”. In: Omega 29.4(2001), pp. 309–317. ISSN: 0305-0483. DOI: http://doi.org/10.1016/S0305-0483(01)00026-3. URL: http://www.sciencedirect.com/science/Article/pii/S0305048301000263.

[29] Xue Zhang, Hauke Fuehres, and Peter A. Gloor. “Predicting StockMarket Indicators Through Twitter “I hope it is not as bad as Ifear””. In: Procedia - Social and Behavioral Sciences 26 (2011), pp. 55–62. ISSN: 1877-0428. DOI: http://dx.doi.org/10.1016/j.sbspro.2011.10.562. URL: http://www.sciencedirect.com/science/Article/pii/S1877042811023895.
