Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Towards a Benchmarking Frameworkfor Financial Text Mining
Caslav Bozic1, Ryan Riordan2, Detlef Seese1, and Christof Weinhardt2
1 Institute of Applied Informatics and Formal Description Methods, KIT{bozic, detlef.seese}@kit.edu
2 Institute of Information Systems and Management, KIT{ryan.riordan, weinhardt}@kit.edu
Summary. Different data mining methods for financial texts and various sentiment mea-sures are described in the existing literature, without common benchmarks for comparingthese approaches. The framework proposed in this paper and the corresponding imple-mented system facilitate combining more sources of financial data into comprehensiveintegral dataset. The use of the dataset is then illustrated by analyzing the candidatemeasures by estimating parameters of regression on different returns and other financialindicators that can be defined using system’s novel data transformation approach.
1 Introduction
Different data mining methods for financial texts are described in the literature,and systems that apply these particular methods have been implemented. For anevaluation of the proposed methods authors use different approaches, some of themincorporating assumptions that do not conform to real financial markets’ conditions.This paper presents the current status of a project attempting to offer a possibilityto compare performance of these systems by offering a framework, a base dataset,and an implementation of the system according to design science guidelines fromHevner et al. (2004).
The research is a part of the project FINDS (Financial News and Data Services)initiated within Information Management and Market Engineering Graduate Schoolat the Karlsruhe Institute of Technology. Project FINDS has the goal to conductinnovative research on the analysis of quantitative and qualitative informationrelated to financial markets and to provide services that can help both researchersin financial field, and also professional traders (Bozic, 2009).
The paper is organized as follows: in section 2 we give an overview of the existingliterature in the field. Section 3 describes data sources for the comprehensiveresearch set, preprocessing methods used on data and gets the reader acquaintedwith the dataset by presenting some descriptive statistics. Section 4 illustrates theusage of the framework and the system by comparing four sentiment measures.Section 5 concludes the paper and describes the ideas for future research.
22 Caslav Bozic, Ryan Riordan, Detlef Seese, and Christof Weinhardt
2 Related Work
Many researchers have studied how published news influence market reactions.The methodologies used range from a fairly simple content analysis using classicalstatistical tools, to complex machine-learning methods. The approaches vary froman engineering approach which focuses on implementation and proving economicrelevance, to chiefly theoretical approaches whose goal is to describe underlyingeconomic phenomena.
Mittermayer and Knolmayer (2006a) compare eight text-mining systems, includ-ing their own. Since more technical performance criteria are often missing, it isnot possible to draw clear conclusions about relative performance. Wuthrich et al.(1998) classify news articles published overnight on web portals into three categoriesdepending on their influence on the one of five equity indices: Dow Jones, Nikkei,FTSE, Hang Seng, and Straits Times. With this system they attempt to forecast thetrend of the index daily value one day ahead. They use Naıve Bayes, Nearest Neigh-bour and Neural Net classifier, and a hand-crafted underlying dictionary. Lavrenkoet al. (2000) use Naıve Bayes classifier to classify news articles from Yahoo!Financeinto five groups, according to the influence on particular U.S. stocks. The featureswere determined automatically and the forecast horizon was from five to ten hours.Gidofalvi and Elkan (2003) use again naıve Bayes classifier with three categories torecognize articles which have bigger positive or negative influence on constituentsof Dow Jones index. With features defined using mutual information measurethey work on ten minutes aggregated intraday data. Fung, Yu, and Lam (2003)partially use commercially available text mining systems to predict a price trend forintraday market movements of some of the stocks listed on the Hong Kong StockExchange. For classification purposes they use support vector machines. Finally,Mittermayer and Knolmayer (2006b) propose a high frequency forecast system thatclassifies press releases of publicly traded companies in the U.S. using a dictionarythat combines automatically selected features and a hand-crafted thesaurus. Forclassification the authors use the polynomial version of SVM.
While most of the works focus on predicting price trends of single stock or index,there are works that aim at determining influence of news releases to volatility.Thomas (2003) improves risk-return profile by exiting the market in case of newsthat are predicting high volatility, while Schulz, Spiliopoulou, and Winkler (2003)attempt to classify press releases of German public companies according theirinfluence on volatility of stock prices.
Another group of publications not included in the survey by Mittermayer andKnolmayer (2006a) contains works that do not primary attempt to prove economicalrelevance of published text by evaluating specifically tailored trading strategies,but rather to find statistically relevant relations between financial indicators andsentiment extracted from the text.
Antweiler and Frank (2004) use Naıve Bayes and SVM classifiers to classifymessages posted to Yahoo!Finance and Raging Bull and determine their sentiment.
Towards a Benchmarking Framework for Financial Text Mining 23
They do not find statistically significant correlation with stock prices, but they findsentiment and volume of messages significantly correlated to trade volumes andvolatility. In their methodological paper Das and Chen (2007) offer a variety ofclassifiers, as well as composed sentiment measure as a result of voting amongclassifiers. In the illustrative example they analyze Yahoo stock boards and stockprices of 8 technology companies, but they do not find clear evidence that thesentiment index can be predictive for stock prices.
There are two pivotal articles published in the Journal of Finance. Tetlock (2007)observes Wall Street Journal’s column ”Abreast of the Market”, uses content analysissoftware General Inquirer together with Principal Component Analysis approachand finds that high pessimism in published media predicts downward pressure onmarket prices. Authors of Tetlock, Saar-Tsechansky, and Macskassy (2008) succeededto find that rate of negative words in news stories about certain company predictslow earnings of the company.
If we observe text mining methodologies as a transformations that assign numer-ical value to every textual string, we can refer to that numerical value as sentimentindex. All publications from the former group have at least implicit statementsabout the predictive power of the specific sentiment index on e.g. returns or volatil-ity. Following the evaluation approach from the latter group of publication, we aimat providing financial text mining research community with a framework and a toolthat can be used for proving their statements using statistical significance criteria.
3 Data
3.1 Data Sources
We use Thomson Reuters TickHistory data and the output from the Reuters NewsS-cope Sentiment Engine as a source dataset. These data sources are convenientbecause they provide access to trading data over period of more than 10 years, andextensive amount of sentiment data related to financial news stories.
Data constituting the output of the Reuters NewsScope Sentiment Engine rep-resents the author’s sentiment measure for every English-language news itempublished via NewsScope in years 2003-2008. The measure classifies a news iteminto one of three categories: positive, negative, or neutral. The probability of thenews item falling into each of the categories is also given.
Thomson Reuters TickHistory data is available through the DataScope platform.We use daily data for the period equivalent to the NewsScope Sentiment Enginedataset. This provides us with data on opening and closing prices for the particularproduct, bid and ask, as well as volume data. Other types of data about companiescoming from this source are the total amount of shares, used for calculating marketcapitalization, and paid dividends, that can be used for adjusting the returns.Additionally, it is the source of the data about daily values of the MSCI Indices forindividual countries, as well as MCSI World Index.
24 Caslav Bozic, Ryan Riordan, Detlef Seese, and Christof Weinhardt
Field Name Description Sourceric Reuters Instrument Code THd Date THOpen Opening daily price THHigh Maximal price within the day THLow Minimal price within the day THvol Daily trading volume THLast Closing daily price THBid Average bid THAsk Average ask THspread Average spread THcc Close to close daily return of the equity Doc Open to close daily return of the equity Doo Open to open daily return of the equity Dco Close to open daily return of the equity Dcnt Number of news items mentioning company RNSEpnaccnt Number of news stories mentioning company RNSEnsent pos Number of positive news items mentioning company RNSEnsent neut Number of neutral news items mentioning company RNSEnsent neg Number of negative news items mentioning company RNSEavgsent Average sentiment RNSEasent pos Average probability that news items are positive RNSEasent neut Average probability that news items are neutral RNSEasent neg Average probability that news items are negative RNSEnet sent Net sentiment: asent pos - asent neg Dnet sent std Standard deviation of net sentiment Dric market Abbreviation of company’s home market THcountry Country Dicc Close to close daily return of the country index Dioc Open to close daily return of the country index Dioo Open to open daily return of the country index Dico Close to open daily return of the country index Dcnt country Number of news items related to country RNSEavgsent item Average sentiment (country level) RNSEeccc Excess daily close to close return w/r to country index Decoc Excess daily open to close return w/r to country index Decco Excess daily close to open return w/r to country index Decoo Excess daily open to open return w/r to country index D
Table 1: Main fields and sources (TH - TickHistory,RNSE - Sentiment Engine, D - derived)
Towards a Benchmarking Framework for Financial Text Mining 25
Reuters NewsScope Sentiment Engine data has two main properties - the times-tamp of the news item publishing, and the related company mentioned in the newsitem. It is preprocessed in a way that the records are aggregated on the level of eachcompany, and also calendar day - according to local time in force at the location ofcompany’s home market. In that way we get an average sentiment for a companyfor each day in 6 years period. The sentiments are expressed by different calculatedvalues. The first aggregated sentiment measure is average sentiment class - wherethe classes are represented by values 1, 0, and -1 for positive, neutral, and negativeclass, respectively. Further measures are average probabilities that each news itemfalls into positive, negative or neutral class. As the Reuters sentiment index usedfor the evaluation later we adopted a deducted value defined as average probabilityof positive class minus average probability of negative class within one day.
From the data on daily prices we derive data on different returns: namelyclose-to-open, close-to-close, open-to-open, and open-to close returns. This is doneconsidering only days with trades for particular equity or index. The generic way ofdefining calculated fields derived from source data is described in later section. Thecalculated returns are adjusted for paid dividends by increasing the price of shareon the trading day following the dividend payment by the amount of dividendspaid per share.
3.2 Descriptive Statistics
Reuters NewsScope Sentiment Engine data consists of author’s sentiment measuresof English-language news items published through Reuters NewsScope in theperiod from 2003 to 2006 inclusive. Each record represents a unique mention ofthe specific company, with a possibility of one news item relating to more thanone company. In our dataset there are 6,127,190 records about 10,665 differentcompanies. Figure 1 shows the top twenty companies with greatest number ofrecords related to that company. According to Reuters news production process,several news items can make one single news story, e.g. short alert item is publishedimmediately, and after some time extension of the same story is published as a newitem, or in the case of corrections. In the available data average number of newsitems per story is 1.995, saying that in average two items make one news story.
Increase in data volume over the years is obvious and yearly record numberdoubles over the period of 6 years, as shown on Figure 2. Besides, on the samefigure, one can follow increase in the partial volume of records related to differenthome markets. The information about a company’s home market is an integralpart of the Reuters code uniquely identifying each instrument. Figure 2 separatelyshows data about the ten markets with greatest changes over the years. It can beseen that the fraction of the two biggest US markets, NYSE and NASDAQ, in thetotal data volume has grown from something over 40% to little less than 60% in this6 years period.
Each news item can have multiple tags called topic codes, and topic codes aregrouped into categories. One of the categories represents countries, and can be used
26 Caslav Bozic, Ryan Riordan, Detlef Seese, and Christof Weinhardt
0
5000
10000
15000
20000
25000
30000
35000
40000
45000
GE
NE
RA
L M
OT
OR
S
CIT
IGR
OU
P
FO
RD
MO
TO
R C
O
GE
NE
RA
L E
LE
C C
O
EX
XO
N M
OB
IL
JPM
OR
GA
N C
HA
SE
BP
MIC
RO
SO
FT
CP
BO
EIN
G C
O
GO
LD
M S
AC
HS
GR
P
FR
ED
DIE
MA
C
SO
NY
CO
RP
MA
RT
ST
OR
ES
ME
RR
ILL L
YN
CH
BA
NK
OF
AM
ER
ICA
FA
NN
IE M
AE
EA
DS
TO
YO
TA
MO
TO
R C
O
DE
UT
SC
HE
BA
NK
N
HS
BC
HO
LD
ING
S
0
5000
10000
15000
20000
25000
30000
35000
40000
45000
GE
NE
RA
L M
OT
OR
S
CIT
IGR
OU
P
FO
RD
MO
TO
R C
O
GE
NE
RA
L E
LE
C C
O
EX
XO
N M
OB
IL
JPM
OR
GA
N C
HA
SE
BP
MIC
RO
SO
FT
CP
BO
EIN
G C
O
GO
LD
M S
AC
HS
GR
P
FR
ED
DIE
MA
C
SO
NY
CO
RP
WA
L-M
AR
T S
TO
RE
S
ME
RR
ILL L
YN
CH
BA
NK
OF
AM
ER
ICA
FA
NN
IE M
AE
EA
DS
TO
YO
TA
MO
TO
R C
O
DE
UT
SC
HE
BA
NK
N
HS
BC
HO
LD
ING
S
Figure 1: Top twenty companies by number of records
400000
600000
800000
1000000
1200000
1400000
1600000 rest
T
VX
MI
AX
PA
TO
DE
0
200000
400000
600000
800000
1000000
1200000
1400000
1600000
2003 2004 2005 2006 2007 2008
rest
T
VX
MI
AX
PA
TO
DE
L
O
N
Figure 2: Data volume change over years
for determining what country is mentioned in the particular news item. Using thisauthor tagging feature, average sentiment per country can be calculated. The Figure3 is showing counties with greatest and lowest average sentiment, when countrytags with less than 1000 mentions are excluded.
During the week, only 2.19% of all news items are published on weekends.Figure 4 shows the distribution of records per days of week. Our dataset comprisesof data spanning 2192 days in total, considering GMT time zone, and there are even13 companies that are mentioned at least once in more than 80% of all days, asshown in Table 2. One more interesting property of the data is noticeable sentimentdecrease for news items published on Saturdays, as shown on Figure 5.
Towards a Benchmarking Framework for Financial Text Mining 27
Figure 3: Best and worst average sentiment for countries with over 1000 mentions
Company Days with news PercentGENERAL ELEC CO 1911 87.18SONY CORP 1891 86.27CITIGROUP 1863 84.99BP 1851 84.44WALT DISNEY CO 1841 83.99GENERAL MOTORS 1837 83.8EXXON MOBIL 1822 83.12WAL-MART STORES 1802 82.21BOEING CO 1798 82.03MICROSOFT CP 1792 81.75HSBC HOLDINGS 1791 81.71GOLDM SACHS GRP 1782 81.3FORD MOTOR CO 1754 80.02EADS 1752 79.93TOYOTA MOTOR CO 1729 78.88DEUTSCHE BANK N 1729 78.88VOLKSWAGEN AG 1720 78.47VODAFONE GROUP 1711 78.06CHEVRON 1704 77.74TOTAL FINA 1693 77.24Table 2: The 20 companies mentioned on most days
28 Caslav Bozic, Ryan Riordan, Detlef Seese, and Christof Weinhardt
400000
600000
800000
1000000
1200000
1400000
1600000
0
200000
400000
600000
800000
1000000
1200000
1400000
1600000
MON TUE WED THU FRI SAT SUN
Figure 4: Number of records per days of week
0
0.02
0.04
0.06
0.08
0.1
-0.06
-0.04
-0.02
0
0.02
0.04
0.06
0.08
0.1
MON TUE WED THU FRI SAT SUN
Figure 5: Average sentiment per days of week
3.3 Preprocessing
To enable easy extending of the existing dataset, generic interfacing is one of thesystem’s features. It is generic in that aspect that it allows any format of theDataScope platform output to be loaded into the system automatically. New sourceto be added is placed in a comma separated values (CSV) file. The procedure ofgeneric load then starts, making new data source available for the next steps ofpreprocessing. The names of the new columns that can be later used for referencingare extracted from header row of the comma separated input file. The necessarytype transformations on the added data are conducted by means of metadatadictionary, which holds datatypes and formats related to the column names. In thisway new data is transformed into the representation that can be used as input ofthe successive steps.
Towards a Benchmarking Framework for Financial Text Mining 29
As our goal is to explore dependencies of various indicators and sentimentmeasures, another feature of the system is a novel approach to defining calculatedindicators. Any of the columns from the source datasets can be used as a basefor calculations. Besides full range of arithmetic operators, lagging operators canbe defined and used as well. The definition of the calculated fields is given inthe notation of simple grammar we defined. The definition is saved in controlfile, which is parsed and used for code generation. It produces PL/SQL code thatapplied to the source datasets produces additional datasets containing calculatedfields.
Final result is a clean dataset with derived fields that were not part of the sourcedatasets, designed for particular benchmarking purpose. The overview of the dataflow can be seen on Figure 6.Data Processing
10
Thomson ReutersTickHistory
Reuters NewsScopeSentiment Engine
Calculation &Aggregationaccording to
formal definition
Grammar AggregationTransformation
Clean Data
Benchmarking
Caslav Bozic - Towards Benchmarking Framework for Financial Text Mining
Figure 6: Overview of the Preprocessing system
30 Caslav Bozic, Ryan Riordan, Detlef Seese, and Christof Weinhardt
4 First Results
As an illustration of framework application we performed comparison of foursentiment measures. Three of them are produced using classifiers that are imple-mented in the scope of FINDS Project. The first one is using Bayes-Fisher filter,second Support Vector Machines, while third implements artificial Neural Networksmethodology. The classifiers are trained on high frequency intraday data on trades,aggregated to 1 minute average prices. The news items are grouped into classesdepending on their influence on the price movements. Observing the period oftwo hours immediately following news item publication, we determine class of thenews item. If the price stabilized on the level higher than the level at the momentof news item publication, news item was classified as positive. If the price stabi-lized on lower level, news item was classified as negative. If the price stabilizationcouldn’t be observed, news item is considered neutral. More details about classifierimplementation can be found in Pfrommer et al. (2010). Training period was firstnine months of 2003, and news stories about five big technology companies (SAP,Oracle, Microsoft, IBM, and Apple) were used.
The last three months of 2003 represent the evaluation period, with total of 729news stories analyzed. The process of news transformation shown on Figure 7consists of text tokenizing, stemming, and finally text classification using selectedmethodology. Using the dataset and the framework we calculated parameters forregression of lagged dependent variables against each sentiment measure. Thevariables were four different daily returns, four daily excess returns, and averagedaily spread, all lagged up to ten days. The regression formula is given by idt =
α(id,i) + β(id,i)St−i + εt where id represents dependent variable, and i size of thelag. The results are presented in Table 3 and Table 4. The column title determinesdependent variable, and the row represents days of lag between sentiment measureS and dependent variable id. Each cell shows two values: estimated coefficientvalue β first, and then p value according to t-statistics.
The results show that just for 14 variables could be proven statistically relevantdependence on lagged sentiment values. Although RNSE data shows most of thestatistically significant results, seven, and all of them have positive coefficients, it isnot possible to give clear statement about comparative performance of sentimentmeasures. This issue possibly arises from the low data volume, causing the statisticaldependence not to be observed. To overcome the problem, regression parameterestimation for one of the measures is repeated with greater number of data points.RNSE measure is chosen because of the extensive volume of the data. Table 5presents the results from regression parameters estimation on data sample includingall six years of processed news stories in period 2003 to 2008 for two major U.S.markets contained in the system. All tested dependent variables show statisticallysignificant dependence on sentiment value with lag of at least one day. The increasein number of statistically relevant results shows that greater amount of data offers apossibility to draw more confident conclusions.
Towards a Benchmarking Framework for Financial Text Mining 31
Bay
es-F
ish
er
la
g cc
oo
co
oc
sp
read
eccc
ecoo
ecco
ecoc
1 -0
.008
24
0.11
-0
.003
16
0.54
-0
.006
41
0.15
-0
.001
82
0.48
0.
0001
3175
0.
45
-0.0
0646
0.
23
-0.0
0439
0.
40
-0.0
0386
0.
43
-0.0
026
0.24
2
-0.0
0212
0.
72
-0.0
0557
0.
19
-0.0
0046
567
0.93
-0
.001
65
0.54
-0
.000
1933
70.
18
-0.0
039
0.53
-0
.002
63
0.47
-0
.002
52
0.65
-0
.001
47
0.54
3
-0.0
0056
663
0.91
-0
.004
66
0.29
-0
.002
43
0.54
0.
0018
7 0.
49
-0.0
0063
3 0.
01
-0.0
028
0.53
-0
.000
9659
20.
79
-0.0
0301
0.
44
0.00
0298
81
0.90
4
-0.0
0646
0.
15
0.00
26
0.51
-0
.006
67
0.10
0.
0002
0725
0.
93
-0.0
0049
77
0.07
-0
.003
57
0.40
0.
0008
996
0.77
-0
.003
3 0.
40
-0.0
0020
749
0.91
5
-0.0
0032
704
0.94
-0
.000
1323
2 0.
97
0.00
193
0.61
-0
.002
25
0.32
-0
.000
1992
80.
30
0.00
223
0.60
-0
.001
48
0.64
0.
0024
2 0.
51
-0.0
0016
307
0.93
6
0.00
34
0.42
-0
.001
63
0.67
0.
0026
0.
43
0.00
0797
63
0.73
-0
.000
5966
0.
04
0.00
0362
64
0.92
-0
.000
8124
10.
83
0.00
0379
01
0.89
-0
.000
0045
5 1.
00
7 0.
0013
9 0.
75
0.00
642
0.12
-0
.002
34
0.47
0.
0037
3 0.
17
-0.0
0034
695
0.26
0.
0032
6 0.
41
0.00
584
0.17
-0
.000
9226
20.
73
0.00
424
0.11
8
0.00
172
0.69
-0
.000
9565
8 0.
81
0.00
291
0.36
-0
.001
19
0.65
-0
.000
121
0.54
0.
0016
5 0.
68
0.00
262
0.50
0.
0010
6 0.
70
0.00
0620
9 0.
80
9 -0
.003
43
0.27
-0
.005
7 0.
11
0.00
0230
65
0.93
-0
.003
66
0.08
-0
.000
4424
0.
01
-0.0
0181
0.
53
-0.0
0642
0.
09
0.00
206
0.42
-0
.003
85
0.05
10
-0
.004
11
0.19
-0
.003
8 0.
25
-0.0
0204
0.
49
-0.0
0207
0.
21
-0.0
0007
904
0.64
-0
.002
31
0.41
-0
.006
19
0.05
-0
.000
1007
40.
97
-0.0
0218
0.
12
S
V
M
lag
cc
oo
co
oc
spre
ad
ec
cc
ec
oo
ec
co
ec
oc
1
0.00
102
0.83
-0
.003
39
0.48
-0
.001
01
0.81
0.
0020
3 0.
40
-0.0
0003
017
0.76
-0
.001
04
0.85
0.
0012
7 0.
81
-0.0
027
0.59
0.
0016
9 0.
46
2 -0
.003
3 0.
56
-0.0
0377
0.
36
-0.0
0499
0.
31
0.00
169
0.52
-0
.000
0437
60.
65
-0.0
0535
0.
38
0.00
371
0.30
-0
.006
37
0.24
0.
0010
5 0.
65
3 -0
.003
25
0.48
-0
.003
13
0.46
-0
.005
68
0.14
0.
0024
3 0.
36
-0.0
0009
86
0.59
-0
.003
53
0.43
-0
.001
49
0.68
-0
.004
78
0.21
0.
0012
8 0.
58
4 -0
.002
81
0.54
-0
.001
86
0.64
-0
.004
5 0.
28
0.00
169
0.45
-0
.000
0466
70.
83
-0.0
0236
0.
58
-0.0
0109
0.
73
-0.0
0413
0.
30
0.00
179
0.34
5
-0.0
0022
28
0.96
-0
.000
5497
9 0.
89
-0.0
0288
0.
46
0.00
266
0.25
0.
0000
0991
0.
96
0.00
0273
64
0.95
0.
0011
9 0.
72
-0.0
0215
0.
56
0.00
239
0.22
6
0.00
218
0.62
0.
0015
6 0.
69
-0.0
0253
0.
47
0.00
471
0.05
-0
.000
2602
20.
41
0.00
148
0.72
0.
0039
0.
35
-0.0
0121
0.
70
0.00
27
0.28
7
-0.0
0014
802
0.97
0.
0021
0.
63
-0.0
0358
0.
29
0.00
344
0.22
-0
.000
0072
10.
99
0.00
29
0.48
0.
0049
2 0.
26
-0.0
0128
0.
66
0.00
416
0.13
8
-0.0
0127
0.
78
-0.0
0473
0.
25
-0.0
0147
0.
66
0.00
0199
47
0.94
0.
0004
5594
0.
13
0.00
114
0.80
-0
.004
54
0.29
0.
0004
2362
0.
89
0.00
0720
89
0.79
9
-0.0
0316
0.
33
-0.0
0257
0.
49
-0.0
0493
0.
09
0.00
177
0.42
-0
.000
1396
70.
57
-0.0
0421
0.
16
-0.0
0391
0.
31
-0.0
043
0.10
0.
0000
8207
0.
97
10
-0.0
0112
0.
73
0.00
43
0.21
-0
.004
34
0.16
0.
0032
2 0.
06
0.00
0465
03
0.13
-0
.000
3814
0.
90
0.00
532
0.12
-0
.003
34
0.23
0.
0029
7 0.
05
Tab
le
3:
Para
met
er
estim
ate
and
p
valu
e fo
r la
gged
se
ntim
ent
mea
sure
s pro
duce
d
by
Bay
es-F
isch
er
and
SVM
cl
assi
fier
sTa
ble
3:Pa
ram
eter
esti
mat
ean
dp
valu
efo
rla
gged
sent
imen
tm
easu
res
prod
uced
byBa
yes-
Fish
eran
dSV
Mcl
assi
fiers
32 Caslav Bozic, Ryan Riordan, Detlef Seese, and Christof Weinhardt N
N
lag
cc
oo
co
oc
spre
ad
ec
cc
ec
oo
ec
co
ec
oc
1
0.00
0891
98
0.85
-0
.000
0506
2 0.
99
-0.0
0268
0.
52
0.00
357
0.12
0.
0000
888
0.65
-0
.001
34
0.78
0.
0025
6 0.
59
-0.0
035
0.43
0.
0021
5 0.
28
2 -0
.000
1854
40.
97
0.00
534
0.16
-0
.002
72
0.56
0.
0025
3 0.
30
-0.0
0010
813
0.53
0.
0019
0.
73
0.00
566
0.08
-0
.000
4352
90.
93
0.00
238
0.26
3
0.00
143
0.74
-0
.003
12
0.44
0.
0027
1 0.
45
-0.0
0128
0.
60
0.00
0144
92
0.67
0.
0011
0.
79
-0.0
0034
131
0.92
0.
0017
9 0.
61
-0.0
0067
498
0.75
4
0.00
0272
33
0.95
0.
0019
7 0.
58
-0.0
0103
0.
78
0.00
13
0.52
-0
.000
3276
50.
32
0.00
419
0.28
0.
0014
2 0.
62
0.00
282
0.44
0.
0014
6 0.
39
5 0.
002
0.62
-0
.000
1865
4 0.
96
0.00
0402
67
0.91
0.
0015
9 0.
43
-0.0
0022
739
0.30
0.
0063
2 0.
10
0.00
273
0.35
0.
0029
2 0.
38
0.00
333
0.05
6
-0.0
0051
28
0.89
0.
0037
7 0.
27
-0.0
0178
0.
55
0.00
126
0.55
-0
.000
2800
30.
44
0.00
258
0.43
0.
0029
8 0.
37
0.00
0482
3 0.
85
0.00
216
0.27
7
0.00
208
0.61
-0
.002
18
0.56
0.
0016
2 0.
58
0.00
0455
96
0.85
-0
.000
2035
20.
65
0.00
236
0.50
-0
.000
8988
70.
81
0.00
0923
07
0.71
0.
0014
5 0.
53
8 0.
0019
8 0.
61
0.00
0620
15
0.86
-0
.001
63
0.57
0.
0036
1 0.
12
-0.0
0043
306
0.19
0.
0009
4201
0.
80
0.00
0632
96
0.86
-0
.002
07
0.43
0.
0030
4 0.
17
9 -0
.002
06
0.47
-0
.001
65
0.61
-0
.002
99
0.23
0.
0009
3048
0.
62
-0.0
0031
224
0.24
-0
.001
76
0.49
-0
.001
82
0.58
-0
.002
54
0.25
0.
0007
9024
0.
65
10
-0.0
0188
0.
51
-0.0
0003
904
0.99
-0
.002
58
0.33
0.
0006
9155
0.
64
-0.0
0042
125
0.21
-0
.002
63
0.31
-0
.001
52
0.61
-0
.002
35
0.33
-0
.000
2421
5 0.
85
R
NS
E D
a
ta
lag
cc
oo
co
oc
spre
ad
ec
cc
ec
oo
ec
co
ec
oc
1
0.00
364
0.22
0.
0007
7126
0.
79
0.00
31
0.24
0.
0005
3878
0.
73
0.00
0469
26
0.04
0.
0033
8 0.
24
0.00
0766
88
0.79
0.
0030
2 0.
25
0.00
0368
11
0.79
2
0.00
346
0.27
0.
0008
1511
0.
75
0.00
0017
46
0.99
0.
0034
5 0.
03
0.00
0056
51
0.71
0.
0037
6 0.
22
0.00
0471
86
0.84
0.
0012
5 0.
64
0.00
251
0.07
3
-0.0
0237
0.
37
0.00
0501
75
0.87
-0
.002
87
0.21
0.
0005
086
0.74
0.
0003
7629
0.
10
-0.0
0106
0.
66
0.00
0985
42
0.74
-0
.002
33
0.27
0.
0012
9 0.
35
4 -0
.001
94
0.52
0.
0020
9 0.
48
-0.0
0107
0.
70
-0.0
0087
09
0.55
0.
0001
7053
0.
44
-0.0
0167
0.
58
0.00
106
0.72
-0
.000
5078
30.
86
-0.0
0117
0.
39
5 0.
0070
3 0.
02
0.00
419
0.11
0.
0032
2 0.
24
0.00
382
0.01
-0
.000
0289
60.
87
0.00
491
0.11
0.
0044
4 0.
07
0.00
134
0.63
0.
0035
8 0.
01
6 0.
004
0.14
0.
0036
6 0.
20
-0.0
0012
613
0.96
0.
0041
3 0.
01
-0.0
0010
191
0.63
0.
0029
0.
23
0.00
449
0.13
-0
.001
13
0.58
0.
0040
2 <0
.01
7 0.
0015
5 0.
61
0.00
255
0.34
-0
.000
0374
90.
99
0.00
159
0.30
0.
0002
2742
0.
31
-0.0
0065
236
0.83
0.
0003
3338
0.
89
-0.0
019
0.48
0.
0012
7 0.
37
8 0.
0012
6 0.
65
0.00
0688
39
0.81
0.
0010
3 0.
65
0.00
0232
26
0.88
-0
.000
1254
20.
54
-0.0
0267
0.
30
-0.0
0051
524
0.86
-0
.002
02
0.35
-0
.000
6356
3 0.
65
9 0.
0025
0.
37
-0.0
0054
481
0.83
0.
0016
2 0.
52
0.00
0878
12
0.53
-0
.000
1835
20.
38
0.00
208
0.46
-0
.002
7 0.
27
0.00
147
0.57
0.
0005
8709
0.
65
10
0.00
102
0.70
0.
0035
5 0.
16
-0.0
0107
0.
64
0.00
209
0.13
0.
0000
6798
0.
71
-0.0
0010
018
0.97
0.
0012
4 0.
60
-0.0
0204
0.
35
0.00
194
0.11
Tab
le 4
: Pa
ram
eter
est
imat
e an
d p
val
ue
for
lagged
sen
tim
ent
mea
sure
s pro
duce
d b
y N
N c
lass
ifie
r an
d f
rom
RN
SE d
ata
Ta
ble
4:Pa
ram
eter
esti
mat
ean
dp
valu
efo
rla
gged
sent
imen
tm
easu
res
prod
uced
byN
Ncl
assi
fier
and
from
RN
SEda
ta
Towards a Benchmarking Framework for Financial Text Mining 33
RN
SE
Dat
a 20
03-2
008
lag
cc
oo
co
oc
spre
ad
ec
c
eoo
ec
o
eoc
1 0.
0019
2 *
0.00
422
* 0.
0006
9627
* 0.
0012
2 *
-0.0
0022
379
0.00
13
0.00
159
* 0.
0031
5 *
0.00
0492
24
* 0.
0010
4 *
2 0.
0007
9088
*
0.00
0947
52
* 0.
0005
3439
0.00
010.
0002
565
0.03
23-0
.000
1892
80.
0061
0.
0003
2129
0.
0376
0.00
0627
96
0.00
030.
0001
8202
0.
1377
0.00
0104
81
0.35
82
3 0.
0005
4479
0.
0023
0.00
0470
96
0.01
03
0.00
0591
87*
-0.0
0004
708
0.69
09-0
.000
1986
30.
0040
0.
0001
3246
0.
4290
0.00
0088
23
0.62
460.
0002
2776
0.
0621
-0.0
0011
461
0.38
05
4 0.
0006
4604
0.
0002
0.00
0634
94
0.00
05
0.00
0537
41*
0.00
0108
63
0.34
76-0
.000
2188
30.
0016
0.
0001
5744
0.
3067
0.00
0147
38
0.38
370.
0001
4664
0.
2254
-0.0
0002
051
0.85
86
5 0.
0002
2031
0.
2161
0.00
0299
27
0.10
03
0.00
0407
970.
0031
-0.0
0018
766
0.11
16-0
.000
2129
70.
0022
-0
.000
1218
50.
4556
-0.0
0019
69
0.26
300.
0001
2828
0.
2879
-0.0
0028
489
0.02
34
6 0.
0002
0479
0.
2540
0.00
0482
02
0.00
86
0.00
0105
780.
4423
0.00
0099
01
0.41
02-0
.000
2263
30.
0012
-0
.000
1574
0.
3664
0.00
0034
79
0.85
18-0
.000
1108
90.
3569
-0.0
0006
389
0.64
92
7 0.
0002
5166
0.
1574
0.00
0297
64
0.10
07
0.00
0076
580.
5771
0.00
0175
09
0.13
47-0
.000
2703
80.
0001
0.
0001
9783
0.
1838
-0.0
0008
992
0.58
090.
0000
9277
0.
4398
0.00
0060
7 0.
5710
8
0.00
0254
73
0.15
29-0
.000
0316
4 0.
8623
0.
0003
9427
0.00
42-0
.000
1395
40.
2397
-0.0
0025
098
0.00
03
0.00
0017
77
0.92
07-0
.000
1344
0.
4789
0.00
0227
77
0.05
96-0
.000
2376
1 0.
1003
9
0.00
0172
0.
3371
0.00
0407
52
0.02
63
0.00
0114
370.
4064
0.00
0057
63
0.63
07-0
.000
2348
0.
0006
0.
0000
3302
0.
8616
0.00
0152
86
0.43
490.
0000
3762
0.
7555
-0.0
0002
812
0.85
24
10
0.00
0379
59
0.03
290.
0001
7503
0.
3333
0.
0002
8868
0.03
580.
0000
9091
0.
4365
-0.0
0024
143
0.00
06
0.00
0012
63
0.92
89-0
.000
0140
90.
9276
-0.0
0000
507
0.96
66-0
.000
0181
2 0.
8507
11
0.
0003
2344
0.
0748
0.00
0254
58
0.16
83
0.00
0375
990.
0060
-0.0
0005
256
0.66
95-0
.000
2185
80.
0018
-0
.000
0369
70.
8519
-0.0
0038
506
0.05
780.
0003
613
0.00
26-0
.000
4045
9 0.
0120
12
0.
0002
2131
0.
2169
0.00
0446
02
0.01
47
0.00
0180
890.
1862
0.00
0040
42
0.73
73-0
.000
2448
90.
0005
-0
.000
1286
10.
4582
0.00
0151
26
0.41
260.
0000
7943
0.
5070
-0.0
0024
499
0.07
97
13
0.00
0414
67
0.01
870.
0002
5657
0.
1517
0.
0003
6354
0.00
760.
0000
5112
0.
6588
-0.0
0024
455
0.00
06
0.00
0107
97
0.47
860.
0000
1232
0.
9405
0.00
0164
77
0.16
67-0
.000
0751
8 0.
5089
14
0.
0002
2718
0.
2052
0.00
0505
54
0.00
55
0.00
0136
180.
3213
0.00
0091
01
0.45
04-0
.000
2542
50.
0003
0.
0000
9126
0.
5439
0.00
0148
53
0.36
070.
0000
9409
0.
4330
-0.0
0004
179
0.70
59
15
0.00
0038
22
0.82
86-0
.000
0029
5 0.
9869
0.
0002
1168
0.12
18-0
.000
1734
60.
1348
-0.0
0021
112
0.00
27
-0.0
0009
376
0.50
45-0
.000
1868
20.
2264
0.00
0176
99
0.13
95-0
.000
2966
9 0.
0023
16
0.
0004
305
0.01
700.
0002
9846
0.
1048
0.
0003
6374
0.00
810.
0000
6676
0.
5825
-0.0
0021
967
0.00
17
0.00
0108
26
0.51
070.
0001
3612
0.
4411
0.00
0122
93
0.30
58-0
.000
0392
2 0.
7607
17
0.
0002
8462
0.
1203
0.00
0355
85
0.05
68
0.00
0253
240.
0647
0.00
0031
38
0.80
26-0
.000
2281
60.
0012
0.
0002
1946
0.
2327
0.00
0020
64
0.91
530.
0002
9143
0.
0153
-0.0
0009
058
0.55
10
18
0.00
0276
66
0.12
360.
0002
8981
0.
1121
0.
0002
1893
0.11
090.
0000
5773
0.
6304
-0.0
0023
1 0.
0008
0.
0000
9887
0.
5301
0.00
0228
95
0.17
900.
0001
3912
0.
2442
-0.0
0005
783
0.63
31
19
0.00
0161
56
0.36
940.
0001
5372
0.
4024
0.
0002
2874
0.09
29-0
.000
0671
70.
5804
-0.0
0018
249
0.00
84
-0.0
0007
894
0.61
60-0
.000
0632
60.
7096
0.00
0088
99
0.45
79-0
.000
1752
4 0.
1478
20
0.
0004
1741
0.
0200
0.00
0284
56
0.11
62
0.00
0367
310.
0077
0.00
0050
11
0.67
39-0
.000
1935
0.
0054
0.
0000
0844
0.
9614
-0.0
0001
209
0.94
650.
0000
9293
0.
4407
-0.0
0009
454
0.47
53
T abl
e5:
Para
met
eres
tim
ate
and
pva
lue
for
lagg
edse
ntim
ent
mea
sure
sfr
omR
NSE
data
inpe
riod
2003
to20
08(*
pva
lue
¡0.0
001)
34 Caslav Bozic, Ryan Riordan, Detlef Seese, and Christof WeinhardtFirst Results
12 Caslav Bozic - Towards Benchmarking Framework for Financial Text Mining
Tokenizer
Stemmer
Classification
Reuters TakesNews Stories
Full Text of News Stories
3 classifiersBayes – FisherSVMNeural Network
Figure 7: Transformations of news data
5 Conclusion
In an attempt to define a benchmark for performance of sentiment measures forfinancial texts, we propose the framework and present implementation of the systemthat enables easy deriving of financial indicators to be used in performance measure.As an illustration we offer a use-case analysis of four different approaches, althoughnot giving any general conclusions about adequacy of particular approaches. Itis shown that due to the inadequate data volume conclusions are vague, but theincrease in the volume of analyzed data offers us a possibility to draw more robustconclusions.
Some of the future directions of the research would be extending the source base,including Compustat data on market capitalization and earnings of companies, andalso extending calculated dataset, to offer wider variety of indicators which couldbe compared against candidate sentiment measure. More sophisticated statisticalanalysis should be performed on the final dataset to be able to give more decisivestatements about general adequacy of text mining approaches in financial news.These could include using panel regression to control for inter-firm dependenciesand applying significance test that would account for possible heteroscedasticity inthe data.
Towards a Benchmarking Framework for Financial Text Mining 35
References
Antweiler, W. and M. Z. Frank (2004): “Is All That Talk Just Noise? The InformationContent of InternetStock Message Boards,” The Journal of Finance, 59(3), pp. 1259–1294.
Bozic, C. (2009): “FINDS - Integrative services,” in: Computer Systems and Applications,2009. AICCSA 2009. IEEE/ACS International Conference on, pp. 61–62.
Das, S. and M. Chen (2007): “Yahoo! for Amazon: Sentiment extraction from smalltalk on the web,” Management Science, 53(9), pp. 1375–1388.
Fung, G. P. C., J. X. Yu, and W. Lam (2003): “Stock prediction: Integrating textmining approach using real-time news,” in: Computational Intelligence for FinancialEngineering, 2003. Proceedings. 2003 IEEE International Conference on, pp. 395 – 402.
Gidofalvi, G. and C. Elkan (2003): “Using news articles to predict stock price move-ments,” Department of Computer Science and Engineering, University of California,San Diego.
Hevner, A. R., S. T. March, J. Park, and S. Ram (2004): “Design Science in InformationSystems Research,” MIS Quarterly, 28(1), pp. 75–105.
Lavrenko, V., M. Schmill, D. Lawrie, P. Ogilvie, D. Jensen, and J. Allan (2000):“Mining of Concurrent Text and Time-Series,” in: Sixth ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining.
Mittermayer, M. and G. Knolmayer (2006a): “Text mining systems for marketresponse to news: A survey,” Institute of Informations Systems, University of Bern.
Mittermayer, M.-A. and G. F. Knolmayer (2006b): “NewsCATS: A News Categoriza-tion and Trading System,” Data Mining, IEEE International Conference on, 0, pp.1002–1007.
Pfrommer, J., C. Hubschneider, and S. Wenzel (2010): “Sentiment Analysis on StockNews using Historical Data and Machine Learning Algorithms,” Term Paper,Karlsruhe Institute of Technology (KIT).
Schulz, A., M. Spiliopoulou, and K. Winkler (2003): “Kursrelevanzprognose vonAd-hoc-Meldungen: Text Mining wider die Informationsuberlastung im MobileBanking,” Wirtschaftsinformatik, 2, pp. 181–200.
Tetlock, P. (2007): “Giving Content to Investor Sentiment: The Role of Media in theStock Market,” Journal of Finance, 62(3).
Tetlock, P., M. Saar-Tsechansky, and S. Macskassy (2008): “More Than Words:Quantifying Language to Measure Firms’ Fundamentals,” Journal of Finance, 63(3),pp. 1437–1467.
Thomas, J. (2003): News and trading rules, Ph.D. thesis, Carnegie Mellon University.Wuthrich, B., D. Permunetilleke, S. Leung, V. Cho, J. Zhang, and W. Lam (1998):
“Daily prediction of major stock indices from textual www data,” 1998 IEEEInternational Conference on Systems, Man, and Cybernetics.