An empirical approach to predict Stock Market using financial news articles
Akhil Jindal, Karan Singla, Kinnar Kumar Sen, Kulvinder Bisla
SIEL Labs, International Institute of Information Technology, Hyderabad

IRE major project group 22 IIITH


Over the years there have been many attempts to predict stock market movements using various techniques and hundreds of parameters. Techniques used include Exponential Moving Averages and the Head & Shoulders chart pattern; Artificial Neural Networks and Genetic Algorithms are also used heavily, and many analysts rely on more traditional measures such as the P/E ratio. All these techniques use stock prices, traded volumes, dividends paid and similar market data. No single solution has been perfected, so an ensemble of algorithms is generally used. Our attempt, instead, was to show how market and news/blog sentiment can be harnessed to predict stock movements without these traditional techniques.




Agenda

What? - Introduction
Why? - Uses
How? - Steps
Experiments
Results
Conclusion
References


What? - Introduction

• Prediction of stock market movements and approximate percentage changes without using traditional stock-analysis algorithms, drawing instead on the emotions, thought processes and sentiments expressed in financial news articles and blogs.
• Multiple IRE techniques used.
• Statistical models derived and experimented with.
• Exciting results.


Why? - Uses

Attempt to find out:
• whether sentiment has a direct relationship with stock market movements
• techniques for disambiguating entities extracted from free text

Real-world uses:
• Guide a stock market trader or investor in gauging the market approximately, as expressed by fellow traders, journalists, bloggers, etc.


How? - Steps

1. Collect stock market news through Google News Alerts.
2. Parse news articles, blogs, etc.
3. Extract entities using the entity extraction libraries from GATE.
4. Calculate multiple sentiment scores using entity position tags and various keyword lists (stock, English, contextual, etc.).
5. Mine the sentiment of each news article as a whole using NLP libraries from Stanford.
6. Collect training datasets.
7. Create models using WEKA (10-fold cross-validation).
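The slides do not spell out how the keyword-based sentiment scores in step 4 are computed. A minimal sketch, assuming each keyword list (stock, English, contextual) is a hypothetical lexicon mapping a word to a polarity between -1 and 1, and that a keyword's weight decays with its token distance from the nearest mention of the extracted entity. The decay 1/(1 + distance) and the 10-token window are illustrative choices, not the authors' method.

```python
from typing import Dict, List


def entity_sentiment(
    tokens: List[str],
    entity_positions: List[int],
    lexicon: Dict[str, float],
    window: int = 10,
) -> float:
    """Hypothetical proximity-weighted sentiment score for one entity.

    Every token found in `lexicon` within `window` tokens of an entity
    mention contributes polarity / (1 + distance), so keywords closer to
    the entity count more. Returns 0.0 when there is no entity or no hit.
    """
    if not entity_positions:
        return 0.0
    total = 0.0
    for i, tok in enumerate(tokens):
        polarity = lexicon.get(tok.lower())
        if polarity is None:
            continue
        # distance to the nearest mention of the entity
        dist = min(abs(i - p) for p in entity_positions)
        if dist <= window:
            total += polarity / (1 + dist)
    return total


def sentiment_features(
    tokens: List[str],
    entity_positions: List[int],
    lexicons: Dict[str, Dict[str, float]],
    window: int = 10,
) -> Dict[str, float]:
    """One score per keyword list (e.g. 'stock', 'english', 'contextual')."""
    return {
        name: entity_sentiment(tokens, entity_positions, lex, window)
        for name, lex in lexicons.items()
    }
```

For example, for the tokens of "Apple shares surge after strong iPhone sales" with the entity at position 0, a stock lexicon containing "surge": 1.0 and an English lexicon containing "strong": 0.5, the stock score is 1/3 and the English score is 0.1. Computing one score per lexicon yields several sentiment features per entity, which is one way to read "multiple sentiments" in step 4.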


How?

Technology stack: Ubuntu/Linux, Java, Pentaho (ETL), GATE (entity extraction), WEKA (machine learning, predictive analysis), MySQL (database).

Data flow: News feed → ETL → DB → Machine learning → Predictive analysis → Stock market predictions

Training date range: 1/1/2013 to 31/12/2013
No. of news articles: ~35,000
No. of unique company/stock codes extracted from news/blog articles: ~1,400
No. of stock keywords: ~2,156
No. of English keywords: ~2,838
No. of records used for building the model: ~7,000
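The slides report the number of training records and, in the experiments, UP/DOWN classes and percentage changes, but not how records were labelled or stored. A minimal sketch, assuming each record holds the sentiment features for one company on one day and is labelled with the change from that day's close to the next day's close, written in WEKA's ARFF format. The attribute names, the labelling rule and the treatment of an unchanged price as DOWN are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Value = Union[float, str]


def label_movement(close_today: float, close_next: float) -> Tuple[str, float]:
    """Hypothetical labelling rule: next-day close-to-close change.

    Returns the nominal class (an unchanged price counts as DOWN here,
    an arbitrary choice) and the percentage change, usable as a
    regression target.
    """
    pct = (close_next - close_today) / close_today * 100.0
    return ("UP" if pct > 0 else "DOWN"), pct


def to_arff(
    rows: List[Dict[str, Value]],
    features: List[str],
    regression: bool = False,
    relation: str = "stock_news",
) -> str:
    """Serialise records as ARFF text with the target as the last attribute.

    WEKA treats the last attribute as the class by default, so only one
    target is written: nominal 'movement' for classification or numeric
    'pct_change' for regression (writing both would leak the answer).
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in features]
    target = "pct_change" if regression else "movement"
    lines.append(
        "@ATTRIBUTE pct_change NUMERIC" if regression
        else "@ATTRIBUTE movement {DOWN,UP}"
    )
    lines += ["", "@DATA"]
    for row in rows:
        values = [str(float(row[name])) for name in features]
        values.append(str(float(row[target])) if regression else str(row[target]))
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"
```

Declaring the nominal values as {DOWN,UP} keeps the class order used in the WEKA output on the next slide.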


Experiments (Classification)

Naive Bayes

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4535      59.6397 %
Incorrectly Classified Instances      3069      40.3603 %
Kappa statistic                          0.0271
Mean absolute error                      0.4853
Root mean squared error                  0.499
Relative absolute error                103.6642 %
Root relative squared error            103.1445 %
Total Number of Instances             7604

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.846    0.822    0.633      0.846   0.724      0.531     DOWN
               0.178    0.154    0.408      0.178   0.248      0.531     UP
Weighted Avg.  0.596    0.572    0.549      0.596   0.546      0.531

Random Forest

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4623      60.7969 %
Incorrectly Classified Instances      2981      39.2031 %
Kappa statistic                          0.1212
Mean absolute error                      0.4342
Root mean squared error                  0.5242
Relative absolute error                 92.753 %
Root relative squared error            108.3439 %
Total Number of Instances             7604

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.757    0.641    0.664      0.757   0.707      0.578     DOWN
               0.359    0.243    0.468      0.359   0.406      0.578     UP
Weighted Avg.  0.608    0.492    0.591      0.608   0.595      0.578
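The per-class figures and the kappa statistic in these WEKA summaries are all derived from a confusion matrix, which the slides do not include. The sketch below computes them with the standard definitions; the matrix used in the example is made up for illustration and is not the project's data.

```python
from typing import Dict, List, Tuple


def accuracy_and_kappa(cm: List[List[int]]) -> Tuple[float, float]:
    """cm[i][j] = number of instances of actual class i predicted as class j."""
    k = len(cm)
    n = sum(map(sum, cm))
    observed = sum(cm[i][i] for i in range(k)) / n
    # chance agreement: product of actual and predicted class frequencies
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)
    ) / (n * n)
    return observed, (observed - expected) / (1 - expected)


def per_class(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """TP rate, FP rate, precision, recall and F-measure for each class."""
    k = len(cm)
    n = sum(map(sum, cm))
    out: Dict[str, Dict[str, float]] = {}
    for c, name in enumerate(labels):
        tp = cm[c][c]
        actual = sum(cm[c])                          # true members of class c
        predicted = sum(cm[r][c] for r in range(k))  # instances predicted as c
        fp = predicted - tp
        recall = tp / actual if actual else 0.0      # identical to the TP rate
        precision = tp / predicted if predicted else 0.0
        out[name] = {
            "tp_rate": recall,
            "fp_rate": fp / (n - actual) if n > actual else 0.0,
            "precision": precision,
            "recall": recall,
            "f_measure": (2 * precision * recall / (precision + recall)
                          if precision + recall else 0.0),
        }
    return out
```

For the made-up matrix [[40, 10], [20, 30]] (rows and columns ordered DOWN, UP), accuracy is 0.7 and kappa is 0.4. In a two-class problem the FP rate of one class equals 1 minus the TP rate of the other, which the tables above satisfy (for Naive Bayes, 0.822 = 1 - 0.178 and 0.154 = 1 - 0.846). Kappa discounts the agreement expected by chance, which is why it can be small even when accuracy is around 60%.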


Experiments (Regression)

SMOreg

=== Cross-validation ===
=== Summary ===

Correlation coefficient                  0.0666
Mean absolute error                      2.9873
Root mean squared error                  6.4104
Relative absolute error                 99.4892 %
Root relative squared error             99.8253 %
Total Number of Instances             7604
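WEKA's regression summary compares the model against a trivial predictor that always outputs the mean of the target. The sketch below implements the standard formulas for these statistics. As a simplification it takes the mean from the evaluated values themselves, whereas, as we understand it, WEKA's evaluation uses the training-data mean in each fold, so its figures can differ slightly.

```python
import math
from typing import Dict, Sequence


def regression_summary(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Correlation, MAE, RMSE and errors relative to a mean-only predictor."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # errors of the trivial model that always predicts mean_a
    abs_base = sum(abs(a - mean_a) for a in actual)
    sq_base = sum((a - mean_a) ** 2 for a in actual)
    # Pearson correlation between actual and predicted values
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(sq_base * var_p) if sq_base and var_p else 0.0
    return {
        "correlation": corr,
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100.0 * abs_err / abs_base,
        "rrse_pct": 100.0 * math.sqrt(sq_err / sq_base),
    }
```

A model that always predicts the mean scores exactly 100% on both relative errors. Values close to 100%, as in the SMOreg run (99.4892% and 99.8253%), therefore mean the model is only marginally better than predicting the average percentage change, which agrees with the low correlation coefficient of 0.0666.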


Results

Classification of Stock Movements (charts: Naive Bayes, Random Forest)


Results

Prediction of Stock Percentages (SMOreg) (charts: Apple stock index, Microsoft stock index)


Conclusions

• The experiment was largely successful in classifying stock movements over a period of time with both classification models.
• The cross-validated classification accuracy was about 60% (59.6% for Naive Bayes, 60.8% for Random Forest).
• However, many challenges remain in predicting percentage changes from stock news alone. This may be due to the following factors:
  • No stochastic parameters are used in the model.
  • Stock code entities are not disambiguated.
  • Stock companies are not classified into groups.
  • Predictions are not made in real time, as and when news is published, which may introduce a lot of noise.
• Predicting stock indices involves many uncertainties, some of which cannot be addressed by sentiment alone.


References

GATE: https://gate.ac.uk/
Stanford NLP Group: http://www-nlp.stanford.edu/
WEKA: http://weka.wikispaces.com/
Pentaho: http://www.pentaho.com/