Over the years there have been many attempts to predict stock market movements using various techniques and hundreds of parameters. Algorithms such as Exponential Moving Average and Head & Shoulders have been used, and Artificial Neural Networks and Genetic Algorithms are also used heavily. Many analysts also rely on more traditional measures such as the P/E ratio. All these techniques use stock prices, traded volumes, dividends paid and similar data. No single solution has been perfected, so an ensemble of algorithms is generally used. Our attempt was instead to highlight how market and news/blog sentiment can be harnessed to predict stock movements without these traditional techniques.
Akhil Jindal, Karan Singla, Kinnar Kumar Sen, Kulvinder Bisla
SIEL Labs
International Institute of Information Technology, Hyderabad
An empirical approach to predict Stock Market using financial news articles
Agenda
What? - Introduction
Why? - Uses
How? - Steps
Experiments
Results
Conclusion
References
What? - Introduction
Prediction of stock market movements and approximate percentage changes without the use of traditional stock algorithms, using the cauldron of emotions, thought processes and sentiments expressed in news articles and blogs related to financial news.
Multiple IRE techniques used.
Statistical models derived and experimented with.
Exciting results.
Why? - Uses
Attempt to find out
• whether sentiments have a direct relationship with stock market movements
• techniques for disambiguation of entities extracted from free text
Real World Uses
• Guide a stock market trader/investor in gauging the market approximately, as expressed by co-traders, journalists, bloggers etc.
How ?
• Google News Alerts (for stock market news).
• Parse news articles / blogs etc.
• Extract entities using entity extraction libraries from GATE.
• Calculate multiple sentiments using entity position tags and various keywords such as Stock, English and Contextual etc.
• Mine sentiment of the news article as a whole using NLP libraries from Stanford.
• Collect training datasets.
• Create models using WEKA (10-fold cross-validation).
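The last step relies on stratified 10-fold cross-validation, which WEKA performs internally. The sketch below illustrates the idea only and is not WEKA's implementation: it deals record indices into k folds so that each fold keeps roughly the same UP/DOWN ratio. The function names and the toy labels are hypothetical and are not taken from the authors' data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each record index to one of k folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of this class.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Toy labels (hypothetical): 60 DOWN and 40 UP records.
labels = ["DOWN"] * 60 + ["UP"] * 40
splits = list(train_test_splits(labels, k=10))
```

Each record is used for testing exactly once and for training in the other nine rounds, so the reported accuracy is an average over models that never saw their own test records.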
How ?
[Architecture diagram. Platform: Ubuntu/Linux, Java. Components: Pentaho (ETL), MySQL (DB), GATE (entity extraction), WEKA (machine learning, predictive analysis). Flow: news feed, ETL, DB, machine learning, predictive analysis, stock market predictions.]
Training date range: 1/1/2013 to 31/12/2013
No. of news articles: ~35000
No. of unique company/stock codes extracted from news/blog articles: ~1400
No. of Stock keywords: ~2156
No. of English keywords: ~2838
No. of records used for building model: ~7000
Experiments (Classification)
NAIVE BAYES

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances        4535               59.6397 %
Incorrectly Classified Instances      3069               40.3603 %
Kappa statistic                          0.0271
Mean absolute error                      0.4853
Root mean squared error                  0.499
Relative absolute error                103.6642 %
Root relative squared error            103.1445 %
Total Number of Instances             7604

=== Detailed Accuracy By Class ===
               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.846     0.822       0.633    0.846       0.724      0.531   DOWN
                 0.178     0.154       0.408    0.178       0.248      0.531   UP
Weighted Avg.    0.596     0.572       0.549    0.596       0.546      0.531

RANDOM FOREST

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances        4623               60.7969 %
Incorrectly Classified Instances      2981               39.2031 %
Kappa statistic                          0.1212
Mean absolute error                      0.4342
Root mean squared error                  0.5242
Relative absolute error                 92.753  %
Root relative squared error            108.3439 %
Total Number of Instances             7604

=== Detailed Accuracy By Class ===
               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.757     0.641       0.664    0.757       0.707      0.578   DOWN
                 0.359     0.243       0.468    0.359       0.406      0.578   UP
Weighted Avg.    0.608     0.492       0.591    0.608       0.595      0.578
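As a reading aid for the output above, the following sketch recomputes two of the reported columns from the others: the accuracy percentage from the instance counts, and the F-measure from precision and recall using the standard formula F = 2PR / (P + R). The numbers are copied from the two runs reported above. The dictionary layout and the names `run_1` and `run_2` are only illustrative.

```python
def accuracy_pct(correct, total):
    """Share of correctly classified instances, as a percentage."""
    return 100.0 * correct / total


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


TOTAL_INSTANCES = 7604

# Values copied from the two cross-validation summaries, in order of appearance.
# Each class entry is (precision, recall).
reported = {
    "run_1": {"correct": 4535, "DOWN": (0.633, 0.846), "UP": (0.408, 0.178)},
    "run_2": {"correct": 4623, "DOWN": (0.664, 0.757), "UP": (0.468, 0.359)},
}

recomputed = {}
for name, run in reported.items():
    recomputed[name] = {
        "accuracy_pct": accuracy_pct(run["correct"], TOTAL_INSTANCES),
        "f_down": f_measure(*run["DOWN"]),
        "f_up": f_measure(*run["UP"]),
    }
    print(name, {key: round(value, 4) for key, value in recomputed[name].items()})
```

The recomputed values agree with the reported ones to the printed precision (59.6397 % and 60.7969 % accuracy; F-measures of 0.724 / 0.248 and 0.707 / 0.406).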
Experiments (Regression)
SMOreg
=== Cross-validation ===
=== Summary ===
Correlation coefficient                  0.0666
Mean absolute error                      2.9873
Root mean squared error                  6.4104
Relative absolute error                 99.4892 %
Root relative squared error             99.8253 %
Total Number of Instances             7604
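To help read the regression summary, here is a minimal sketch of how the four error figures are commonly defined. The relative errors compare the model against a baseline that always predicts the mean, so a value near 100 % means the model does about as well as that baseline. This is an illustration under the assumption that the baseline is the mean of the actual values in the evaluated set; WEKA's own computation may differ in detail. The toy values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE and the two relative errors (as percentages)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Errors of the "always predict the mean" baseline.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae_pct": 100.0 * sum(abs_err) / base_abs,
        "rrse_pct": 100.0 * math.sqrt(sum(sq_err) / base_sq),
    }


# Toy daily percentage changes (hypothetical, not the authors' data).
actual = [1.0, -2.0, 0.5, 3.0, -2.5]
mean_baseline = [sum(actual) / len(actual)] * len(actual)

baseline_metrics = regression_metrics(actual, mean_baseline)
perfect_metrics = regression_metrics(actual, actual)
```

On the toy data the mean baseline scores 100 % on both relative errors and a perfect predictor scores 0 %, which gives a scale for the 99.4892 % and 99.8253 % reported for SMOreg.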
Results
Classification of Stock Movements
[Figures: NAIVE BAYES and RANDOM FOREST]
Results
Prediction of Stock Percentages (SMOreg)
[Figures: APPLE STOCK INDEX and MICROSOFT STOCK INDEX]
Conclusions
The experiment was largely successful at classifying stock movements over a period of time with both classification models.
The accuracy achieved for classification was about 80%, which is significant.
However, a lot of challenges remain in predicting the percentage changes using stock news alone. This may be due to the lack of the following:
• Stochastic parameters in the model
• Disambiguation of stock code entities
• Classification of stock companies into groups
• Real-time prediction 'as and when' news is published; the delay may introduce a lot of noise
• There are a lot of uncertainties involved in predicting stock indices, and some of them cannot be addressed by sentiment alone
References
https://gate.ac.uk/
http://www-nlp.stanford.edu/
http://weka.wikispaces.com/
http://www.pentaho.com/