partha-sen
PyCon 2016
Beat Stock Index Return: use of Technical Analysis, Machine Learning, Sentiment and Back-Testing
Talks @parthasen
References
http://pandas.pydata.org/pandas-docs/stable/io.html
http://gbeced.github.io/pyalgotrade/docs/v0.17/html/tutorial.html#trading
Part A: 8 min
1. Introduction: EMH (efficient-market hypothesis) and frequency of data.
2. Downloading open source data from Yahoo.
3. Using pandas and NumPy to read and analyze data, with input and output in CSV format on the hard disk.
4. Technical analysis and plotting data using Matplotlib.
Data
Open source data is available from Yahoo, but only at daily frequency.
Minute, second, or tick data is available from paid databases.
My talk uses daily data loaded from Yahoo and minute data from paid databases.
Downloading Yahoo! data
In future pandas releases (0.17+), pandas-datareader will become a dependency, and using pandas.io.data will be equivalent to using pandas_datareader.data. For now, you must replace your imports from pandas.io with pandas_datareader:
$ pip install pandas-datareader
import pandas.io.data as web  # old import, replace it with the line below
from pandas_datareader import data as web
import pandas_datareader as pdr
pdr.get_data_yahoo('AAPL')
https://pandas-datareader.readthedocs.io/en/latest/
Dataset from 2008-01-01 to 2016-07-15
             Open    High     Low   Close  Volume  Adj Close
Date
2016-07-11  855.0  862.00  853.00  860.19   84100     860.19
2016-07-12  865.0  866.00  860.19  865.32   30100     865.32
2016-07-13  864.9  867.00  860.15  865.48  112700     865.48
2016-07-14  870.0  870.51  864.00  869.44   63400     869.44
2016-07-15  875.0  875.00  867.00  868.04   33000     868.04

             Open    High    Low   Close  Volume  Adj Close
Date
2008-01-01  619.5  622.00  618.0  619.99    2200     608.42
2008-01-02  622.0  625.00  609.5  622.94   16100     611.31
2008-01-03  625.0  627.00  616.0  621.31    9700     609.71
2008-01-04  627.0  637.45  620.1  633.45    7100     621.63
2008-01-07  625.1  633.50  625.0  633.00   10400     621.18
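The CSV input and output step listed in Part A can be sketched with pandas as follows. This is a minimal illustration, not the talk's own code: it uses the 2016 rows from the dataset slide and an in-memory buffer in place of a file, so the variable names (`raw`, `buffer`, `reloaded`) are assumptions made for the example.

```python
import io

import pandas as pd

# The 2016 rows from the dataset slide, written out as CSV text.
raw = """Date,Open,High,Low,Close,Volume,Adj Close
2016-07-11,855.0,862.00,853.00,860.19,84100,860.19
2016-07-12,865.0,866.00,860.19,865.32,30100,865.32
2016-07-13,864.9,867.00,860.15,865.48,112700,865.48
2016-07-14,870.0,870.51,864.00,869.44,63400,869.44
2016-07-15,875.0,875.00,867.00,868.04,33000,868.04
"""

# Read the CSV with the Date column parsed into a DatetimeIndex.
df = pd.read_csv(io.StringIO(raw), index_col="Date", parse_dates=True)

# Write the frame back out and read it again (a round trip).
# An in-memory buffer stands in for a file here; passing a file path
# to to_csv / read_csv would store the data on the hard disk instead.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col="Date", parse_dates=True)
```

Keeping the dates as the index makes later steps such as date slicing and plotting against time straightforward.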
Daily plotting of NIFTYBEES from 2008
Daily plotting of NIFTYBEES from 2015
Log Return
Cumulative return from 2008
Volatility
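The three quantities named on these slides can be computed from a closing-price series roughly as below. This is a sketch under assumptions: the slides do not state the exact formulas, so the rolling window (`WINDOW = 3`, chosen only because the sample is tiny) and the annualization factor of 252 trading days are illustrative choices, not the talk's settings. The prices are the 2016 closes from the dataset slide.

```python
import numpy as np
import pandas as pd

# Closing prices for the 2016 rows shown on the dataset slide.
close = pd.Series(
    [860.19, 865.32, 865.48, 869.44, 868.04],
    index=pd.to_datetime(
        ["2016-07-11", "2016-07-12", "2016-07-13", "2016-07-14", "2016-07-15"]
    ),
    name="Close",
)

# Daily log return: ln(P_t / P_{t-1}). The first value is NaN.
log_ret = np.log(close / close.shift(1))

# Cumulative return since the first day: exp(sum of log returns) - 1.
cum_ret = np.exp(log_ret.cumsum()) - 1

# Rolling volatility: standard deviation of log returns over a window,
# annualized with sqrt(252). Both numbers are assumptions for this sketch.
WINDOW = 3
vol = log_ret.rolling(WINDOW).std() * np.sqrt(252)
```

Log returns add up over time, which is why the cumulative return is obtained from a running sum rather than a running product.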
Part B: 8 min
1. Downloading tweets and using them for sentiment analysis
2. Application of scikit-learn for machine learning
3. Selection of best technique.
4. Regression analysis for prediction of price.
Data Set for Machine Learning
            Open   High     Low   Close    Volume  Adj Close  Open Change
Date
2008-03-19   471  481.0  462.10  463.22  0.541796     454.57            1
2008-03-24   461  472.0  454.95  463.28  2.290142     454.63           -1
2008-03-25   473  493.0  473.00  490.17  1.084630     481.02            1
2008-03-26   492  494.0  486.05  489.15  0.588848     480.02            1
2008-03-27   487  491.7  483.00  488.27  0.641767     479.16           -1

            US_Mkt  Volatility  3UD  momentum        RSI         14d
Date
2008-03-19       1    0.236933    0    -31.03  24.415525  489.195714
2008-03-24       1    0.235345    0     -7.67  24.452459  484.429286
2008-03-25       1    0.244263    0     11.59  38.869415  482.062857
2008-03-26       1    0.251964    0     30.53  38.568750  480.787857
2008-03-27      -1    0.259302    0     26.76  38.293538  480.195714

                   42d  Cross  BS2  momentum_  RSI_     Change
Date
2008-03-19  513.050476     -1    1         -1    -1        NaN
2008-03-24  511.333571     -1    1         -1    -1  -0.004804
2008-03-25  511.204286     -1    0          0    -1   0.020764
2008-03-26  510.429048     -1    0          0    -1   0.003726
2008-03-27  509.925000     -1    0          0    -1  -0.004405
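Columns such as 14d, 42d and Cross look like moving-average features. The sketch below shows one plausible way to build them, assuming 14d and 42d are rolling means of the closing price and Cross is the sign of their difference. That reading matches the sample rows (14d is below 42d wherever Cross is -1), but the talk does not state the formulas, so treat the definitions as assumptions. The prices are synthetic and not the talk's data.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices, made up for illustration only.
rng = np.random.default_rng(0)
close = pd.Series(480 + rng.normal(0, 2, 120).cumsum(), name="Close")

features = pd.DataFrame({"Close": close})

# Short and long moving averages of the close (assumed definitions).
features["14d"] = close.rolling(14).mean()
features["42d"] = close.rolling(42).mean()

# Cross: +1 when the short average is above the long one, -1 when below,
# 0 when they are equal.
features["Cross"] = np.sign(features["14d"] - features["42d"])

# The first 41 rows have no 42-day average yet, so drop them.
features = features.dropna()
```

Dropping the warm-up rows keeps NaN values out of the feature matrix before it is passed to a learning algorithm.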
Histogram of Features
Histogram of target
SVM regression
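A support vector regression with scikit-learn might be set up as in the sketch below. It is not the talk's model: the features and target are synthetic, and the kernel, `C`, `epsilon`, and the 150/50 split are placeholder assumptions. The split is chronological (no shuffling), which is the usual precaution with time-ordered market data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data: three hypothetical features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Chronological split: train on the first 150 rows, test on the last 50.
split = 150
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Scale the features, then fit an RBF-kernel SVR (placeholder settings).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X_train, y_train)

pred = model.predict(X_test)       # predictions for the held-out rows
r2 = model.score(X_test, y_test)   # coefficient of determination (R^2)
```

Scaling inside the pipeline means the scaler is fitted on the training rows only, so no information from the test period leaks into the model.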
Plotting of features