

Predicting the Influence of Textual Financial News on Intraday Stock Trading

Matthew Bravo

Advisor: Charles Elkan

Background

We attempt to build a model that uses the text of financial news releases to exploit the price movements that follow each release. The model targets only movements that occur during trading hours on the day the news is released. We analyze the thirty "blue chip" stocks in the Dow Jones Industrial Average and their daily movements over several months.

Experimental Setup

Training data: ~2030 articles posted on the finance.yahoo.com DOW30 RSS feed from 5/26/2005 to 6/25/2005

Test data: ~1800 articles posted on the finance.yahoo.com DOW30 RSS feed from 6/25/2005 to 7/11/2005

The data is pruned to 1000 training articles and 810 test articles by keeping only articles that were released during 13:30-20:00 GMT (NYSE/NASDAQ trading hours) and that directly mention the tracked companies.
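The pruning step can be illustrated with a short Python sketch. It is not the code used in the study: the `TRACKED` dictionary, the article fields `text` and `published`, and the weekday check are assumptions made for illustration, and market holidays are ignored.

```python
from datetime import datetime, time, timezone

# Trading window from the text, expressed in GMT/UTC.
MARKET_OPEN = time(13, 30)
MARKET_CLOSE = time(20, 0)

# Hypothetical subset of tracked companies and name fragments; a full list
# would cover all thirty Dow components.
TRACKED = {
    "IBM": ["ibm", "international business machines"],
    "GE": ["general electric"],
}

def in_trading_hours(published: datetime) -> bool:
    """True if a timezone-aware timestamp falls in 13:30-20:00 GMT on a weekday."""
    t = published.astimezone(timezone.utc)
    return t.weekday() < 5 and MARKET_OPEN <= t.time() <= MARKET_CLOSE

def mentioned_tickers(text: str) -> list[str]:
    """Tickers whose name fragments occur in the article text (naive substring match)."""
    lowered = text.lower()
    return [tk for tk, names in TRACKED.items() if any(n in lowered for n in names)]

def prune(articles):
    """Keep articles released in trading hours that mention a tracked company."""
    kept = []
    for art in articles:
        tickers = mentioned_tickers(art["text"])
        if tickers and in_trading_hours(art["published"]):
            kept.append({**art, "tickers": tickers})
    return kept
```

Substring matching is deliberately crude here; a real filter would need to handle ambiguous short names and ticker symbols more carefully.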

All data is labeled using a linear approximation of the stock price movements. Each article is labeled UP or DOWN according to the trend of prices surrounding the announcement of the news.
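As a rough illustration of this labeling rule, the sketch below assumes the price series has already been cut into linear segments, and that "the trend surrounding the announcement" means the segment containing the news time. The function name `label_event` and the `breakpoints` argument are hypothetical.

```python
from bisect import bisect_right

def label_event(times, prices, breakpoints, news_time):
    """Label a news event by the slope of the linear segment that contains it.

    times, prices: parallel lists of samples for one stock on one day.
    breakpoints: sorted indices into those lists marking segment ends,
                 including the first and last sample.
    """
    seg_times = [times[i] for i in breakpoints]
    pos = bisect_right(seg_times, news_time)
    # Clamp so that news at or beyond the ends maps to the first or last segment.
    pos = min(max(pos, 1), len(breakpoints) - 1)
    start, end = breakpoints[pos - 1], breakpoints[pos]
    # Assumption: a flat segment is labeled DOWN, since only two labels exist.
    return "UP" if prices[end] > prices[start] else "DOWN"
```

News that lands exactly on a breakpoint is assigned to the segment that follows it, which matches the goal of trading on the movement after the release.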

A naïve Bayes classifier is trained on the training set. The articles are stemmed and stripped of stop-words, then classified using the top 100 features ranked by information gain.
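A minimal Python sketch of this pipeline follows. It assumes documents arrive as token lists that are already stemmed and stop-word filtered, treats each feature as "term occurs in the document" when computing information gain, and uses a multinomial naïve Bayes with add-one smoothing. The source does not say which naïve Bayes variant was used, so those choices and all function names are assumptions.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without = Counter(l for d, l in zip(docs, labels) if term not in d)
    cond = 0.0
    for part in (with_term, without):
        size = sum(part.values())
        if size:
            cond += size / len(docs) * entropy(part.values())
    return entropy(Counter(labels).values()) - cond

def top_features(docs, labels, k=100):
    """The k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)[:k]

def train_nb(docs, labels, features):
    """Multinomial naive Bayes with add-one smoothing over the chosen features."""
    feats = set(features)
    prior = Counter(labels)
    counts = {c: Counter() for c in prior}
    for d, l in zip(docs, labels):
        counts[l].update(t for t in d if t in feats)
    model = {}
    for c in prior:
        total = sum(counts[c].values()) + len(feats)
        model[c] = (math.log(prior[c] / len(labels)),
                    {t: math.log((counts[c][t] + 1) / total) for t in feats})
    return model

def predict(model, doc):
    """Return the class with the highest log-posterior; unknown terms are ignored."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in doc if t in log_like)
    return max(model, key=score)
```

Training uses only the training articles; the test articles are passed to `predict` alone.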

Labeling News Events

Performance Evaluation

                       Text Categorization   Hold and Sell   Random Guessing
Cumulative Gain        31.59%                15.9%           0.17%
Avg. Gain per Trade    0.067%                0.062%          0.0034%
Std. Dev. per Trade    0.44%                 0.91%           0.38%
Max Gain               3.01%                 2.85%           3.02%
Min Gain               -1.2%                 -2.49%          -1.87%
Hit Rate               56.54%                N/A             N/A

On average, our model only slightly outperforms the market-wide hold and sell approach per trade (0.067% versus 0.062%). However, because we can exploit market moves on a much smaller time scale and trade on both positive and negative trends, the cumulative gain (31.59%) is greater than that of a traditional hold and sell approach (15.9%) and much better than random guessing (0.17%).
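The per-trade statistics reported here could be computed along the following lines. This Python sketch assumes that a DOWN prediction is traded as a short position, that cumulative gain is the plain sum of per-trade percentage gains, that hit rate is the share of profitable trades, and that transaction costs are ignored. None of these conventions is stated in the source.

```python
from statistics import mean, pstdev

def trade_gain(prediction, entry_price, exit_price):
    """Percent gain of one trade: long for UP, short for DOWN (assumed)."""
    change = (exit_price - entry_price) / entry_price * 100.0
    return change if prediction == "UP" else -change

def summarize(gains):
    """Summary statistics over a non-empty list of per-trade percent gains."""
    return {
        "cumulative": sum(gains),   # assumed: simple sum, not compounded
        "average": mean(gains),
        "std_dev": pstdev(gains),   # population standard deviation
        "max": max(gains),
        "min": min(gains),
        "hit_rate": 100.0 * sum(g > 0 for g in gains) / len(gains),
    }
```

Because a short position profits from a falling price, a correct DOWN call contributes a positive gain, which is how trading on both trends can lift the cumulative figure.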

The average optimal trading time in the training data determines how long to hold a stock after a prediction is made.

For UP events: optimal trading estimate = 80 minutes

For DOWN events: optimal trading estimate = 95 minutes
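The holding rule can be sketched in Python as below. The sketch assumes each training event comes with an "optimal" holding time in minutes (for example, the delay at which the gain peaked), that timestamps are in GMT, and that positions are closed no later than 20:00 GMT. The cap and the function names are illustrative assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean

def average_holding_minutes(training_events):
    """Average optimal holding time per label.

    training_events: iterable of (label, optimal_minutes) pairs.
    """
    by_label = {}
    for label, minutes in training_events:
        by_label.setdefault(label, []).append(minutes)
    return {label: mean(vals) for label, vals in by_label.items()}

def exit_time(news_time, prediction, holding):
    """Time to close the position, capped at the 20:00 GMT market close (assumed)."""
    target = news_time + timedelta(minutes=holding[prediction])
    close = news_time.replace(hour=20, minute=0, second=0, microsecond=0)
    return min(target, close)
```

With the estimates above, `exit_time(datetime(2005, 6, 27, 14, 0), "UP", {"UP": 80, "DOWN": 95})` gives 15:20 on the same day.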

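Error-Based Linear Approximation Algorithm

Here p_i is taken to be the observed price at time t_i and p'_i the value at t_i of the linear approximation over [t_a, t_b].

split(T[t_a, t_b])
    E_max = 0
    E_total = 0
    T_points = {}
    for i = a to b
        E_i = (p_i - p'_i)^2
        if E_max < E_i
            E_max = E_i
            t_k = t_i
        end if
        E_total = E_total + E_i
    end for
    E = E_total / (b - a)
    if error-test(E)
        T_points = {t_k}
        T_points = T_points U split(T[t_a, t_k])
        T_points = T_points U split(T[t_k, t_b])
    end if
    return T_points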
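A runnable Python sketch of this recursive split is given below. It assumes the linear approximation of a segment is the straight line through its two endpoints and that the error test is "mean squared error exceeds a fixed `threshold`". The source specifies neither, so both are assumptions.

```python
def split(times, prices, a, b, threshold):
    """Return the sorted indices of breakpoints strictly between a and b.

    The segment [a, b] is approximated by the line through its endpoints
    (assumed). If the mean squared error exceeds `threshold` (assumed error
    test), the segment is split at the worst-fitting point and both halves
    are processed recursively.
    """
    if b - a < 2:
        return []  # no interior point left to split on
    slope = (prices[b] - prices[a]) / (times[b] - times[a])
    e_max, e_total, k = -1.0, 0.0, a
    # The endpoints lie on the line, so only interior points contribute error.
    for i in range(a + 1, b):
        approx = prices[a] + slope * (times[i] - times[a])
        err = (prices[i] - approx) ** 2
        if err > e_max:
            e_max, k = err, i
        e_total += err
    mean_err = e_total / (b - a)
    if mean_err <= threshold:
        return []
    return (split(times, prices, a, k, threshold)
            + [k]
            + split(times, prices, k, b, threshold))
```

For a price series that rises and then falls, such as `[0, 1, 2, 3, 2, 1, 0]` sampled at unit intervals, `split(list(range(7)), prices, 0, 6, 0.01)` returns `[3]`, the index of the peak. Restricting the loop to interior points keeps the chosen index strictly inside the segment, so the recursion always terminates.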