
Fydp poster type a rev 1.0


Textual Analysis Learning Algorithm for Stock Market Prediction

Harmeet Cheema, Roger He, Liyuan Li, Nathan Mak

Stock Predictions Today

Financial analysts seek to create models that can be used to predict movements in the financial sector. Stock trading, in particular, makes up a significant portion of the financial industry. Developments in modeling have led to various theories based on technical analysis.

One method of removing the randomness of stock movements is to observe their reaction to news. The public reads the New York Times, watches CNN and attends company announcements in hopes of gaining insight into the day’s trading. Natural disasters, elections, quarterly reports and other breaking news have the potential to create large market movements. The cause of these changes is not fully understood and is likely due to a combination of group psychology and fundamental valuations. Regardless of cause, news presents a reliable instigator for change.

Research Goals

1. Develop a system to process vast amounts of news and ticker data

2. Find a correlation between published news and market changes

3. Exploit this correlation to make predictions in stock movements

A Textual Approach

The project consists of four highly modular components:

• A news and financial data collection Python script that includes basic text processing and metric calculations. Thousands of articles are scraped from Google.com news and filtered with a blacklist. Day-to-day stock prices are collected from Marketwatch.com. The results, regardless of source, are temporarily stored in an XML-formatted file.

• A learning algorithm that includes additional text processing for calculating word count correlation and setting article weightings. Regression analysis is conducted over a training set using least-squares error minimization.

• A MySQL database hosted on an Amazon EC2 server that provides a reliable base for running scraping and processing operations. The database structure contains both learning-algorithm-dependent and learning-algorithm-independent sections for future algorithm flexibility.

• A webpage, heavily structured around dynamic Java widgets, used to view results pulled from the database.
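The learning step described above (word counts per article, then a regression fitted by least-squares error minimization) can be illustrated with a minimal sketch. This is not the project's actual code: the vocabulary, the headlines, the price changes, and the helper names `features`, `solve`, `fit_least_squares` and `predict` are all hypothetical. The sketch fits the weights through the normal equations using only the standard library and assumes the feature columns are linearly independent.

```python
from collections import Counter

# Hypothetical vocabulary; the poster does not list the words it tracked.
VOCABULARY = ["surge", "loss"]

def features(text, vocabulary):
    """Return [bias, count of each vocabulary word] for one article."""
    counts = Counter(text.lower().split())
    return [1.0] + [float(counts[word]) for word in vocabulary]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_least_squares(rows, targets):
    """Minimise the sum of squared errors via (X^T X) w = X^T y.

    Assumes the columns of X are linearly independent.
    """
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(xtx, xty)

def predict(weights, row):
    """Predicted price change for one feature row."""
    return sum(w * x for w, x in zip(weights, row))

# Made-up training set: (headline, next-day price change in percent).
training = [
    ("profits surge as sales surge", 2.0),
    ("company reports loss", -1.0),
    ("shares surge despite loss", 0.0),
    ("quarterly loss widens loss deepens", -2.0),
    ("markets quiet", 0.0),
]

rows = [features(text, VOCABULARY) for text, _ in training]
targets = [change for _, change in training]
weights = fit_least_squares(rows, targets)
print(weights)
print(predict(weights, features("profits surge", VOCABULARY)))
```

For a real training set, a library routine such as `numpy.linalg.lstsq` would be the more robust choice; the hand-written solver here only keeps the sketch dependency-free.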

Performance

• Nearly 100 000 articles processed for only 10 stocks

• Clear correlation between news and market changes

• 71% predictive accuracy
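The poster does not state how predictive accuracy was measured. One common reading, assumed here purely for illustration, is directional accuracy: the fraction of predictions whose sign (up or down) matches the actual price movement. The function name `directional_accuracy` and the sample values below are hypothetical and are not the project's data.

```python
def directional_accuracy(predicted, actual):
    """Fraction of samples where predicted and actual changes agree in direction.

    A change of exactly zero is treated as "not up" in this sketch.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(predicted)

# Made-up predicted and actual daily changes (percent).
predicted_changes = [0.5, -0.2, 0.1, -0.4]
actual_changes = [0.7, -0.1, -0.3, -0.2]
accuracy = directional_accuracy(predicted_changes, actual_changes)
print(accuracy)  # 3 of 4 directions match: 0.75
```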

Challenges for the Future

• More efficient algorithms for handling large dynamic training sets

• Filtering news based on industry, time, location, quantity, writing style and content type

• Incorporation of natural language processing
