Project Outline
Project Research
Data Sources and Tools
Data Transformation
Decision Tree
Decision Tree Results
Classification Predictive Modeling
Prediction Results
Future Efforts
Project Research
Projecting Point Spreads
◦ http://cs229.stanford.edu/proj2010/LiuLai-BeatingTheNCAAFootballPointSpread.pdf
Predicting Individual Games
◦ http://cfbpredictions.com/
Predicting BCS Rankings
◦ http://harvardsportsanalysis.wordpress.com/2011/11/24/making-sense-of-the-chaos-a-bcs-prediction-model/
Data Sources and Tools
Data Sources
◦ http://www.cfbstats.com
Tools
◦ Microsoft Excel
◦ RapidMiner
Data Transformation
Data Challenges
◦ Home and away team data were listed in separate spreadsheets
◦ Data rows lacked the visiting team's statistics
◦ Data rows did not record who won the game
◦ Many numerical fields were difficult to use for classification
Data Solutions
◦ Used Excel VLOOKUPs to pull the visiting team's statistics into each row (tuple)
◦ Compared the home and away “Points” fields to create a “Winner” classification field
◦ Built formula fields to transform numerical values into text classifications (time of possession (TOP), special teams, penalty yards)
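The three solutions above were done in Excel. The same steps can be sketched in Python as an illustration only. The column names, sample rows, and LOW/MEDIUM/HIGH cut points below are all hypothetical and are not taken from cfbstats.com or from the project's spreadsheets.

```python
# Hypothetical per-team game rows; column names and values are illustrative,
# not the actual cfbstats.com schema or data.
home_rows = [
    {"game": 1, "team": "A", "points": 31, "top_seconds": 1920, "penalty_yards": 45},
    {"game": 2, "team": "C", "points": 14, "top_seconds": 1650, "penalty_yards": 80},
]
visit_rows = [
    {"game": 1, "team": "B", "points": 24, "rush_yards": 140},
    {"game": 2, "team": "D", "points": 17, "rush_yards": 190},
]


def bin_value(value, low, high):
    """Map a number to a LOW / MEDIUM / HIGH text label (cut points are assumed)."""
    if value < low:
        return "LOW"
    if value < high:
        return "MEDIUM"
    return "HIGH"


# Index visiting rows by game id; the dictionary lookup plays the role of VLOOKUP.
visit_by_game = {row["game"]: row for row in visit_rows}

games = []
for home in home_rows:
    visit = visit_by_game[home["game"]]
    games.append({
        "game": home["game"],
        "home_team": home["team"],
        "visit_team": visit["team"],
        "visit_rush_yards": visit["rush_yards"],
        # Winner label derived by comparing the two Points fields
        # (a tie would fall through to VISIT in this sketch).
        "winner": "HOME" if home["points"] > visit["points"] else "VISIT",
        # Numeric fields turned into text categories.
        "top_class": bin_value(home["top_seconds"], 1700, 1900),
        "penalty_class": bin_value(home["penalty_yards"], 40, 70),
    })

for g in games:
    print(g)
```

Each output row now carries both teams' data, a Winner label, and text categories, which is the shape the classification steps need.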
Decision Tree
Built a decision tree model in RapidMiner, using the Gini index as the split criterion, to determine the key factors for winning
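To show what a Gini-index split does, here is a minimal sketch of how a single numeric threshold could be chosen. It is not RapidMiner's implementation. The function names and the six sample rows are made up, and the rows were deliberately constructed so that the best cut lands at 165.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(rows, feature, threshold, label="winner"):
    """Size-weighted Gini impurity of the two groups produced by a threshold."""
    left = [r[label] for r in rows if r[feature] <= threshold]
    right = [r[label] for r in rows if r[feature] > threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(rows, feature, label="winner"):
    """Try midpoints between sorted distinct values; keep the lowest impurity."""
    values = sorted({r[feature] for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_gini(rows, feature, t, label))


# Made-up rows for illustration only (not project data).
rows = [
    {"visit_rush_yards": 90, "winner": "HOME"},
    {"visit_rush_yards": 120, "winner": "HOME"},
    {"visit_rush_yards": 150, "winner": "HOME"},
    {"visit_rush_yards": 180, "winner": "VISIT"},
    {"visit_rush_yards": 210, "winner": "VISIT"},
    {"visit_rush_yards": 240, "winner": "HOME"},
]

print(best_threshold(rows, "visit_rush_yards"))
```

A full tree repeats this search over every feature and then recurses into each resulting group.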
Decision Tree Results
Visiting team rush yards was the most important factor in the model
◦ >= 165 yards: visiting team won 66% of the time
◦ <= 165 yards: home team won 79% of the time
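As a worked example, the two reported win rates can be read as the class proportions in the two leaves of the split. That reading is an assumption about how the percentages were produced. For a two-class leaf where the majority class has proportion p, the Gini impurity is:

```latex
G(p) = 1 - p^2 - (1 - p)^2 = 2p(1 - p)

% Visiting-team leaf, p = 0.66
G(0.66) = 2 \times 0.66 \times 0.34 = 0.4488

% Home-team leaf, p = 0.79
G(0.79) = 2 \times 0.79 \times 0.21 = 0.3318
```

An unsplit two-class node with even odds has G = 0.5, so both leaves are purer than a coin flip, and the home-team leaf is the purer of the two.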
Classification Predictive Modeling
Predictive Model Plan
◦ Predict the winner of an individual game based on key statistics
◦ Use the same data set as the decision tree
◦ Split the data into training and testing sets
◦ Evaluate different data elements and their effects on prediction
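The training/testing split in the plan above can be sketched as follows. The 30% test fraction, the fixed seed, and the placeholder rows are assumptions for illustration; the slides do not state the proportions used.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows standing in for the transformed game data.
rows = [{"game": i, "winner": "HOME" if i % 2 else "VISIT"} for i in range(10)]

train, test = train_test_split(rows)
print(len(train), len(test))
```

The model is fitted on `train` only, and `test` is kept aside to measure how well it predicts games it has not seen.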
Classification Predictive Modeling
Using RapidMiner
◦ Naive Bayes classification model
◦ Split the example set into two sets (training and testing) using X-Validation (cross-validation)
◦ Analyzed the results using the Performance tool
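The pipeline above (Naive Bayes, cross-validation, performance measurement) can be sketched in plain Python. It is not RapidMiner's implementation. The feature names, the add-one smoothing, the two-fold setting, and the sample rows are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, features, label="winner"):
    """Count classes and per-class feature values for categorical Naive Bayes."""
    class_counts = Counter(r[label] for r in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    values_seen = defaultdict(set)       # feature -> distinct values in training
    for r in rows:
        for f in features:
            value_counts[(f, r[label])][r[f]] += 1
            values_seen[f].add(r[f])
    return class_counts, value_counts, values_seen


def predict_nb(model, row, features):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for f in features:
            k = len(values_seen[f]) or 1
            score += math.log((value_counts[(f, c)][row[f]] + 1) / (n_c + k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def cross_validate(rows, features, k=2, label="winner"):
    """k-fold cross-validation: every row is tested exactly once; returns accuracy."""
    correct = 0
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        model = train_nb(train, features, label)
        correct += sum(predict_nb(model, r, features) == r[label] for r in test)
    return correct / len(rows)


# Made-up categorical rows for illustration only (not project data).
visit_win = {"visit_rush": "HIGH", "home_top": "LOW", "winner": "VISIT"}
home_win = {"visit_rush": "LOW", "home_top": "HIGH", "winner": "HOME"}
rows = [visit_win, visit_win, home_win, home_win] * 2

print(cross_validate(rows, ["visit_rush", "home_top"]))
```

The accuracy returned by `cross_validate` corresponds to the headline figure a performance report would show. Real game data would not separate as cleanly as these toy rows.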
Future Efforts
The model currently relies on a user manually entering key statistical information for a game
The model could be enhanced into a full predictive solution
◦ A neural network could predict an individual team's statistics for a particular game
◦ Output from the neural network could feed the classification model to predict a game or a series of games
References
http://www.cfbstats.com
http://espn.go.com/college-football/
http://cs229.stanford.edu/proj2010/LiuLai-BeatingTheNCAAFootballPointSpread.pdf