March Data Crunch Madness Team Coach K. Yi Chun Chien, Xiayu Zeng, Feifei Chen, Xiaoshan Jin March 2015

March madness sports analysis


Page 1: March madness sports analysis

March Data Crunch Madness

Team Coach K. Yi Chun Chien, Xiayu Zeng, Feifei Chen, Xiaoshan Jin

March 2015

Page 2: March madness sports analysis

Introduction

❖ Background: The NCAA Men’s Basketball Tournament is a single-elimination tournament currently featuring 68 college teams.

❖ Objective: Develop a predictive model that examines the factors contributing to a team’s performance, based on data from 2001-2014.

❖ Result: The model indicates that box score statistics have a large effect on a team’s result in 2015, which helps to predict:
➢ Win/Lose
➢ Winning Probability
➢ Sweet Sixteen

Page 3: March madness sports analysis

Independent & Dependent Variables

Independent Variables:
● Box Score: Assists, Steals, Blocked Shots, 2-Point and 3-Point Field Goal %, Free Throw %, Tempo
● Seed: Seed #, whether the team is Top 5, whether the team is 15/16
● Location: Latitude, Longitude, Distance Difference

Dependent Variable: Win/Lose
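
The seed and location variables above could be derived roughly as in the sketch below. The slides do not define "Distance Difference", so this assumes it means a team's great-circle distance to the venue minus its opponent's. The function names and the reading of "Top 5" and "15/16" as seed thresholds are also assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def seed_features(seed):
    """Seed number plus two indicator flags (assumed thresholds: seed <= 5, seed >= 15)."""
    return {"seed": seed, "is_top5": int(seed <= 5), "is_15_or_16": int(seed >= 15)}


def distance_difference(team_loc, opp_loc, venue_loc):
    """Assumed definition: team's distance to the venue minus the opponent's distance."""
    return haversine_km(*team_loc, *venue_loc) - haversine_km(*opp_loc, *venue_loc)
```

Under this convention, a negative distance difference means the team is closer to the venue than its opponent.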

Page 4: March madness sports analysis

Analysis Process

● Define Data Group: Training (80%), Validation (20%), Testing (2015)
● Build 5 Models: Decision Tree, Bootstrap Forest, Boosted Tree, Neural Network, Nominal Logistic
● Performance Validation: Accuracy, ROC Curve, AUC, RMSE
● Prediction: Probability, Win/Lose, Top 16
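
As a rough illustration of the data grouping and validation steps, the sketch below does an 80/20 split and computes three of the listed metrics from predicted win probabilities. It uses only the Python standard library and is not the tooling the team used. The function names, the 0.5 decision threshold, and the random seed are assumptions.

```python
import math
import random


def split_train_validation(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into training and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(train_fraction * len(rows)))
    return rows[:cut], rows[cut:]


def accuracy(y_true, p_win, threshold=0.5):
    """Share of games where the thresholded probability matches the outcome (1 = win)."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, p_win))
    return hits / len(y_true)


def rmse(y_true, p_win):
    """Root mean squared error between predicted probabilities and 0/1 outcomes."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, p_win)) / len(y_true))


def auc(y_true, p_win):
    """Area under the ROC curve: chance that a win is scored above a loss (ties count half)."""
    pos = [p for y, p in zip(y_true, p_win) if y == 1]
    neg = [p for y, p in zip(y_true, p_win) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of games, which is acceptable for tournament-sized data.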

Page 5: March madness sports analysis

Distribution and Correlation

● Distribution Review: Most variables are normally distributed
● Scatter Matrix: Few variables are linearly correlated
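
A scatter-matrix observation like this can be checked numerically with the Pearson correlation coefficient for each pair of variables. The sketch below is a minimal standard-library version. The function name is an assumption, and the slides do not say how correlation was assessed beyond the scatter matrix.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.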

Page 6: March madness sports analysis

5 Models Performance

[Figure: performance validation of the five models on the training and validation sets, with the ROC curve for validation]

● Nominal Logistic Regression has the best performance
● Nominal Logistic Regression Accuracy: 72%

Page 7: March madness sports analysis

Prediction

● 2015 Forecast: Top 16 teams
● 2015 Forecast Result: 73% accuracy

Result   Lose   Win
Lose     6      6
Win      5      24
Total    11     30

Forecast Result Correct: 73%
Forecast Result Incorrect: 27%
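
The 73% figure can be reproduced from the result table by dividing the matching Lose/Lose and Win/Win counts by the total number of games. The short check below uses only the counts shown on the slide. The dictionary layout is an illustrative choice.

```python
# Counts copied from the slide's result table (row label -> column label -> games).
table = {
    "Lose": {"Lose": 6, "Win": 6},
    "Win": {"Lose": 5, "Win": 24},
}

# Correct forecasts are the cells where the row and column labels agree.
correct = table["Lose"]["Lose"] + table["Win"]["Win"]
total = sum(sum(row.values()) for row in table.values())
accuracy = correct / total

print(f"{correct}/{total} = {accuracy:.1%}")  # prints "30/41 = 73.2%"
```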

Page 8: March madness sports analysis

Model Explanation

Defensive efficiency, offensive efficiency, opponent’s blocked shots and assists are the most important attributes, based on their individual p-values.

According to our analysis, good offensive efficiency contributes more to a team’s success than defensive efficiency.

The closer a team is to the stadium, the better it performs.
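
For context, a nominal logistic model turns a weighted sum of attributes into a winning probability through the logistic function. The sketch below shows only that general form. The function name, feature names, and every coefficient value are hypothetical placeholders, not the fitted values from this analysis.

```python
import math


def win_probability(features, intercept, coefficients):
    """Logistic model: P(win) = 1 / (1 + exp(-(intercept + sum of coefficient * value)))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical placeholder inputs, for illustration only.
example_features = {"offensive_efficiency": 1.10, "defensive_efficiency": 0.95}
example_coefficients = {"offensive_efficiency": 2.0, "defensive_efficiency": -1.5}
p = win_probability(example_features, intercept=0.0, coefficients=example_coefficients)
```

In a model of this form, an attribute with a larger coefficient magnitude (on comparable scales) moves the probability more, which is one way to read the offensive versus defensive comparison.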

Page 9: March madness sports analysis

Interesting Analysis

● The average score difference is narrowing
● The score pattern for Top 5 seeds is less volatile than the one for the bottom 2 seeds
● 9 out of 16 are predicted correctly
● Only Georgetown shows a declining pattern of winning probability

Page 10: March madness sports analysis

Result and Conclusion

❖ Whether a team wins or loses is positively related to four primary factors:
➢ offensive efficiency
➢ defensive efficiency
➢ blocked shots
➢ assists

❖ Accuracy: Our model is 72.19% accurate in predicting a team’s result for 2015.