Predicting NBA Player Success or Failure


Using regression analysis to predict how good a college player will be in the NBA


An Econometrics 120C Honors Research Project

By Aaron Chou, Daniel Rubin and William Wolfe


Introduction

In basketball there is no perfect way to determine how well a player is likely to perform in the National Basketball Association (NBA) when he is drafted during his college years. In the NBA draft ('the draft'), players are selected by one of the thirty teams in the NBA based on how well the teams expect the players to perform in the NBA. Generally, the best players are drafted first, and successive draft picks are widely accepted to be less talented.¹ However, at the time this paper was written, there were no 'good' formulas in place to accurately predict how well a player might do in the NBA based on his college statistics. This research paper attempts to develop such a formula. Our goal is to predict how successful a player will be in the NBA, given only his college statistics.

Beyond the academic merit of performing regression analysis, such a regression might be useful for a wide variety of sports applications. Professional teams might use a similar formula to help them make their draft selections. Sports gamblers could use this formula to help decide on whom to place their bets. Finally, college coaches could use this formula to help improve their players' chances of being drafted into the NBA.

The Process

The first step is to find a numerical score that is correlated with how 'good' or successful a player is in the NBA. An ESPN sports analyst, John Hollinger, has developed a formula that gives such a numerical score to current and former NBA players. This score is called the Player Efficiency Rating (PER).² It is widely accepted to be the best numerical determinant of how 'good' a player is. We will use PER adjusted for assisted and unassisted field goals and for charges. This adjusted PER (APER) was calculated by hoopdata.com. APER will be the dependent variable in our regression. A player in the NBA is likely to hit his peak after his 3rd year in the league. Because of this, we regress specifically on each player's 3rd-year APER score. For reference, the highest APER is about 32 (LeBron James), and the NBA league average is 13.79.

¹ The NBA draft utilizes a lottery system. The worst-performing teams are given the highest probability of obtaining a higher draft pick and, therefore, a better player.

The second step is to collect data on the 2004 through 2007 NBA draft classes. We collected data on APER (3rd year), Years of College, RPI, Min/Game, FG%, PPG, RPG, SPG, APG, BPG, PPM, RPM, SPM, APM, BPM, Height, Weight, Wingspan, Body Fat, Max Vert, Agility and Sprint scores.³ In collecting this data, we had to be careful to exclude pre-2005 players who did not attend college, as well as international players.

The third step is to regress APER on different combinations of variables. After many attempts, we decided that the best and most accurate regression did not include many of the variables we had collected. Specifically, the variables we threw out were height, sprint, weight, blocks, wingspan and agility. The decision to remove these variables was based on their high p-values and corresponding statistical insignificance.
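The screening step described above can be sketched in code. The following is a minimal illustration, not the authors' actual procedure or data: it fits ordinary least squares on synthetic data (the variable names `relevant` and `noise_only` are hypothetical), computes a t-statistic for each coefficient, and flags variables whose |t| falls below 1.96, which is the large-sample approximation to a 5% two-sided test. The paper itself reports only that variables with high p-values were dropped.

```python
import math
import random


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols(X, y):
    """Return (coefficients, standard errors) for y = X b + e.

    Each row of X must already contain a leading 1.0 for the intercept.
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # unbiased error variance
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se


# Synthetic data (NOT the paper's): the response depends on `relevant` only.
random.seed(0)
names = ["intercept", "relevant", "noise_only"]
X, y = [], []
for _ in range(200):
    relevant, noise_only = random.gauss(0, 1), random.gauss(0, 1)
    X.append([1.0, relevant, noise_only])
    y.append(2.0 + 3.0 * relevant + random.gauss(0, 1))

beta, se = ols(X, y)
t_stats = [b / s for b, s in zip(beta, se)]
for name, b, s, t in zip(names, beta, se, t_stats):
    verdict = "keep" if abs(t) > 1.96 else "candidate to drop"
    print(f"{name:>10}: coef={b:7.3f}  se={s:6.3f}  t={t:7.2f}  -> {verdict}")
```

In practice a statistics package would report exact p-values; the fixed 1.96 cutoff here is an assumption made to keep the sketch free of external dependencies.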

² PER = (1 / MP) * [ 3P
    + (2/3) * AST
    + (2 - factor * (team_AST / team_FG)) * FG
    + (FT * 0.5 * (1 + (1 - (team_AST / team_FG)) + (2/3) * (team_AST / team_FG)))
    - VOP * TOV
    - VOP * DRB% * (FGA - FG)
    - VOP * 0.44 * (0.44 + (0.56 * DRB%)) * (FTA - FT)
    + VOP * (1 - DRB%) * (TRB - ORB)
    + VOP * DRB% * ORB
    + VOP * STL
    + VOP * DRB% * BLK
    - PF * ((lg_FT / lg_PF) - 0.44 * (lg_FTA / lg_PF) * VOP) ]
where:
    factor = (2 / 3) - (0.5 * (lg_AST / lg_FG)) / (2 * (lg_FG / lg_FT))
    VOP = lg_PTS / (lg_FGA - lg_ORB + lg_TOV + 0.44 * lg_FTA)
    DRB% = (lg_TRB - lg_ORB) / lg_TRB
SOURCE: http://www.basketball-reference.com/about/per.html

³ RPI = Ratings Percentage Index (based on a team's wins, losses and strength of schedule). FG% = field goal percentage. PPG, RPG, SPG, APG, BPG = points, rebounds, steals, assists, and blocks per game, respectively. These were also calculated per minute (e.g., PPM = points per minute). Wingspan = total length of outstretched arms. Max Vert = a player's highest vertical jump.


The Result

APER = -25.83           (8.75)
       + 0.3 maxVert    (0.13)
       + 54.04 spm      (30.81)
       + 32.69 apm      (14.91)
       + 23.46 rpm      (8.96)
       + 4.8 ppm        (6.23)
       + 0.16 fg        (0.10)
       + 0.4 min        (0.11)
       - 1.15 years     (0.51)
       - 0.02 rpi       (0.01)

(standard errors in parentheses)

Analysis

The reasoning behind throwing out these variables is somewhat intuitive. Because the NBA has the highest and most competitive level of play in basketball, variables that are purely physical attributes, such as height, weight, and wingspan, should not reasonably be indicators of success. While simply being tall or big will certainly give you an advantage at the high school or perhaps even the college level, the fact of the matter is that the NBA represents the best and most skilled players in the world. A simple height or weight advantage will not be enough to make you better than everyone else. The other two variables, sprint and agility, come from timed events that all draft prospects go through to measure how fast they are. Although speed certainly gives you an advantage, the game of basketball is not a sprint, so the ability to run faster than other players should not determine how good a player you are. The last statistic we decided to omit, blocks, was the only on-court statistic we threw out. This variable had one notable attribute: its coefficient was negative. In other words, the more blocks a player gets at the collegiate level, the lower his predicted efficiency rating. How could this be possible? One reason is the volatility of drafting college big men. For every great post player such as Yao Ming, Shaquille O'Neal and Tim Duncan, there is a Patrick O'Bryant, Michael Olowokandi, or Kwame Brown. That is, drafting centers, the players who tend to get the most blocks, is a huge risk. You could get a great player such as Yao Ming, or you could get a player such as Patrick O'Bryant, who is out of the league after three years. Most big men never develop into the players that the teams that drafted them envisioned. As a result, the coefficient on blocks was negative, and the variable was so unpredictable that it had a large p-value. That is how we determined which variables to remove: by their high p-values and corresponding statistical insignificance.

The remaining variables were maxVert, spm, apm, rpm, ppm, fg, min, years, and rpi. All of these variables were statistically significant at the 5% level except points per minute (ppm) and field goal percentage (fg). These two variables were left in the regression because points scored and field goals made should indicate how good a player is; games are decided by points, and with the exception of free throws, field goals are the only way to score points. One notable adjustment we made to these variables is the adjustment for time and pace. College coaches have different coaching philosophies. As a result, the number of minutes they let their players play and the speed at which their teams play vary across teams. Certain coaches balance the minutes allotted to their players, while other coaches let their players play entire games. Some teams emphasize offense and play at a very fast pace, thus accumulating more statistics, while other teams play at a sluggish pace, so their players do not build up their personal statistics. To deal with this inconsistency across teams, we adjusted the statistics we used. For example, instead of looking at how many points a player averaged per game, we looked at how many points he scored per minute. That way, a player who played fewer minutes per game would not be penalized. We adjusted steals, assists, and rebounds in the same way.
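As an illustration of the per-minute adjustment, the sketch below converts per-game averages to per-minute rates and plugs them into the fitted equation reported in The Result. The prospect is hypothetical. One point is an assumption rather than something the paper states: that `fg` enters as FG% in percentage points (48 rather than 0.48).

```python
# Coefficients as reported in the paper's regression result.
COEF = {
    "intercept": -25.83,
    "max_vert": 0.3,   # maximum vertical jump, inches
    "spm": 54.04,      # steals per minute
    "apm": 32.69,      # assists per minute
    "rpm": 23.46,      # rebounds per minute
    "ppm": 4.8,        # points per minute
    "fg": 0.16,        # FG%; ASSUMED to be in percentage points
    "min": 0.4,        # minutes per game
    "years": -1.15,    # years of college
    "rpi": -0.02,      # team RPI rank (1 = best)
}


def per_minute(per_game, minutes_per_game):
    """Convert a per-game average into a per-minute rate."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return per_game / minutes_per_game


def predict_aper(ppg, rpg, apg, spg, minutes, fg_pct, max_vert, years, rpi):
    """Evaluate the fitted equation for one player's college statistics."""
    features = {
        "max_vert": max_vert,
        "spm": per_minute(spg, minutes),
        "apm": per_minute(apg, minutes),
        "rpm": per_minute(rpg, minutes),
        "ppm": per_minute(ppg, minutes),
        "fg": fg_pct,
        "min": minutes,
        "years": years,
        "rpi": rpi,
    }
    return COEF["intercept"] + sum(COEF[k] * v for k, v in features.items())


# Hypothetical prospect, for illustration only.
aper = predict_aper(ppg=15.0, rpg=6.0, apg=2.4, spg=1.2, minutes=30.0,
                    fg_pct=48.0, max_vert=35.0, years=2, rpi=50)
print(f"Predicted third-year APER: {aper:.2f}")  # prints 12.92
```

Because the counting statistics are divided by minutes before they enter the equation, two players with the same per-minute production receive the same contribution from those terms; playing time then enters separately through the `min` term.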

In terms of predicting APER, all of these per-minute statistics have a positive coefficient; that is, the more points you score, the more assists you get, the more steals you make, etc., the higher your efficiency rating will be. Looking at the regression, steals has the highest coefficient and thus the greatest per-unit effect on becoming a better basketball player. For every extra steal per minute, your efficiency rating should go up by 54.04, whereas every extra assist or rebound per minute leads to an increase of only 32.69 or 23.46, respectively. However, steals occur much less often in a game than rebounds and assists. From our data, the average steals per minute is .04, the average assists per minute is .08, and the average rebounds per minute is .2. The average player makes twice as many assists per game as steals, and five times as many rebounds. As a result, for the average player, it is much easier to become a better player by getting more assists than by getting more steals.
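One way to make this comparison concrete is to multiply each coefficient by the sample-average rate quoted above. The product is the number of APER points that an average player's steals, assists, or rebounds contribute under the fitted equation. This is a reading aid computed from the paper's own numbers, not an additional result from the authors.

```python
# Coefficients and sample means (per minute) quoted in the text.
stats = {
    "steals":   {"coef": 54.04, "mean_per_min": 0.04},
    "assists":  {"coef": 32.69, "mean_per_min": 0.08},
    "rebounds": {"coef": 23.46, "mean_per_min": 0.20},
}

# APER points contributed at the average rate: coefficient times mean rate.
contribution = {name: s["coef"] * s["mean_per_min"] for name, s in stats.items()}

for name, points in contribution.items():
    print(f"{name:>8}: {points:.2f} APER points at the average rate")
# steals: 2.16, assists: 2.62, rebounds: 4.69
```

So although steals carry the largest coefficient, at average rates they account for the fewest APER points of the three.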

The remaining variables, maximum vertical jump (maxVert), years played in college (years), and team record/strength of schedule (rpi), can all be explained intuitively. Basketball is a game that depends on athleticism; the most exciting players, such as Michael Jordan, Dominique Wilkins, LeBron James, and Kobe Bryant, are all great athletes. To become a good basketball player, you have to be athletic, and the athletic attribute that translates best onto the hardwood floor is a player's vertical jump. The higher you jump, the more you separate yourself from your defender. As a result, every additional inch on a player's vertical jump should translate to an additional .3 in efficiency rating.

The next variable, RPI, had the lowest p-value of all the variables, and with good reason. RPI is basically a measurement of how good the player's team was. If the player was on the best team in the nation, his RPI would be 1, and if he was on the worst team, it would be 347. As a result, the coefficient on RPI is negative; the worse your college team is, the more it negatively affects you. Intuitively, this makes sense. If you play on a great collegiate team, the level of competition is higher and thus prepares you for the next level. At the same time, if you have statistics comparable to those of a player on a lesser team, chances are you will be more prepared to play in the NBA than he is.

Finally, the last variable used was years of basketball played in college. While this variable was statistically significant, its coefficient should not be read as a causal effect. Since the coefficient is negative, taken literally it suggests that the longer you stay in college, the worse a basketball player you will be. But playing less basketball in college does not make you a better basketball player. Rather, if you are a good basketball player, you cannot afford to waste your time playing at the collegiate level when you could be making millions playing in the NBA. Thus, the negative coefficient reflects the fact that good players leave college early to pursue the NBA. In other words, the variable years is a self-fulfilling prophecy.

When parsing through our data, there were some players that we did not put into our regression, despite the fact that they were drafted. Because our regression is based on college statistics, we were limited to college players. Thus, high school players who decided to forgo college and enter the draft directly could not be put into the regression, and neither could international players, since we had no way of reconciling international basketball statistics with college statistics. Furthermore, injuries and trades forced us to reconsider certain players in our regression. The problems with injuries and trades are comparable. Injured players have fewer games played and thus a smaller sample from which to measure their efficiency rating. In addition, a player may decide to play injured and perform at a lower level than he is capable of when healthy. Trades present a similar problem. Players who are traded to another team during the season face several hurdles: they must learn a new playbook and adapt to a completely new team, coach, and environment. As a result, players who have been traded usually do not see as much playing time while they adjust to a different team. Even when they do get time on the floor, their quality of play is likely to be below their capabilities. Indeed, the lack of familiarity that results from being traded lowers a player's efficiency rating. Thus, we found it best not to include players who fell into the following categories: players who did not go to college, players who came from international leagues, players who were traded in the middle of a season, and players who were injured for prolonged periods of time.
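The exclusion rules above amount to a simple filter over the drafted players. The sketch below shows one way to express it. The `Draftee` record, its field names, and the sample players are all hypothetical, and the paper does not specify how injuries and trades were actually flagged (for example, what counts as a 'prolonged' injury).

```python
from dataclasses import dataclass


@dataclass
class Draftee:
    """Hypothetical record for one drafted player."""
    name: str
    attended_college: bool
    international: bool
    traded_midseason: bool
    prolonged_injury: bool


def is_eligible(player: Draftee) -> bool:
    """Apply the four exclusion categories listed in the text."""
    return (
        player.attended_college
        and not player.international
        and not player.traded_midseason
        and not player.prolonged_injury
    )


# Made-up players, one per exclusion category plus one who qualifies.
draftees = [
    Draftee("Player A", True, False, False, False),
    Draftee("Player B", False, False, False, False),  # no college
    Draftee("Player C", False, True, False, False),   # international league
    Draftee("Player D", True, False, True, False),    # traded mid-season
    Draftee("Player E", True, False, False, True),    # prolonged injury
]

sample = [p.name for p in draftees if is_eligible(p)]
print(sample)  # ['Player A']
```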

Next Steps

After completing our regression, our next step was application. We wanted to use our regression to predict future NBA success for current college basketball players. The 2009-2010 college basketball season had just come to an end, and the nation's top college players had recently entered their names into the 2010 NBA Draft, which takes place in June. Many basketball analysts have made predictions about who the top picks in the draft will be. We must assume that these predictions correlate with who they think will be the best NBA players. NBA.com created a consensus mock draft, which it describes as follows: "The Consensus Mock Draft is a compilation of the best mock drafts around the web. We bring them together to come up with a good estimate of how the draft could play out." Its predicted top 10 picks are shown in Table 1. We entered 36 of the best players who declared for the NBA draft into our regression; our top 10 are shown in Table 2.

Like the Consensus Mock Draft, our regression identified John Wall as the number one pick and Evan Turner as the second overall pick. Overall, we had 8 of the same 10 picks.

After we completed this project, we found out that John Hollinger had done a similar regression analysis to create what he called his "Draft Rater". Table 3 shows Hollinger's top nine players according to his Draft Rater. Hollinger's regression and ours predicted 8 of the same top 9. Given the similarities between our results and the other two, it seems safe to say that we have a valid regression.

      Table 1              Table 2              Table 3
Rank  NBA.com              Our Regression       John Hollinger
1     John Wall            John Wall            DeMarcus Cousins
2     Evan Turner          Evan Turner          Evan Turner
3     Derrick Favors       Wesley Johnson       John Wall
4     DeMarcus Cousins     Greg Monroe          Greg Monroe
5     Wesley Johnson       Darington Hobson     Derrick Favors
6     Al-Farouq Aminu      Luke Babbitt         Xavier Henry
7     Greg Monroe          DeMarcus Cousins     Luke Babbitt
8     Cole Aldrich         Derrick Favors       Al-Farouq Aminu
9     Ed Davis             Al-Farouq Aminu      Wesley Johnson
10    Ekpe Udoh            Ekpe Udoh
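The overlap counts quoted above can be checked directly from the three lists. This is only a verification sketch using the names in Tables 1 through 3; the helper `overlap` is introduced here for illustration.

```python
nba_com = ["John Wall", "Evan Turner", "Derrick Favors", "DeMarcus Cousins",
           "Wesley Johnson", "Al-Farouq Aminu", "Greg Monroe", "Cole Aldrich",
           "Ed Davis", "Ekpe Udoh"]
ours = ["John Wall", "Evan Turner", "Wesley Johnson", "Greg Monroe",
        "Darington Hobson", "Luke Babbitt", "DeMarcus Cousins",
        "Derrick Favors", "Al-Farouq Aminu", "Ekpe Udoh"]
hollinger = ["DeMarcus Cousins", "Evan Turner", "John Wall", "Greg Monroe",
             "Derrick Favors", "Xavier Henry", "Luke Babbitt",
             "Al-Farouq Aminu", "Wesley Johnson"]


def overlap(a, b, top_n):
    """Names that appear in the top `top_n` of both ranked lists."""
    return sorted(set(a[:top_n]) & set(b[:top_n]))


print(len(overlap(nba_com, ours, 10)))   # 8
print(len(overlap(hollinger, ours, 9)))  # 8
```

Note that this compares membership in the top group only, not the order of the picks.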


Limitations

While our regression results compare favorably with other prominent draft forecasts, the limitations of our project are the same as those of any other draft prediction: we don't know how successful these players will be in the NBA until they actually play. Also, as we noted earlier, APER isn't a perfect indicator of how good an NBA player is; it is simply the best quantitative estimate that we know of.

In addition, there are many unobservable variables that could be correlated with how good a player is. For example, a player's mental toughness and work ethic are probably strongly correlated with how good he turns out to be, yet both are hard to measure. Another factor in how good a player turns out is the environment he plays in. For example, if a player gets drafted by a team that already has a superstar at his position, odds are that the younger player will have limited playing time, which could limit his growth as a player.

Lastly, factors other than measurable statistics enter into a team's decision when it drafts a player. Players are also evaluated on their character. If a team believes that a player may not be as dedicated as he should be, that could factor into its drafting decision. Also, some teams choose not to draft the player they consider the best available. Rather, they draft a player based on the position at which the team has a need. Other factors that might influence a team's drafting decision include possible health or injury risks for certain players, and how a player's skill set translates from the collegiate to the NBA style of basketball.
