
This article was downloaded by: [Northeastern University]
On: 04 November 2014, At: 15:18
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

CHANCE
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ucha20

Competing Risks in Basketball … Competing Risks in Basketball … Competing Risks in Basketball …
Laura Taylor, Department of Mathematics and Statistics, Elon University in North Carolina
Published online: 27 Apr 2012.

To cite this article: Laura Taylor (2012) Competing Risks in Basketball … Competing Risks in Basketball … Competing Risks in Basketball …, CHANCE, 25:2, 31-36

To link to this article: http://dx.doi.org/10.1080/09332480.2012.685367


CHANCE

Competing Risks in Basketball …

Laura Taylor

My husband has always encouraged me to use my “super powers of statistics” for the greater good (analyzing sports data, that is). So it was not surprising that I found myself watching the 2011 NCAA Men’s Basketball Tournament championship game, which pitted Butler University against the University of Connecticut on April 4, 2011. While watching, I began to think about how scoring in basketball is a set of competing risks, points scored versus points allowed, each of which comes from a free throw, a two-point shot, or a three-point shot. These outcomes not only compete with one another; they also recur throughout the entire game. Using my super power to model the scoring of Butler and UCONN as recurrent competing risks therefore became all the more interesting to me.

To this end, I noted that the two teams' most recent common opponent was the University of Pittsburgh. UCONN met Pittsburgh during the Big East Tournament; Butler took on Pittsburgh during the third round of the NCAA Tournament. Nineteenth-ranked UCONN beat third-ranked Pittsburgh 76–74 on March 10, 2011, and eighth-seeded Butler conquered first-seeded Pittsburgh 71–70 on March 19, 2011. This article models the points scored and points allowed by both championship contenders, based on their performance against Pittsburgh, using a combination of competing risks and recurrent events.

Competing risks and recurrent events are both fields of survival analysis. Competing risks analysis takes its name from the way the data are observed: a unit is subjected to several risks that compete to be the first and only cause of failure (or success), and both the time and the cause of failure are recorded. For example, a pacemaker can suffer either a mechanical or an electrical failure, and the two compete to be the pacemaker's cause of failure.

Competing risks are commonly observed in biomedical studies and engineering. In basketball, each play can result in either team scoring, so there are six outcomes of interest: points made from free throws, two-point baskets, and three-point baskets, and points allowed to the opponent from free throws, two-point baskets, and three-point baskets.

In traditional competing risks analysis, an observation ends when the first event occurs, and the times to event for the remaining risks are never observed. In a basketball game, however, play continues after the first score! After points are scored or allowed, the stage is set for the team to score or allow points again.
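To make the contrast concrete, here is a minimal Python simulation sketch. It assumes the six risks are independent (an assumption revisited in the article's discussion), gives each one a Weibull latent waiting time with made-up parameters, and restarts every clock after each event. The function names and parameter values are hypothetical, not taken from the article. One call to `first_event` is a single classical competing-risks observation; `simulate_unit` keeps going until the unit of game time is censored.

```python
import random

# Hypothetical (shape, scale) Weibull parameters for event types q = 1..6.
# Illustrative values only; they are NOT the article's estimates.
HYPOTHETICAL_PARAMS = {
    1: (1.0, 80.0),  # free throw scored
    2: (1.5, 55.0),  # two-pointer scored
    3: (2.5, 50.0),  # three-pointer scored
    4: (1.5, 65.0),  # free throw allowed
    5: (1.5, 45.0),  # two-pointer allowed
    6: (1.5, 65.0),  # three-pointer allowed
}

def first_event(params, rng):
    """One classical competing-risks observation: draw a latent time for each
    risk and keep only the smallest. Returns (time, cause); the other latent
    times are discarded, just as they go unobserved in practice."""
    # Python's random.weibullvariate(alpha, beta) uses alpha = scale and
    # beta = shape, the reverse of the article's (alpha = shape, beta = scale).
    latent = {q: rng.weibullvariate(scale, shape) for q, (shape, scale) in params.items()}
    cause = min(latent, key=latent.get)
    return latent[cause], cause

def simulate_unit(params, tau, rng):
    """Recurrent competing risks over one unit of game time lasting tau seconds,
    assuming perfect repair: all latent clocks restart after every event.
    Returns a list of (calendar_time, cause) pairs with calendar_time < tau."""
    events, clock = [], 0.0
    while True:
        gap, cause = first_event(params, rng)
        if clock + gap >= tau:  # next event would fall after the stoppage: censored
            return events
        clock += gap
        events.append((clock, cause))

rng = random.Random(2011)
for t, q in simulate_unit(HYPOTHETICAL_PARAMS, tau=285.0, rng=rng):
    print(f"{t:6.1f} s  type {q}")
```

Restarting all six clocks after every event is one simple way to encode "the game goes on"; other reset rules would give a different recurrent model.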

VOL. 25.2, 2012

Recurrent events analysis models data for a single type of failure (or success) repeated over time. After each failure (or success), the unit is repaired using a perfect, minimal, or imperfect repair strategy, and observation of the unit continues for further events. Consider monitoring a string of Christmas lights for a failed bulb. A perfect repair would replace all of the bulbs whenever one bulb fails. A minimal repair would replace only the blown bulb, using a bulb of the same age as the one that blew, while all the remaining bulbs continue to age. An imperfect repair would also replace only the blown bulb, but the replacement could be either new or aged. Recurrent events are common in biomedical studies, engineering, and economics; as another example, a patient can be monitored over time for the recurrence of nonmalignant tumors.
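For a single unit with Weibull lifetimes, perfect and minimal repair can be contrasted in a few lines. This sketch assumes a Weibull(shape, scale) lifetime and draws the next failure by inverting the cumulative hazard H(t) = (t/scale)^shape with a standard exponential variate `e`; the function names are illustrative only. Imperfect repair, which lies between the two, is not modeled here.

```python
import math

def next_failure_perfect(age, shape, scale, e):
    """Perfect repair: the unit is as good as new at `age`, so the next
    inter-event time is a fresh Weibull draw, scale * e**(1/shape).
    `e` is a standard exponential variate, e.g. -math.log(1.0 - random.random())."""
    return age + scale * e ** (1.0 / shape)

def next_failure_minimal(age, shape, scale, e):
    """Minimal repair: the unit resumes at the age it had when it failed, so the
    cumulative hazard H(t) = (t/scale)**shape keeps accumulating. The next
    failure time t_next solves H(t_next) - H(age) = e."""
    return scale * ((age / scale) ** shape + e) ** (1.0 / shape)

e = 1.0  # one fixed exponential draw, so both strategies are compared on equal terms
print(next_failure_perfect(1.0, 2.0, 1.0, e))  # 2.0
print(next_failure_minimal(1.0, 2.0, 1.0, e))  # 1.4142135623730951
```

With shape 2 (a hazard that increases with age), the minimally repaired unit fails sooner because it keeps its accumulated age; with shape 1 the two strategies coincide, reflecting the memoryless exponential distribution.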

In a basketball game, we can record the time of each free throw, two-point score, and three-point score for each team. From the perspective of one team, there are six types of recurrent competing risks, corresponding to offensive and defensive success or failure: free throw scored, two-pointer scored, three-pointer scored, free throw allowed, two-pointer allowed, and three-pointer allowed. We model the offense and defense of Butler and UCONN based on their performance against Pittsburgh, their most recent common opponent.

Competing Risks and Recurrent Event Data

Play-by-play data were collected from ESPN.com for each of the games of interest. Both games occurred during tournament play at the end of the season, so we would expect Butler and UCONN to have been playing with a high level of commitment. Also, neither game went into overtime, so there are a total of 40 minutes of game time for each team versus Pittsburgh. The times of free throws, two-pointers, three-pointers, and time-outs were recorded for both competitors in each game. Time-outs were treated as censoring, since the game clock was stopped and the ball had to be inbounded by the team in possession. Accordingly, the data were treated as competing risks with six events of interest: points scored and allowed from free throws, two-pointers, and three-pointers. After a successful two-pointer or three-pointer, the ball goes to the opposing team and the scoring cycle repeats; after free throws, the opposing team generally regains possession as well. In this scenario, the basketball scoring data are considered to operate under a perfect repair strategy.

Figure 1. Time between points scored and points allowed for Butler during their game against Pittsburgh for 20 periods of game time. Butler is represented by the solid green shapes, and Pittsburgh is represented by the red outlined shapes.

One point that deserves discussion is how free throws were handled. First, making one of one, one of two, two of two, or any other combination with at least one successful free throw was recorded as a single type of event: a successful free throw. Second, free throws recorded at the same time as a time-out were assigned a time one-tenth of a second before the time-out, since those points result from action in the game prior to the time-out. Any other points recorded simultaneously in the official play-by-play were adjusted using the same technique.
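The tie-breaking rule for free throws can be written as a small preprocessing step. This is a sketch under stated assumptions: events are hypothetical `(game_time_seconds, label)` pairs, the label `"timeout"` marks a time-out, and only ties with time-outs are handled (the article's other simultaneous-point adjustments are not reproduced).

```python
TIMEOUT = "timeout"  # hypothetical label marking a time-out

def shift_ties_before_timeouts(events, eps=0.1):
    """Move any scoring event recorded at the same clock time as a time-out to
    `eps` seconds earlier, so it falls inside the unit that the time-out censors.

    events: list of (game_time_seconds, label) pairs; label TIMEOUT marks a time-out.
    Returns a new list sorted by time; the input is not modified.
    """
    timeout_times = {t for t, label in events if label == TIMEOUT}
    adjusted = []
    for t, label in events:
        if label != TIMEOUT and t in timeout_times:
            t = round(t - eps, 3)  # round to avoid floating-point noise
        adjusted.append((t, label))
    return sorted(adjusted, key=lambda ev: ev[0])

plays = [(109, "two-pointer scored"), (285, "free throw scored"), (285, TIMEOUT)]
print(shift_ties_before_timeouts(plays))
```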

The data are presented in Figures 1 and 2 for Butler and UCONN, respectively. In this setting, each unit (one horizontal line in the figure) consists of continuous game time between time-outs or the end of a half. For example, there were 20 units of game time in the Butler versus Pittsburgh game, due to time-outs, TV time-outs, and the end of both halves. On the horizontal axis, the calendar time of each successful score is recorded. After each time-out or the end of a half, calendar time is reset to zero and monitoring of continuous game time restarts for events of interest.
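Splitting a game into units and resetting calendar time amounts to subtracting the most recent clock stoppage from each elapsed game time. In this sketch the function names are hypothetical and the stoppage times are assumed to be known in elapsed seconds; the example uses the game times from Table 1 (Butler versus Pittsburgh, stoppages at 285 and 516 seconds).

```python
from bisect import bisect_right

def to_unit_calendar_times(game_times, stoppages):
    """Map elapsed game times (seconds since tip-off) to (unit, calendar time).

    stoppages: increasing elapsed times at which the clock stopped (time-outs,
    TV time-outs, ends of halves). Unit i (1-based) starts at the previous
    stoppage (or 0), and calendar time restarts at zero there. An event exactly
    at a stoppage is assigned to the next unit, which is why ties are shifted
    slightly earlier beforehand.
    """
    starts = [0] + list(stoppages)
    result = []
    for g in game_times:
        k = bisect_right(stoppages, g)  # number of stoppages at or before g
        result.append((k + 1, g - starts[k]))
    return result

def censoring_times(stoppages):
    """Length tau_i of each unit: time from its start to the stoppage that ends it."""
    starts = [0] + list(stoppages[:-1])
    return [end - start for start, end in zip(starts, stoppages)]

game = [21, 109, 126, 156, 216, 232, 264, 300, 316, 330, 347, 384, 406, 463, 482]
print(to_unit_calendar_times(game, [285, 516]))
print(censoring_times([285, 516]))  # [285, 231]
```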

The first striking characteristic of the data is that Butler did not make its first free throw until fewer than 10.2 minutes remained in the second half, during the 12th unit of game time. In total, Butler scored seven successful free throws. By comparison, UCONN scored 12 free throws spread throughout the game. This would typically indicate that UCONN's players were more aggressive in seeking out, or more able to obtain, shots in the paint.

Butler and UCONN also differ in the types of shots they made. UCONN scored a total of 23 two-pointers and 3 three-pointers, compared with Butler's 12 two-pointers and 12 three-pointers. Butler and UCONN performed similarly in terms of points allowed to Pittsburgh, with each team allowing 7 free throws and 19 two-point baskets. The two teams also allowed similar numbers of three-point baskets, with Butler allowing 6 and UCONN allowing 8. Both Butler and UCONN allowed free throws by Pittsburgh fairly consistently throughout their games.

Modeling Inter-Event Times for Points Scored and Points Allowed

Competing risks analysis can focus on modeling the time to an event, which extends naturally to modeling the time between events, or inter-event times, in recurrent events analysis. The inter-event time distribution can be modeled by obtaining maximum likelihood estimates for the parameters of the hazard function, the instantaneous rate of failure for a unit that has survived up to a specified point in time. That is, for a short interval of length $\Delta t$, $\lambda(t)\,\Delta t$ approximates the probability of a failure occurring in $[t, t + \Delta t)$ given that it has not yet occurred. Mathematically, the hazard function is the ratio of the probability density function to the survival function, $\lambda(t) = f(t)/S(t)$. The probability density function and the survival function can both be defined in terms of the cumulative distribution function, $F(t) = P(T \le t)$. The probability density function, $f(t)$, is the derivative of $F(t)$ with respect to $t$, whereas the survival function is given by $S(t) = 1 - F(t) = P(T > t)$. Assume that the hazard function associated with the inter-event time distribution for the $q$th type of scoring is given by $\lambda_q(t; \theta_q)$, where $\theta_q$ is a vector of parameters associated with the $q$th type of scoring. For the basketball data, $q$ is the type of basket, with 1, 2, or 3 denoting a team scoring a free throw, two-pointer, or three-pointer, respectively, and 4, 5, and 6 denoting the team allowing each of these, respectively. We chose to model the data using a Weibull distribution with shape parameter $\alpha$ and scale parameter $\beta$, using the following parameterization of the hazard and probability density functions:

$$\lambda(t; \alpha, \beta) = \frac{\alpha}{\beta}\left(\frac{t}{\beta}\right)^{\alpha - 1}$$

$$f(t; \alpha, \beta) = \frac{\alpha}{\beta}\left(\frac{t}{\beta}\right)^{\alpha - 1} e^{-(t/\beta)^{\alpha}}$$

Figure 2. Time between points scored and points allowed for UCONN during their game against Pittsburgh for 19 periods of game time. UCONN is represented by the solid green shapes, and Pittsburgh is represented by the red outlined shapes.

The Weibull distribution is a popular choice for modeling inter-event times (or time-to-event data), since its probability density function can take on a wide variety of shapes. When $\alpha < 1$, the Weibull probability density function is a decreasing convex curve. For $\alpha > 1$, the curve follows a more traditional right-skewed shape, with its peak shifting away from 0 as $\alpha$ grows, so that it quickly begins to resemble a left-truncated unimodal curve. Each of these shapes appears in Figure 3. Interestingly, the exponential distribution, which is also commonly used to model waiting times, is the special case of the Weibull distribution with $\alpha = 1$.
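The Weibull hazard, survival, and density functions translate directly into code. This is a minimal sketch in the article's parameterization ($\alpha$ = shape, $\beta$ = scale), with hypothetical function names. It ties the functions together through $f = \lambda S$ and $S = 1 - F$, and is meant for $t > 0$, since for $\alpha < 1$ the hazard is unbounded at $t = 0$.

```python
import math

# Article's parameterization: alpha = shape, beta = scale; valid for t > 0.
def weibull_hazard(t, alpha, beta):
    return (alpha / beta) * (t / beta) ** (alpha - 1.0)

def weibull_survival(t, alpha, beta):
    return math.exp(-((t / beta) ** alpha))

def weibull_cdf(t, alpha, beta):
    return 1.0 - weibull_survival(t, alpha, beta)

def weibull_pdf(t, alpha, beta):
    # f(t) = lambda(t) * S(t), i.e. the hazard is the ratio f/S.
    return weibull_hazard(t, alpha, beta) * weibull_survival(t, alpha, beta)

def numeric_derivative_of_cdf(t, alpha, beta, h=1e-5):
    # Central-difference approximation to dF/dt, used to check that f = F'.
    return (weibull_cdf(t + h, alpha, beta) - weibull_cdf(t - h, alpha, beta)) / (2 * h)

# Density at a few times for a decreasing (alpha < 1), exponential (alpha = 1),
# and right-skewed unimodal (alpha > 1) shape, all with scale 50 seconds.
for alpha in (0.9, 1.0, 1.5):
    print(alpha, [round(weibull_pdf(t, alpha, 50.0), 5) for t in (1, 25, 50, 100)])
```

With $\alpha = 1$ the hazard reduces to the constant $1/\beta$, which is the exponential special case.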

To estimate the parameters of the hazard function for each event type (and consequently the parameters of the corresponding distribution) for a basketball team, first consider each team's data in terms of calendar times rather than inter-event times. Denote the calendar time of the $j$th recurrence, or score, in the $i$th unit of game time by $S_{ij}$. For each calendar time, we also record the type of event, $\delta_{ij}$, where $\delta_{ij}$ is 1, 2, 3, 4, 5, or 6. Table 1 displays the data collected for the first two units of consecutive game time ($i = 1$ and $2$) for Butler versus Pittsburgh. These are illustrated in the bottom two horizontal lines of data in Figure 1. Additionally, the recurrent inter-event times are censored when a time-out is called or the end of a half is reached. We denote the censoring time for the $i$th unit by $\tau_i$. Since the first time-out occurred 285 seconds into the game for Butler versus Pittsburgh, $\tau_1 = 285$ seconds, as seen in Table 1 and in the bottom horizontal row of data in Figure 1, which ends at 285 seconds.
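The article does not write out its likelihood, so how each gap enters it is an assumption here. Under the common cause-specific convention for independent competing risks with perfect repair, a gap that ends in an event of type $q$ is an observed type-$q$ time, while a gap ending in any other type, and each unit's tail from its last event to $\tau_i$, is right-censored for type $q$. The sketch below builds those data from the Table 1 values; the names are hypothetical, and this convention may differ from the one behind Table 2.

```python
# Butler vs. Pittsburgh, first two units of game time, transcribed from Table 1:
# (calendar times S_ij, event types delta_ij, censoring time tau_i).
UNITS = [
    ([21, 109, 126, 156, 216, 232, 264], [5, 2, 5, 2, 3, 5, 3], 285),
    ([15, 31, 45, 62, 99, 121, 178, 197], [5, 2, 5, 2, 5, 3, 5, 3], 231),
]

def inter_event_times(calendar_times, tau):
    """Gaps between successive events in one unit (perfect repair: each gap is
    measured from the previous event of any type), plus the censored tail
    from the last event to the stoppage at tau."""
    previous = [0] + calendar_times[:-1]
    gaps = [s - p for s, p in zip(calendar_times, previous)]
    tail = tau - (calendar_times[-1] if calendar_times else 0)
    return gaps, tail

def cause_specific_data(units, q):
    """(duration, observed) pairs for event type q, assuming independent latent
    risks: a gap ending in type q is observed; a gap ending in another type,
    and each unit's tail, is right-censored for type q."""
    rows = []
    for times, types, tau in units:
        gaps, tail = inter_event_times(times, tau)
        rows.extend((g, d == q) for g, d in zip(gaps, types))
        rows.append((tail, False))
    return rows

times, types, tau = UNITS[0]
print(inter_event_times(times, tau))  # ([21, 88, 17, 30, 60, 16, 32], 21)
print(sum(obs for _, obs in cause_specific_data(UNITS, 5)))  # 7 observed two-pointers allowed
```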

Results

From these data, we fit a Weibull hazard function by maximum likelihood to model the inter-event time distribution associated with each of the six competing risks. The estimates of the Weibull shape and scale parameters, $\alpha$ and $\beta$, for the time between events of each scoring type for each team versus Pittsburgh, as well as the mean and mode of the estimated inter-event time distributions, are given in Table 2.
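The article does not state the exact likelihood or software used for its fits. As one standard possibility, the sketch below computes right-censored Weibull maximum likelihood estimates with only the Python standard library: for fixed $\alpha$ the scale has the closed form $\hat\beta(\alpha) = (\sum_i t_i^{\alpha}/d)^{1/\alpha}$, with $d$ the number of observed events, and $\hat\alpha$ solves a profile score equation that decreases in $\alpha$, so bisection finds it. The function name and search bounds are assumptions, and fitting this to data prepared under a different convention need not reproduce Table 2.

```python
import math
import random

def weibull_mle(times, events, lo=0.05, hi=20.0, tol=1e-10):
    """Right-censored Weibull MLE. times: positive durations; events: True if
    the event was observed, False if the duration was censored.
    Returns (alpha_hat, beta_hat) with alpha = shape and beta = scale.

    Profile score in alpha (divided by d):
        1/alpha + mean(log t over events) - sum(t**alpha * log t) / sum(t**alpha)
    """
    if any(t <= 0 for t in times):
        raise ValueError("durations must be positive")
    d = sum(events)
    if d == 0:
        raise ValueError("need at least one observed event")
    mean_log_event = sum(math.log(t) for t, e in zip(times, events) if e) / d

    def score(a):
        w = [t ** a for t in times]
        return 1.0 / a + mean_log_event - sum(wi * math.log(t) for wi, t in zip(w, times)) / sum(w)

    if score(lo) * score(hi) > 0:
        raise ValueError("score has no sign change on [lo, hi]")
    while hi - lo > tol:  # bisection: the score is decreasing in alpha
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    beta = (sum(t ** alpha for t in times) / d) ** (1.0 / alpha)
    return alpha, beta

# Usage on simulated data (shape 1.5, scale 58 s), censored at 150 s.
rng = random.Random(42)
raw = [rng.weibullvariate(58.0, 1.5) for _ in range(2000)]  # args: (scale, shape)
times = [min(t, 150.0) for t in raw]
events = [t <= 150.0 for t in raw]
print(weibull_mle(times, events))
```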

The estimated hazard function and inter-event time density for free throws scored by Butler versus Pittsburgh are given by

$$\hat{\lambda}(t) = \frac{1.04}{79.04}\left(\frac{t}{79.04}\right)^{1.04 - 1}$$

$$\hat{f}(t) = \frac{1.04}{79.04}\left(\frac{t}{79.04}\right)^{1.04 - 1} e^{-(t/79.04)^{1.04}}$$

The estimated inter-event time distributions for all events are displayed in Figure 3, which compares the inter-event time distributions for Butler and UCONN (with Pittsburgh as their opponent) for each of the six events.

From Table 2 and Figure 3, several differences between the scoring abilities of Butler and UCONN are visible. In particular, UCONN generated free throws in closer succession than Butler. Notably, the probability density function for UCONN's free throws scored is the only one that does not go to 0 as time goes to 0, because its estimated shape parameter (0.90) is less than 1. The two teams are fairly comparable in the time between two-pointers. This parity is a disadvantage for UCONN, since most of UCONN's scoring comes from two-pointers.

Table 2 and Figure 3 also show that the time between three-pointers for Butler versus Pittsburgh tends to be shorter than the time between three-pointers for UCONN versus Pittsburgh, a feature that appears to favor Butler's scoring. Both teams exhibit similar distributions for the points allowed to Pittsburgh in their respective games. However, UCONN had a greater tendency to allow Pittsburgh to score in shorter time intervals, most evidently with free throws. Based on the teams' performances against Pittsburgh, the championship game between Butler and UCONN could have been an exciting contest between two evenly matched teams, but it ended with a low final score of 53 to 41, with UCONN the winner.

Table 1. Data associated with the first two units of game time for Butler versus Pittsburgh

Game time (s)   Calendar time (s)   Event indicator
21              S11 = 21            δ11 = 5 (two-pointer allowed)
109             S12 = 109           δ12 = 2 (two-pointer scored)
126             S13 = 126           δ13 = 5
156             S14 = 156           δ14 = 2
216             S15 = 216           δ15 = 3 (three-pointer scored)
232             S16 = 232           δ16 = 5
264             S17 = 264           δ17 = 3
285             τ1 = 285            7 (clock stopped)
300             S21 = 15            δ21 = 5
316             S22 = 31            δ22 = 2
330             S23 = 45            δ23 = 5
347             S24 = 62            δ24 = 2
384             S25 = 99            δ25 = 5
406             S26 = 121           δ26 = 3
463             S27 = 178           δ27 = 5
482             S28 = 197           δ28 = 3
516             τ2 = 231            7 (clock stopped)

Discussion

Additional considerations should be taken into account when modeling these basketball data as competing risks. In particular, one should consider whether the risks, or events, operate independently. Inter-event times were assumed to be generated from distributions that were independent and identically distributed. However, the type of scoring attempted by a team is closely tied to the player in possession of the ball and the time remaining on the shot clock. Some players are more likely to take three-pointers, while others are more comfortable making shots within the two-point range.

A further concern is that a basketball team's strategy evolves with the opponent, the point in the season, and the time remaining in the game. These changes in strategy affect the distribution of time between events. A notable change in strategy occurs when a team is losing in the last few minutes of a game, which often results in more intentional fouls as the team tries to regain possession of the ball and score. Players can also miss games because of injury or suspension, and different combinations of players have different patterns, skills, and camaraderie that affect the team's overall scoring and defensive abilities.

Another concern is that the censoring distribution may depend on the observed data. For example, a team that is behind may be more likely to call a time-out to regroup.

While analyzing basketball data in a recurrent competing risks framework is entertaining, this example also illustrates the rich data that can be collected within that framework. A data set often used in recurrent events analysis was first introduced by Frank Proschan in 1963; it gives the failure times of air conditioners on Boeing 720 airplanes. Had the cause of each failure (mechanical, electrical, etc.) been recorded, the data would have been much more informative for building better maintenance and repair plans for airplane air conditioners. Few data have been gathered that represent competing risks in a recurrent event setting. However, a whole new world of analysis is now at our fingertips for this type of data.

Table 2. Estimated parameters for Weibull inter-event time distributions (α = shape, β = scale; mode and mean in seconds)

                          Butler vs. Pittsburgh             UCONN vs. Pittsburgh
Event                     α      β       Mode    Mean       α      β       Mode    Mean
Free throws scored        1.04   79.04    3.4    77.8       0.90   40.16    0      42.3
Two-pointers scored       1.50   57.96   27.9    52.3       1.53   43.84   21.9    39.5
Three-pointers scored     2.56   50.55   41.7    44.9       2.72   71.08   60.1    63.2
Free throws allowed       1.65   63.28   36.0    56.6       1.09   49.86    5.1    48.3
Two-pointers allowed      1.65   47.69   27.1    42.6       1.82   35.38   22.8    31.4
Three-pointers allowed    1.32   67.47   23.1    62.1       2.27   47.46   36.7    42.0
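Because the Weibull mean is $\beta\,\Gamma(1 + 1/\alpha)$ and its mode is $\beta\,((\alpha - 1)/\alpha)^{1/\alpha}$ for $\alpha > 1$ (and 0 otherwise), the Mode and Mean columns of Table 2 can be recomputed from its shape and scale columns. The short check below does this with the standard library's `math.gamma`; the dictionary simply transcribes Table 2, and this is an illustrative recomputation rather than the author's code.

```python
import math

def weibull_mean(alpha, beta):
    """Mean of a Weibull with shape alpha and scale beta: beta * Gamma(1 + 1/alpha)."""
    return beta * math.gamma(1.0 + 1.0 / alpha)

def weibull_mode(alpha, beta):
    """Mode of a Weibull: beta * ((alpha - 1) / alpha) ** (1 / alpha) if alpha > 1, else 0."""
    if alpha <= 1.0:
        return 0.0
    return beta * ((alpha - 1.0) / alpha) ** (1.0 / alpha)

# (alpha, beta) estimates transcribed from Table 2: (Butler vs. Pitt, UCONN vs. Pitt).
TABLE2 = {
    "Free throws scored":     ((1.04, 79.04), (0.90, 40.16)),
    "Two-pointers scored":    ((1.50, 57.96), (1.53, 43.84)),
    "Three-pointers scored":  ((2.56, 50.55), (2.72, 71.08)),
    "Free throws allowed":    ((1.65, 63.28), (1.09, 49.86)),
    "Two-pointers allowed":   ((1.65, 47.69), (1.82, 35.38)),
    "Three-pointers allowed": ((1.32, 67.47), (2.27, 47.46)),
}

for event, ((a_bu, b_bu), (a_uc, b_uc)) in TABLE2.items():
    print(f"{event:23s} Butler: mode {weibull_mode(a_bu, b_bu):5.1f}, mean {weibull_mean(a_bu, b_bu):5.1f}"
          f" | UCONN: mode {weibull_mode(a_uc, b_uc):5.1f}, mean {weibull_mean(a_uc, b_uc):5.1f}")
```

The recomputed values agree with the reported ones to within rounding, which supports reading the Weibull parameterization as shape $\alpha$ and scale $\beta$.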

Further Reading

Cook, R. J., and J. F. Lawless. 2010. The statistical analysis of recurrent events. New York: Springer.

Crowder, M. 2001. Classical competing risks. Boca Raton, FL: Chapman and Hall/CRC.

Kalbfleisch, J. D., and R. Prentice. 2002. The statistical analysis of failure time data. Hoboken, New Jersey: John Wiley & Sons.

Lan, K. K., and J. M. Lachin. 1995. Martingales without tears. Lifetime Data Analysis 1(4):361–375.

Peña, E. A., R. L. Strawderman, and M. Hollander. 2001. Nonparametric estimation with recurrent event data. Journal of the American Statistical Association 96(456):1299–1315.

Proschan, F. 1963. Theoretical explanation of observed decreasing failure rate. Technometrics 5(3):375–383.

Basketball data, www.espn.com, gameid = 310780221 and 310690221.

About the Author

Laura Taylor is an assistant professor of statistics in the Department of Mathematics and Statistics at Elon University in North Carolina. Her research interests include competing risks and recurrent events.

Figure 3. Estimated inter-event time distributions for points scored and points allowed based on type of goal for Butler versus Pittsburgh (solid line) and UCONN versus Pittsburgh (dashed line)
