Probability of Winning at Tennis

Embed Size (px)

DESCRIPTION

mineração de dados

Citation preview

  • Probability of Winning at Tennis I. Theory and Data

    By Paul K. Newton and Joseph B. Keller

    The probability of winning a game, a set, and a match in tennis are computed,based on each players probability of winning a point on serve, which weassume are independent identically distributed (iid) random variables. Bothtwo out of three and three out of five set matches are considered, allowing a13-point tiebreaker in each set, if necessary. As a by-product of these formulas,we give an explicit proof that the probability of winning a set, and hence amatch, is independent of which player serves first. Then, the probability of eachplayer winning a 128-player tournament is calculated. Data from the 2002 U.S.Open and Wimbledon tournaments are used both to validate the theory as wellas to show how predictions can be made regarding the ultimate tournamentchampion. We finish with a brief discussion of evidence for non-iid effects intennis, and indicate how one could extend the current theory to incorporatesuch features.

    1. Introduction

    We wish to calculate the probability that one player, A, wins a tennis matchagainst another player B. It is not enough to know the rankings of A and B,because there is no unambiguous way to translate rankings into probabilitiesof winning [1, 2]. However, it does suffice to know the probability pRA that A

    Address for correspondence: Paul K. Newton, Department of Aerospace and Mechanical Engineeringand Department of Mathematics, University of Southern California, Los Angeles, CA 90089-1191;e-mail: [email protected]

    STUDIES IN APPLIED MATHEMATICS 114:241269 241C 2005 by the Massachusetts Institute of TechnologyPublished by Blackwell Publishing, 350 Main Street, Malden, MA 02148, USA, and 9600 GarsingtonRoad, Oxford, OX4 2DQ, UK.

  • 242 P. K. Newton and J. B. Keller

    wins a rally when A serves, and the probability pRB that B wins a rally whenB serves. Such probabilities have been used to calculate the probability ofwinning a game in other racquet sports, such as racquetball [3], squash [4],and badminton [5]. Models of this type for tennis were first considered by Hsiand Burych [6], followed by Carter and Crews [7], and Pollard [8]. All ofthese analyses, including ours, treat points in tennis as independent identicallydistributed (iid) random variables, hence pRA and p

    RB are taken as constant

    throughout a match. A recent statistical analysis of 4 years of Wimbledon data[9] shows that although points in tennis are not iid, for most purposes thisis not a bad assumption as the divergence from iid is small. Other aspectsof tennis that have been analyzed using probabilistic models include optimalserving strategies [10], the efficiency of various scoring systems [11], and thequestion of which is the most important point [12]. Statistical methods havealso been used to study the effects of new balls [13], service dominance [14],and the probabilities of winning the final set of a match [15].

    Our formulation unifies and extends some of the previous treatments by theuse of hierarchical recurrence relations whose solutions yield the probabilitythat A wins a game, a set, or a match against B in terms of pRA and p

    RB . We then

    calculate the probability that a player in a 128 player single eliminationtournament reaches the second, third, . . . , or final round, and the probabilitythat a player who has reached the nth round will win the tournament. We alsoprovide an explicit proof, based on the solutions of our recurrence relations,that the probability of winning a set or match does not depend on which playerserves first.

    Of course the probability pRA that A wins a rally on serve depends upon theopponent B as well as upon A. If data are not available for A serving to B, thendata for A playing against players similar to B can be used. We illustratethis point with data from the 2002 U.S. Open Mens and Womens SinglesTournaments, and from the 2002 Wimbledon Mens and Womens SinglesTournaments. The data, shown in Tables 1 and 2, and in Figure 1, agree wellwith our theoretical calculation of pGA , the probability that A wins a gamewhen A serves. In a companion paper (part II), we will compare the theorywith Monte Carlo simulations.

    A game in tennis is played with one player serving. The game is won by thefirst player to score four or more points and to be at least two points ahead ofthe other player. In a set, the players serve alternate games until a player wins atleast six games and is ahead by at least two games. If the game score reaches66, a 13-point tiebreaker is used to determine who wins the set, with the playerwho started serving the set serving the first point of the tiebreaker.1 Then, the

    1In the U.S. Open, a tiebreaker is used in every set, whereas in Wimbledon, in the French Open, and inthe Australian Open, tiebreakers are not used in the third set of a two out of three set match (womensformat), or the fifth set in a three out of five set match (mens format).

  • Probability of Winning at Tennis 243

    Table 1Data for the Semifinalists in the 2002 U.S. Open Tournament

    Player A B C D E F G

    WomenS. Williams 240 349 52 57 0.69 0.91 0.89V. Williams 270 428 56 70 0.63 0.8 0.79L. Davenport 206 301 45 53 0.68 0.85 0.88A. Mauresmo 287 457 58 75 0.63 0.77 0.79

    MenP. Sampras 573 781 124 130 0.73 0.95 0.93A. Agassi 443 676 96 110 0.66 0.87 0.85L. Hewitt 436 654 91 107 0.67 0.85 0.86S. Schalken 519 768 107 119 0.68 0.9 0.88

    Column A: points won on serve; Column B: total points served; Column C:games won on serve; Column D: total games served; Column E: empiricalprobability pRA of winning a rally on serve = A/B; Column F: empiricalprobability pGA of winning a game on serve = C/D; Column G: theoreticalprobability pGA of winning a game on serve, given by (5), with p

    RA from

    Column E.

    Table 2Data for the Semifinalists in the 2002 Wimbledon Tournament

    Player A B C D E F G

    WomenS. Williams 276 390 57 64 0.71 0.89 0.91V. Williams 273 352 51 62 0.67 0.82 0.86J. Henin 252 427 48 66 0.59 0.73 0.71A. Mauresmo 241 378 50 57 0.64 0.88 0.81

    MenL. Hewitt 450 646 96 107 0.70 0.90 0.90D. Nalbandian 516 847 94 128 0.61 0.73 0.76T. Henman 457 683 92 110 0.67 0.84 0.86X. Malisse 483 721 101 114 0.67 0.89 0.86

    Column A: points won on serve; Column B: total points served; Column C:games won on serve; Column D: total games served; Column E: empiricalprobability pRA of winning a rally on serve = A/B; Column F: empiricalprobability pGA of winning a game on serve = C/D; Column G: theoreticalprobability pGA of winning a game on serve, given by (5), with p

    RA from

    Column E.

  • 244 P. K. Newton and J. B. Keller

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    pA

    R

    pA

    G

    2002 Wimbledon semifinalists

    2002 US Open semifinalists

    First round opponents (women)First round opponents (men)

    First round opponents (women)

    First round opponents (men)

    Figure 1. The probability pGA of A winning a game when A serves, i.e., of holdingserve, as a function of pRA based on (6). The open circles correspond to data from eightsemifinalists in the 2002 U.S. Open Mens and Womens Singles Tournaments and the openstars correspond to data from eight semifinalists in the 2002 Wimbledon Mens and WomensSingles Tournaments. The four left most data points represent the combined data from thesemifinalists first round opponents in each tournament.

    players alternate serves, each serving two consecutive points, until someonewins at least seven points, and is ahead by at least two points. The winner ofthe tiebreaker wins the set with seven games to the opponents six games. Towin a match, a player must win two out of three sets (womens format), or winthree out of five sets (mens format), with the two players serving alternategames throughout the match. The initial server in the match is determined by acoin toss, with the winner given the choice of serving first or receiving first.

    2. Probability of winning a game

    Player A can win a game against player B by a score of (4, 0), (4, 1) or (4,2), or else the score can become (3, 3), called deuce. Then, A can win by

  • Probability of Winning at Tennis 245

    getting two points ahead of B, with a score of (n + 5, n + 3) with n 0. Tocalculate the probability pGA that A wins a game when A serves, we assumethat pRA is the probability that A wins a rally when A serves, and set q

    RA =

    1 pRA , qGA = 1 pGA . We also denote by pGA (i , j) the probability that thescore reaches i points for A and j points for B when A serves. Upon summingthe probabilities of the different ways in which A can win, we get

    pGA =2

    j=0pGA (4, j) + pGA (3, 3)

    n=0

    pDGA (n + 2, n). (1)

    Here, pDGA (n + 2, n) is the probability that A wins by scoring n + 2 while Bscores n after deuce has been reached, with A serving. It is given by

    pDGA (n + 2, n) =n

    j=0

    (pRAq

    RA

    ) j(q RA p

    RA

    )n j n!j!(n j)!

    (pRA

    )2= (pRA)2[pRAq RA ]n2n. (2)

    Upon using (2) in (1), and summing the geometric series, we get

    pGA =2

    j=0pGA (4, j) + pGA (3, 3)

    (pRA

    )2[1 2pRAq RA

    ]1. (3)

    Elementary combinatorial analysis yields

    pGA (4, 0) =(

    pRA)4

    , pGA (4, 1) = 4(

    pRA)4

    q RA , pGA (4, 2) =

    5 42

    (pRA

    )4(q RA

    )2,

    pGA (3, 3) =6!

    (3!)2(

    pRAqRA

    )3. (4)

    Now using (4) in (3) gives the probability that A wins a game when A serves,i.e., that A holds serve:

    pGA =(

    pRA)4[

    1 + 4q RA + 10(q RA

    )2] + 20(pRAq RA )3(pRA)2[1 2pRAq RA ]1. (5)This equation agrees with that given in [7]. Figure 1 shows pGA as a function ofpRA , based upon (5). The open circles in the figure are data for the semifinalistsin the 2002 U.S. Open Mens and Womens Singles Tournaments, shown inTable 1, while the stars are data for the semifinalists in the 2002 WimbledonMens and Womens Singles Tournaments shown in Table 2. The left most fourpoints are totals for their first round opponents in both tournaments. They alllie close to the theoretical curve.

  • 246 P. K. Newton and J. B. Keller

    3. Probability of winning a set

    3.1. Recursion equations

    Let pSA denote the probability that player A wins a set against player B, with Aserving first, and qSA = 1 pSA. To calculate pSA in terms of pGA and pGB , wedefine pSA(i , j) as the probability that in a set, the score becomes i games forA, j games for B, with A serving initially. Then,

    pSA =4

    j=0pSA(6, j) + pSA(7, 5) + pSA(6, 6)pTA . (6)

    Here, pTA is the probability that A wins a 13-point tiebreaker with A servinginitially, and qTA = 1 pTA .

    To calculate pSA(i , j), needed in (6), we use the following recursion formulasand initial conditions:

    For 0 i , j 6:if i 1 + j is even: pSA(i, j) = pSA(i 1, j)pGA + pSA(i, j 1)qGA

    omit i 1 term if j = 6, i 5;omit j 1 term if i = 6, j 5 (7)

    if i 1 + j is odd: pSA(i, j) = pSA(i 1, j)qGB + pSA(i, j 1)pGBomit i 1 term if j = 6, i 5;omit j 1 term if i = 6, j 5 (8)

    Initial conditions:

    pSA(0, 0) = 1; pSA(i, j) = 0 if i < 0, or j < 0. (9)In terms of pSA (6, 5) and p

    SA(5, 6), we have

    pSA(7, 5) = pSA(6, 5)qGB ; pSA(5, 7) = pSA(5, 6)pGB . (10)The explicit solution of (7)(10) is given in the Appendix.

    3.2. Probability of winning a tiebreaker

    To calculate pTA in terms of pRA and p

    RB , we define p

    TA (i , j) to be the probability

    that the score becomes i for A, j for B in a tiebreaker with A serving initially.Then,

    pTA =5

    j=0pTA(7, j) + pTA(6, 6)

    n=0

    pTA(n + 2, n). (11)

  • Probability of Winning at Tennis 247

    Because the sequence of serves in a tiebreaker is A, BB, AA, BB, etc., we have

    pTA(n + 2, n) =n

    j=0

    (pRA p

    RB

    ) j(q RA q

    RB

    )n j n!j!(n j)! p

    RAq

    RB

    = (pRA pRB + q RA q RB )n pRAq RB . (12)Using (12) in (11) and summing yields

    pTA =5

    j=0pTA(7, j) + pTA(6, 6)pRAq RB

    [1 pRA pRB q RA q RB

    ]1(13)

    To calculate pTA (i , j), we use the recursion formulas:For 0 i , j 7:

    if i 1 + j = 0, 3, 4, . . . , 4n 1, 4n, . . .pTA(i, j) = pTA(i 1, j)pRA + pTA(i, j 1)q RA

    omit j 1 term if i = 7, j 6omit i 1 term if j = 7, i 6 (14)

    if i 1 + j = 1, 2, 5, 6, . . . , 4n + 1, 4n + 2, . . .pTA(i, j) = pTA(i 1, j)q RB + pTA(i, j 1)pRB

    omit j 1 term if i = 7, j 6omit i 1 term if j = 7, i 6 (15)

    Initial conditions:

    pTA(0, 0) = 1; pTA(i, j) = 0 if i < 0, or j < 0. (16)The solution of (14)(16) is given in the Appendix.

    Next we calculate pTA by using the solution of (14)(16) in (13). Now wecan calculate pSA by using the solution of (7)(9), and (10), with the result forpTA , in (6).

    Figure 2 shows the probability of player A winning a set against player Bplotted as a function of pRA [0, 1] for the full range values of pRB in incrementsof 0.1. The data shown are compiled from the 2002 U.S. Open Mens Singlesevent. Of the 117 completed matches played, there were 9 matches in which aplayer (designated player B) had a value of pRB = 0.50 0.01, 33 matches inwhich pRB = 0.60 0.01, and 20 matches with pRB = 0.70 0.01. Because eachmatch involves three, four, or five sets, it is necessary to combine data fromseveral matches to get meaningful statistics. Hence, each data point shown inthe figure represents a compilation of several matches grouped according tocorresponding values of pRA . Each of the three data points associated with the

  • 248 P. K. Newton and J. B. Keller

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1p = .50 .01

    BR +-

    p = .60 .01BR +-

    p = .70 .01BR +-

    p A

    S

    pAR

    p

    = .1

    RB p

    =

    .2R

    B p

    = .3

    RB p

    =

    .4R B p

    = .5

    R B p

    = .6

    R B p

    = .7

    R B p

    = .8

    R B p

    = .9

    R B

    Figure 2. The probability pSA of player A winning a set plotted as a function of pRA for

    various values of pRB . Compiled data from the 2002 U.S. Open Mens Singles event are shownfor the values pRB = 0.50 0.01, pRB = 0.60 0.01, and pRB = 0.70 0.01.

    curve marked pRB = 0.50 represents three matches, each of the seven data pointsassociated with the curve marked pRB = 0.60 represents approximately fivematches, while each of the three data points associated with the curve markedpRB = 0.70 represents a compilation of approximately seven matches. Giventhe relatively small number of sets underlying each of the data points, thedata fits the theoretical curves reasonably well. Figure 3 shows the probabilityof player A winning a tiebreaker against player B plotted as a function ofpRA [0, 1] for the full range values of pRB in increments of 0.1.

    3.3. Serving or receiving first

    In this section, we prove that there is no theoretical advantage to serving first byshowing that the probability of player A winning the set when serving first, pSA, isequal to his probability of winning the set when receiving first, qSB. For this, weneed formula (6) for pSA, along with the corresponding formula for q

    SB given by

  • Probability of Winning at Tennis 249

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    pAR

    p A

    T

    p

    = .9

    RBp

    = .8

    RBp

    =

    .7R

    Bp

    = .6

    RBp

    = .5

    RBp

    =

    .4R

    Bp

    = .3

    RBp

    = .2

    RBp

    = .1

    RB

    Figure 3. The probability pTA of player A winning a tiebreaker plotted as a function of pRA for

    various values of pRB .

    q SB =4

    j=0pSB( j, 6) + pSB(5, 7) + pSB(6, 6)qTB . (17)

    We obtain the terms pSB( j , i) in (17) from pSA(i , j) given in the Appendix, by

    interchanging pGA qGB , pGB qGA . From (A.1) and (A.6) it is immediate thatpSA(6, 0) = pSB(0, 6) (18)

    pSA(7, 5) = pSB(5, 7). (19)It is also clear from (A.7) that

    pSA(6, 6) = pSB(6, 6). (20)Thus, it remains to show that

    4j=1

    pSA(6, j) =4

    j=1pSB( j, 6) (21)

  • 250 P. K. Newton and J. B. Keller

    and that

    pTA = qTB . (22)To prove (21), we show that

    pSA(6, 1) + pSA(6, 2) = pSB(1, 6) + pSB(2, 6) (23)and

    pSA(6, 3) + pSA(6, 4) = pSB(3, 6) + pSB(4, 6). (24)By using formulas (A.2)(A.5), and replacing qGA = 1 pGA , qGB = 1 pGB ,we can write

    pSA(6, 2n 1) + pSA(6, 2n) =6+2ni=0

    6+2nj=0

    aSij (n)(

    pGA)i(

    pGB) j

    (25)

    pSB(2n 1, 6) + pSB(2n, 6) =6+2ni=0

    6+2nj=0

    bSij(n)(

    pGA)i(

    pGB) j

    (26)

    for n = 1, 2. Then, it can be shown that the coefficients of each are equal,i.e., aSij(n) = bSij(n). The values are listed in the Appendix. Figure 4 showsthe probability of obtaining each of the scores that are independent of whichplayer serves first for the case of evenly matched players.

    To prove that pTA = qTB , we use the formula (11) for pTA and the correspondingone for qTB

    qTB =5

    j=0pTB ( j, 7) + pTB (6, 6)

    n=0

    pTB (n, n + 2). (27)

    We obtain the terms pTB ( j , i) in (27) from pTA (i , j) given in the Appendix, by

    interchanging pRA qRB , pRB qRA . From (A.14) it is clear that pTA (6, 6) =pTB (6, 6). Furthermore, from the symmetry under exchanging p

    RA qRB , pRB

    qRA in (12), we have that

    pTA(n + 2, n) = pTB (n, n + 2). (28)Thus, it remains to show that

    5j=0

    pTA(7, j) =5

    j=0pTB ( j, 7). (29)

    To prove this, we show that

    pTA(7, 0) + pTA(7, 1) = pTB (0, 7) + pTB (1, 7), (30)

    pTA(7, 2) + pTA(7, 3) = pTB (2, 7) + pTB (3, 7), (31)

  • Probability of Winning at Tennis 251

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    pAR

    pBR

    (a)(b)

    (c)

    (d)

    (e)

    Figure 4. Set scores that are independent of which player serves first, plotted for two equalplayers pRA = pRB . (a) pSA(6, 0), (b) pSA(6, 1) + pSA(6, 2), (c) pSA(6, 3) + pSA(6, 4), (d) pSA(7, 5),and (e) pSA(6, 6).

    pTA(7, 4) + pTA(7, 5) = pTB (4, 7) + pTB (5, 7). (32)By using formulas (A.8)(A.13) and replacing qRA = 1 pRA , qRB = 1 pRB , wecan write

    pTA(7, 2n) + pTA(7, 2n + 1) =4

    i=0

    4j=0

    aTij (n)(

    pRA)i(

    pRB) j

    (33)

    pTB (2n, 7) + pTB (2n + 1, 7) =4

    i=0

    4j=0

    bTij (n)(

    pRA)i(

    pRB) j

    (34)

    for n = 0, 1, 2. Then, it can be shown that the coefficients are equal, i.e.,aTij (n) = bTij (n). The values are listed in the Appendix. Figure 5 shows theprobability of obtaining each of the tiebreaker scores that are independent ofwhich player serves first, for equally matched players.

  • 252 P. K. Newton and J. B. Keller

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    pAR

    pBR

    (a)

    (b)(c)

    (d)

    Figure 5. Tiebreaker scores that are independent of which player serves first, plotted for twoequal players pRA = pRB . (a) pTA (7, 0) + pTA (7, 1), (b) pTA (7, 2) + pTA (7, 3), (c) pTA (7, 4) +pTA (7, 5), and (d) p

    TA (6, 6).

    The question of whether to serve or receive first has received some attentionin the literature. In an interesting combinatorial analysis of Kingston [16](followed by a note [17]), a simplified scoring system (which he calls a shortset) is considered in which player A serves the first game of a match consistingof the best N of 2N 1 games. His striking result is that it does not matterwhether the rules are such that the players alternate serves after each game, orwhether the winner of the previous game continues to serve the next game.In either case, player A has the same probability of winning. At the end ofthe article, he asks how many games need to be played to give two equalplayers a reasonably equal chance of winning, whoever starts serving. As aconsequence of the central limit theorem, player As (approximate) probabilityof winning a short set is 12 + 12 (pRA 12 )[pRA(1 pRA)(N 1)]1/2. Figure 2in his paper shows the slow convergence to 12 as N , giving player A adistinct advantage, for finite N , if he serves first and pRA > 0.5. Thus, for bestN of 2N 1 scoring, there is a theoretical advantage to serving first. For

  • Probability of Winning at Tennis 253

    tennis scoring, the paper of Pollard [8] considers both classical scoring (notiebreakers) and tiebreaker scoring, and implicit in his calculations (see, forexample, his Tables 2 and 3) is the fact that pSA = qSB, although the result is notproven. There are other ways of proving and generalizing the result that do notrely on the explicit solutions for pSA and q

    SB as our proof does. In fact, one can

    prove that as long as the scoring system is such that the number of gamesserved by player A minus the number of games served by player B is 1, 0,or 1, there is no advantage or disadvantage to serving first. Such scoringsystems are termed service neutral and are discussed in [18].

    4. Probability of winning a match

    We now calculate pMA , the probability that player A wins a match against playerB, with player A serving initially, and qMA = 1 pMA . To do so we definepMAB(i , j) to be the probability that in a match, the score becomes i sets for Aand j sets for B, with A serving initially and B serving finally. We definepMAA(i , j) similarly, but with A serving initially and finally.

    To formulate recursion equations for pMAB(i , j) and pMAA(i , j), we introduce

    pSAB, pSAA, p

    SBA, and p

    SBB. Here, p

    SXY is the probability that X wins a set when X

    serves the first game and Y serves the last game, where X and Y are A or B.To get an expression for pSAA we note that when A serves the first and last

    games, the total number of games must be odd. Then, by restricting the rightside of (6) to odd numbers of games, we get

    pSAA =j=1,3

    pSA(6, j) + pSA(6, 6)pTA . (35)

    Similarly when A serves the first game and B serves the last game, the totalnumber of games is even. For even numbers of games, the right side of (6) yields

    pSAB =

    j=0,2,4pSA(6, j) + pSA(7, 5). (36)

    Then, (6) is written

    pSA = pSAA + pSAB. (37)We also define qSAA and q

    SAB as

    q SAA =j=1,3

    pSA( j, 6) + pSA(6, 6)qTA , (38)

    q SAB =

    j=0,2,4pSA( j, 6) + pSA(5, 7). (39)

  • 254 P. K. Newton and J. B. Keller

    Tab

    le3

    Pro

    babi

    lity

    pM Aof

    Pla

    yer

    AW

    inni

    nga

    Mat

    chof

    Thr

    eeS

    ets

    out

    ofFi

    ve

    pR A

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1.0

    0.0

    1.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    000.

    10.

    0000

    0.50

    000.

    9060

    0.99

    561.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    0.2

    0.00

    000.

    0940

    0.50

    000.

    9136

    0.99

    811.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    000.

    30.

    0000

    0.00

    450.

    0864

    0.50

    000.

    9380

    0.99

    931.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    0.4

    0.00

    000.

    0000

    0.00

    190.

    0621

    0.50

    000.

    9513

    0.99

    951.

    0000

    1.00

    001.

    0000

    1.00

    000.

    50.

    0000

    0.00

    000.

    0000

    0.00

    070.

    0487

    0.50

    000.

    9513

    0.99

    931.

    0000

    1.00

    001.

    0000

    PR B

    0.6

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    050.

    0487

    0.50

    000.

    9380

    0.99

    811.

    0000

    1.00

    000.

    70.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    070.

    0621

    0.50

    000.

    9136

    0.99

    561.

    0000

    0.8

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    190.

    0864

    0.50

    000.

    9060

    1.00

    000.

    90.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    450.

    0940

    0.50

    001.

    0000

    1.0

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    V

    alue

    sof

    pR Aar

    eal

    ong

    the

    top

    row

    and

    valu

    esof

    pR Bar

    edo

    wn

    the

    left

    colu

    mn.

    Ind

    icat

    esth

    atth

    em

    atch

    cann

    oten

    dfo

    rth

    ese

    valu

    es.

  • Probability of Winning at Tennis 255

    Then,

    q SA = q SAA + q SAB. (40)To get pSBA, p

    SBB, q

    SBA, and q

    SBB, we interchange A and B in (35)(40). Note that

    because pSA + qSA = 1 and pSB + qSB = 1, we havepSAA + q SAA + pSAB + q SAB = 1, (41)

    pSBB + q SBB + pSBA + q SBA = 1. (42)Now we can write the recursion equations satisfied by pMAB(i , j) and

    pMAA(i , j) as follows, for i + j > 1:pMAB(i, j) = pMAB(i 1, j)pSAB + pMAA(i 1, j)q SBB

    + pMAB(i, j 1)q SAB + pMAA(i, j 1)pSBB, (43)

    pMAA(i, j) = pMAB(i 1, j)pSAA + pMAA(i 1, j)q SBA+ pMAB(i, j 1)q SAA + pMAA(i, j 1)pSBA. (44)

    The initial conditions are

    pMAA(0, 0) = 1; pMAA(i, j) = 0 if i < 0 or j < 0 (45)

    pMAB(0, 0) = 1; pMAB(i, j) = 0 if i < 0 or j < 0 (46)

    pMAB(1, 0) = pSAB; pMAB(0, 1) = q SAB; pMAA(1, 0) = pSAA; pMAA(0, 1) = q SAA.(47)

    For the mens format of three sets out of five, (43)(47) must be solved fori , j = 0, 1, 2, 3. When j = 3, the i 1 terms must be omitted; when i = 3,the j 1 terms must be omitted. The probability that player A wins a threeout of five set match when serving first is given by

    pMA =2

    j=0

    [pMAA(3, j) + pMAB(3, j)

    ]. (48)

    For a match of two sets out of three, (35) and (36) must be solved for i ,j = 0, 1, 2. When j = 2, the i 1 terms must be omitted; when i = 2, thej 1 terms must be omitted. Then, the probability that player A wins a twoout of three set match when serving first is

    pMA =1

    j=0

    [pMAA(2, j) + pMAB(2, j)

    ]. (49)

  • 256 P. K. Newton and J. B. Keller

    By using the solutions of (43) and (44) for pMAA(2, j) and pMAB(2, j) and

    taking advantage of (37) and (40), we can write (49) as

    pMA = pSAAq SB + pSAB pSA + pSAA pSBAq SB + pSAA pSBB pSA + pSABq SAAq SB+ pSABq SAB pSA + q SAAq SBAq SB + q SAAq SBB pSA + q SAB pSAAq SB + q SAB pSAB pSA.

    (50)

    Note that because the probability of winning a set is independent of whichplayer serves first, the above formula (50) reduces to

    pMA =(

    pSA)2 + 2(pSA)2 pSB (51)

    for the two out of three set format, and

    pMA =(

    pSA)3 + 3(pSA)3 pSB + 6(pSA)3(pSB)2 (52)

    for the three out of five set format.Table 3 shows pMA for a match of three sets out of five based upon (48), and

    Table 4 shows pMA for a match of two sets out of three based upon (40). In bothcases the results are shown as functions of pRA and p

    RB , ranging from 0 to 1 at

    intervals of 0.1. Figure 6 shows the data from the 2002 U.S. Open MensSingles event as well as the theoretical curves for pGA , p

    SA, and p

    MA (three out of

    five set format) corresponding to the value pRB = 0.60. To obtain meaningfulstatistics for the three data points associated with the pMA curve, the 33 matcheswere grouped in clusters of approximately 11 matches per cluster.

    5. Probability of winning a tournament

    5.1. The 128-player tournament

    We now consider a single elimination tournament of 128 = 27 players numberedi = 1, . . . , 128. We assume that we know the probability pMij for player i todefeat player j in a match. We introduce the column vector of probabilitiesp(n) R1128;

    p(n) =

    p(n)1p(n)2p(n)3

    ...

    p(n)128

    . (53)

    Here, p(n)i is the conditional probability that player i wins a match in the nthround, provided that he or she survives to that round of the tournament. From(48) or (49), we know pMij , the probability that player i beats player j, which wewrite more simply as Pij.

  • Probability of Winning at Tennis 257

    Tab

    le4

    Pro

    babi

    lity

    pM Aof

    Pla

    yer

    AW

    inni

    nga

    Mat

    chof

    Two

    Set

    sou

    tof

    Thr

    ee

    pR A

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1.0

    0.0

    1.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    000.

    10.

    0000

    0.50

    000.

    8539

    0.98

    200.

    9995

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    0.2

    0.00

    000.

    1461

    0.50

    000.

    8624

    0.98

    980.

    9999

    1.00

    001.

    0000

    1.00

    001.

    0000

    1.00

    000.

    30.

    0000

    0.01

    800.

    1376

    0.50

    000.

    8909

    0.99

    471.

    0000

    1.00

    001.

    0000

    1.00

    001.

    0000

    0.4

    0.00

    000.

    0005

    0.01

    030.

    1091

    0.50

    000.

    9079

    0.99

    611.

    0000

    1.00

    001.

    0000

    1.00

    000.

    50.

    0000

    0.00

    000.

    0001

    0.00

    530.

    0922

    0.50

    000.

    9079

    0.99

    470.

    9999

    1.00

    001.

    0000

    PR B

    0.6

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    390.

    0922

    0.50

    000.

    8909

    0.98

    980.

    9995

    1.00

    000.

    70.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    530.

    1091

    0.50

    000.

    8624

    0.98

    201.

    0000

    0.8

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0001

    0.01

    030.

    1376

    0.50

    000.

    8539

    1.00

    000.

    90.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0005

    0.01

    800.

    1461

    0.50

    001.

    0000

    1.0

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    0.00

    000.

    0000

    V

    alue

    sof

    pR Aar

    eal

    ong

    the

    top

    row

    and

    valu

    esof

    pR Bar

    edo

    wn

    the

    left

    colu

    mn.

    Ind

    icat

    esth

    atth

    em

    atch

    cann

    oten

    dfo

    rth

    ese

    valu

    es.

  • 258 P. K. Newton and J. B. Keller

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    pAR

    pAG

    p = .60 .01BR +-

    pAS p

    AM

    Game data

    Set data

    Match data

    2002 US Open Men's Singles

    Figure 6. Theoretical curves for pGA (dotted), pSA (dashed), and p

    MA (solid) corresponding to

    values pRB = 0.60. Compiled data from the 2002 U.S. Open Mens Singles event are shown forall matches in which pRB = 0.60 0.01.

    p(n) satisfies the recursion formula

    p(0) =

    111...1

    , p(n) = Pnp(n1) (n = 1, . . . , 6). (54)

    Here, Pn is a 128 128 matrix with block diagonal structure made up of 27nblocks. We label them P(k)n , 1 k 27n , and then Pn is given by

    Pn =

    P(1)n 0 0 . . . 0

    0 P(2)n 0 . . . 0

    0 0 P(3)n . . ....

    ......

    ......

    ...

    0 0 0 . . . P(27n)

    n

    . (55)

  • Probability of Winning at Tennis 259

    P(k)n is a 2n 2n off-diagonal block matrix:

    P(k)n =[

    0 P(n,k),

    P(n,k), 0

    ], (56)

    where = (k 1)2n + 1, = k2n . The P(n,k), are 2n1 2n1 matrices of theform

    P(n,k), =

    P,+12n1 . . . P,1 P,P+1,+12n1 . . . P+1,1 P+1,

    ......

    ......

    P+2n11,+12n1 P+2n11,1 P+2n11,

    . (57)

    The entries of this matrix, Pij, are obtained from (48) or (49).As an example, for n = 1, (55) becomes

    P1 =

    P(1)1 0 0 . . . 0

    0 P(2)1 0 . . . 0

    0 0 P(3)1 . . ....

    ......

    ......

    ...

    0 0 0 . . . P(64)1

    . (58)

    P(k)1 is a 2 2 matrix:

    P(k)1 =[

    0 P2k1,2kP2k,2k1 0

    ]. (59)

    Explicitly (59) yields

    P(1)1 =[

    0 P12P21 0

    ], P(2)1 =

    [0 P34

    P43 0

    ], . . . , P(64)1 =

    [0 P127,128

    P128,127 0

    ].

    (60)

    The probability that player i ultimately becomes the tournament champion,which we denote pTCi , is the product of the conditional probabilities of winningeach of the rounds. In vector form, this is given by

    pT C

    pT C1pT C2pT C3

    ...

    pT C128

    =

    7n=1 p

    (n)17

    n=1 p(n)27

    n=1 p(n)3

    ...7n=1 p

    (n)128

    . (61)

  • 260 P. K. Newton and J. B. Keller

    The factors in the last column are obtained by solving (54). Note that thecomponents of the vector pTC must sum to unity.

    5.2. Predicting the fate of the semifinalists

    Suppose that after the quarterfinal round, we wish to predict the probability ofeach of the four semifinalists becoming the tournament champion. We usethe preceding recursion method, introducing the vectors p(0), p(1), and p(2) ofprobabilities of winning the quarterfinal, semifinal, and final round

    p(0) =

    1

    1

    1

    1

    , p(n) =

    p(n)1p(n)2p(n)3p(n)4

    , (n = 1, 2). (62)

    The matrices P1 and P2 are given by

    P1 =

    0 P12 0 0

    P21 0 0 0

    0 0 0 P340 0 P43 0

    , (63)

    P2 =

    0 0 P13 P140 0 P23 P24

    P31 P32 0 0

    P41 P42 0 0

    . (64)

    The probability that player i wins a semifinal match is the ith component of

    p(1) = P1p(0) =

    P12P21P34P43

    . (65)

    The probability that player i wins the final match if he or she plays in it is theith component of

    p(2) = P2p(1) =

    P13 P34 + P14 P43P23 P34 + P24 P43P31 P12 + P32 P21P41 P12 + P42 P21

    . (66)

  • Probability of Winning at Tennis 261

    The vector of probabilities that each semifinalist wins the tournament isobtained by using (65) and (66) in (61):

    pT C =

    P12(P13 P34 + P14 P43)P21(P23 P34 + P24 P43)P34(P31 P12 + P32 P21)P43(P41 P12 + P42 P21)

    . (67)

    6. 2002 U.S. Open and Wimbledon data

    We now use the results of the 2002 U.S. Open and 2002 Wimbledon Singlesevents to show how the previous method can be applied to predict thetournament champion after the quarterfinal round (n = 5), based on theaccumulated data through this round. Let i (n) be the total number of pointswon on serve by player i in round n, and let i (n) be the total number ofpoints served by player i in round n. Then, the empirical probability of playeri winning a point on serve in round n is i (n)/ i (n). The correspondingprobability of winning a rally on serve in rounds 1n is

    pRi (n) =n

    j=1i ( j)

    / nj=1

    i ( j). (68)

    We use this with n = 5 in (5) for each player in the semifinals and thencompute their empirical probabilities of winning a match against any of theother remaining players. This allows us to compute the entries of the matricesP1 and P2 in (63), (64), and arrive at values for pTC in round n = 6 for each ofthe four semifinalists. To calculate pTC for the two finalists after the semifinalround match, we repeat the same steps for the two finalists, using (68) withn = 6. The same method of calculating pTC could be applied after round n = 1,and after each subsequent round as the tournament progresses to make runningprojections regarding tournament outcomes. Other forecasting methods whichallow point by point updates as the match unfolds are described in [19].

    6.1. Womens Tennis Association (WTA) data

    Figure 7 shows the 2002 U.S. Open Womens Singles Draw from the semifinalround. Under each player, we show the value of pRi (5), p

    Ri (6), and p

    Ri (7).

    Next to each players name is their empirical probability of winning theupcoming match, Pij, as well as their empirical probability of becomingthe tournament champion, pTCi . After the quarterfinal round matches, L.Davenport would have been the slight favorite to win the tournament (pTC2 =0.3599), followed by V. Williams (pTC4 = 0.3047), S. Williams (pTC1 =0.2872) and A. Mauresmo (pTC3 = 0.0482), while after the semifinal round

  • 262 P. K. Newton and J. B. Keller

    S. Williams P = .4582 p = .287212 1TC

    p (5) = 158/225 = .70221

    R

    L. Davenport P = .5418 p = .3599

    A. Mauresmo P = .2559 p = .0482

    V. Williams P = .7441 p = .3047TC

    21 2TC

    TC34 3

    43 4

    p (5) = 170/239 = .71132

    R

    p (5) = 237/374 = .63373

    R

    p (5) = 176/256 = .68754

    R

    p (6) = 208/301 = .69101

    R

    p (6) = 235/357= .65834

    R

    p (7) = 240/349 = .68771

    R

    S. Williams P = p = .652714 1TC

    V. Williams P = p = .347341 4TC

    S. Williams p = 11TC

    Figure 7. The probability Pij of each of the four semifinalists in the 2002 U.S. OpenWomens Singles tournament winning her match, and her probability pTCi of becoming thetournament champion.

    matches, S. Williams (pTC1 = 0.6527) was the favorite and ultimately won thetournament. Figure 8 shows the 2002 Wimbledon Womens Singles Draw fromthe semifinal round. Here, V. Williams (pTC1 = 0.4784) was the favorite to winthe tournament after the quarterfinal round match, followed by S. Williams(pTC4 = 0.3834), A. Mauresmo (pTC3 = 0.1233), and J. Henin (pTC2 = 0.0150),while S. Williams (pTC4 = 0.5866) was the favorite after the semifinal roundmatch and ultimately won the tournament.

    6.2. Association of Tennis Professionals (ATP) data

    Figure 9 shows the 2002 U.S. Open Mens Singles Draw. After the quarterfinalround matches, P. Sampras was the heavy favorite to win the tournament(pTC1 = 0.6747), followed by L. Hewitt (pTC4 = 0.1457), A. Agassi (pTC3 =0.0945), and S. Schalken (pTC2 = 0.0851). Sampras chances of winning thetournament increased after his semifinal round match (pTC1 = 0.8856) andhe ultimately won the tournament. Figure 10 shows the results from the2002 Wimbledon Mens Singles event. After their quarterfinal round matches,X. Malisse (pTC3 = 0.4573) was favored to win the tournament, followed byL. Hewitt (pTC1 = 0.3364), T. Henman (pTC2 = 0.1815), and D. Nalbandian(pTC4 = 0.0247). After the semifinal round matches, it was L. Hewitt, theultimate tournament champion, who was the heavy favorite (pTC1 = 0.8698).

  • Probability of Winning at Tennis 263

    = 162/230 = 0.7043p1R

    (5)

    V. Williams P = .8896 p = .478412 1TC

    J. Henin P = .1104 p = .015021 2TC

    A. Mauresmo P = .3184 p = .123334 3TC

    S. Williams P = .6816 p = .383443 4TC

    p (5) = 220/364 = .60442

    R

    p (5) = 212/317 = .66883

    R

    p (5) = 202/285 = .70884

    R

    V. Williams P = p = .413414 1TC

    S. Williams P = p = .586641 4TC

    S. Williams p = 14TC

    p (6) = 200/286 = .69931

    R

    p (6) = 232/323 = .71834

    R

    p (7) = 276/390 = .70774

    R

    Figure 8. The probability Pij of each of the four semifinalists in the 2002 WimbledonWomens Singles tournament winning her match, and her probability pTCi of becoming thetournament champion.

    P. Sampras P = .8257 p = .674712 1TC

    S. Schalken P = .1743 p = .085121 2TC

    A. Agassi P = .4386 p = .094534 3TC

    L. Hewitt P = .5614 p = .145743 4TC

    p (5) = 392/524 = .74811

    R

    p (5) = 447/655 = .68242

    R

    p (5) = 285/420 = .67863

    R

    p (5) = 370/537 = .68904

    R

    P. Sampras P = p = .885614 1TC

    A. Agassi P = p = .114441 4TC

    p (6) = 469/629 = .74561

    R

    p (6) = 365/551 = .66244

    R

    p (7) = 573/781 = .73371

    R

    P. Sampras p = 11TC

    Figure 9. The probability Pij of each of the four semifinalists in the 2002 U.S. Open MensSingles tournament winning his match, and his probability pTCi of becoming the tournamentchampion.

  • 264 P. K. Newton and J. B. Keller

    L. Hewitt P = .6039 p = .336412 1TC

    T. Henman P = .3961 p = .181521 2TC

    X. Malisse P = .8540 p = .457334 3

    TC

    D. Nalbandian P = .1460 p = .024743 4TC

    p (5) = 336/477 = .70441

    R

    p (5) = 405/590 = .68642

    R

    p (5) = 389/553 = .70343

    R

    p (5) = 389/614 = .63364

    R

    L. Hewitt P = p = .869814 1TC

    D. Nalbandian P = p = .130241 4TC

    L. Hewitt p = 11TC

    p (6) = 399/567 = .70371

    R

    p (6) = 477/758 = .62934

    R

    p (7) = 450/646 = .69661

    R

    Figure 10. The probability Pij of each of the four semifinalists in the 2002 Wimbledon MensSingles tournament winning his match, and his probability pTCi of becoming the tournamentchampion.

    7. Capturing non-iid effects

    There are several papers documenting effects that cannot be captured with theassumption that points are independent and identically distributed. For example,Magnus and Klassen [20] analyze 90,000 points played at Wimbledon, andfind evidence of a first game effect, i.e., that the first game of a match isthe hardest one to break. This indicates that it may be desirable to allow pGAand pGB to vary from game to game and perhaps depend on the specific pairof players who are competing. Jackson and Mosurski [21] give compellingevidence which indicates that points may not be independent. This includeswhat is commonly called the hot-hand phenomenon in which winning aprevious point, game, or set, increases ones chances of winning the next, andthe opposite of this, called the back-to-the-wall effect in which playingfrom behind can sometimes be a psychological advantage. From the analysisof Klassen and Magnus [9], one can assume that although these effectsmay be small when analyzing large heterogeneous data sets, they may bemore important when analyzing specific head-to-head match-ups between twoplayers, as, for example, the famous McEnroeBorg series of matches [21] inwhich a back-to-the-wall phenomenon seems to be present.

  • Probability of Winning at Tennis 265

    A more refined analysis than the one described in this paper could incorporatethese and other higher-order effects by allowing pRA and p

    RB to vary from point

    to point as the match unfolds, depending on the points importance [12]or by taking into consideration more detailed player characteristics such asrallying ability or strength of return of serve. For example, we could define theprobability that player A wins a point on serve as

    pRA = pRA + pRAB(i, j),(0 pRA 1

    )(69)

    where pRA is constant throughout the match, pRAB(i , j) represents player As

    probability of winning a point on serve against player B, when the score is ipoints for A and j points for B, and 1 is a small parameter reflectingthe fact that, in most cases, the deviation from iid is small. The goal thenwould be to calculate the corresponding formulas for game, set, and match foreach player, i.e., pGA , p

    SA, p

    MA , and p

    GB , p

    SB, p

    MB . The leading-order theory

    ( = 0) is the one described in this paper based on the iid assumption, whilehigher-order corrections could be treated perturbatively.

    Acknowledgments

    This work is supported by the National Science Foundation grants NSF-DMS9800797 and NSF-DMS 0203581. Useful comments and observations byJ. DAngelo and G.H. Pollard on an early draft of the manuscript are gratefullyacknowledged. The first author also thanks Andres Figueroa for skillfullyperforming Matlab calculations on the models developed in this manuscript aspart of a summer undergraduate research project.

    Appendix

    The solution of (7)(10) is

    pSA(6, 0) =(

    pGA qGB

    )3(A.1)

    pSA(6, 1) = 3(

    pGA)3

    qGA(qGB

    )3 + 3(pGA )4 pGB (qGB )2 (A.2)pSA(6, 2) = 12

    (pGA

    )3qGA p

    GB

    (qGB

    )3 + 6(pGA )2(qGA )2(qGB )4+ 3(pGA )4(pGB )2(qGB )2 (A.3)

    pSA(6, 3) = 24(

    pGA)3(

    qGA)2

    pGB(qGB

    )3 + 24(pGA )4qGA (pGB )2(qGB )2+ 4(pGA )2(qGA )3(qGB )4 + 4(pGA )5(pGB )3qGB (A.4)

  • 266 P. K. Newton and J. B. Keller

    pSA(6, 4) = 60(

    pGA)3(

    qGA)2(

    pGB)2(

    qGB)3 + 40(pGA )2(qGA )3 pGB (qGB )4

    + 20(pGA )4qGA (pGB )3(qGB )2 + 5pGA (qGA )4(qGB )5+ (pGA )5(pGB )4qGB (A.5)

    pSA(7, 5) = 100(

    pGA)3(

    qGA)3(

    pGB)2(

    qGB)4 + 100(pGA )4(qGA )2(pGB )3(qGB )3

    + 25(pGA )2(qGA )4 pGB (qGB )5 + 25(pGA )5qGA (pGB )4(qGB )2+ pGA

    (qGA

    )5(qGB

    )6 + (pGA )6(pGB )5qGB . (A.6)To obtain pSA(i , j) from p

    SA( j , i), we interchange p

    GA qGA and pGB qGB in

    (A.1)(A.6). Finally, pSA(6, 6) in (6) is given by

    pSA(6, 6) = 1 [

    4i=0

    (pSA(i, 6) + pSA(6, i)

    ) + pSA(7, 5) + pSA(5, 7)]

    . (A.7)

    The solution of (14)(16) yields:

    pTA(7, 0) =(

    pRA)3(

    q RB)4

    (A.8)

    pTA(7, 1) = 3(

    pRA)3

    q RA(q RB

    )4 + 4(pRA)4 pRB(q RB )3 (A.9)pTA(7, 2) = 16

    (pRA

    )4q RA p

    RB

    (q RB

    )3 + 6(pRA)5(pRB)2(q RB )2+ 6(pRA)3(q RA )2(q RB )4 (A.10)

    pTA(7, 3) = 40(

    pRA)3(

    q RA)2

    pRB(q RB

    )4 + 10(pRA)2(q RA )3(q RB )5+ 4(pRA)5(pRB)3(q RB )2 + 30(pRA)4q RA (pRB)2(q RB )3 (A.11)

    pTA(7, 4) = 50(

    pRA)4

    q RA(

    pRB)3(

    q RB)3 + 5(pRA)5(pRB)4(q RB )2

    + 50(pRA)2(q RA )3 pRB(q RB )5 + 5pRA(q RA )4(q RB )6+ 100(pRA)3(q RA )2(pRB)2(q RB )4 (A.12)

    pTA(7, 5) = 30(

    pRA)2(

    q RA)4

    pRB(q RB

    )5 + pRA(q RA )5(q RB )6+ 200(pRA)4(q RA )2(pRB)3(q RB )3 + 75(pRA)5q RA (pRB)4(q RB )2+ 150(pRA)3(q RA )3(pRB)2(q RB )4 + 6(pRA)6(pRB)5q RB . (A.13)

    To obtain pTA ( j , i) from pTA (i , j), we interchange p

    RA qRA and pRB qRB in

    (A.9)(A.13). Finally, pTA (6, 6) in (13) is given by

    pTA(6, 6) = 1 [

    5i=0

    (pTA(i, 7) + pTA(7, i)

    )]. (A.14)

  • Probability of Winning at Tennis 267

    The nonzero coefficients aSij(n) = bSij(n) are given byaS20(1) = 6, aS21(1) = 24, aS22(1) = 36, aS23(1) = 24, aS24(1) = 6,aS30(1) = 9, aS31(1) = 51, aS32(1) = 99, aS33(1) = 81, aS34(1) = 24,aS40(1) = 3, aS41(1) = 24, aS42(1) = 60, aS43(1) = 60, aS44(1) = 21.

    (A.15)

    aS10(2) = 5, aS11(2) = 25, aS12(2) = 50, aS13(2) = 50,aS14(2) = 25, aS15(2) = 5,aS20(2) = 16, aS21(2) = 124, aS22(2) = 336, aS23(2) = 424,aS24(2) = 256, aS25(2) = 60aS30(2) = 18, aS31(2) = 198, aS32(2) = 696, aS33(2) = 1080,aS34(2) = 774, aS35(2) = 210aS40(2) = 8, aS41(2) = 124, aS42(2) = 560, aS43(2) = 1060,aS44(2) = 896, aS45(2) = 280aS50(2) = 1, aS51(2) = 25, aS52(2) = 150, aS53(2) = 350,aS54(2) = 350, aS55(2) = 126.

    (A.16)

    The nonzero coefficients aTij (n) = bTij (n) are given byaT30(1) = 4, aT31(1) = 16, aT32(1) = 24, aT33(1) = 16, aT34(1) = 4,aT40(1) = 3, aT41(1) = 16, aT42(1) = 30, aT43(1) = 24, aT44(1) = 7.

    (A.17)

    aT20(2) = 10, aT21(2) = 50, aT22(2) = 100, aT23(2) = 100,aT24(2) = 50, aT25(2) = 10aT30(2) = 24, aT31(2) = 166, aT32(2) = 424, aT33(2) = 516,aT34(2) = 304, aT35(2) = 70aT40(2) = 18, aT41(2) = 166, aT42(2) = 530, aT43(2) = 774,aT44(2) = 532, aT45(2) = 140aT50(2) = 4, aT51(2) = 50, aT52(2) = 200, aT53(2) = 350,aT54(2) = 280, aT55(2) = 84.

    (A.18)

  • 268 P. K. Newton and J. B. Keller

    aT10(3) = 6, aT11(3) = 36, aT12(3) = 90, aT13(3) = 120,aT14(3) = 90, aT15(3) = 36, aT16(3) = 6aT20(3) = 25, aT21(3) = 230, aT22(3) = 775, aT23(3) = 1300,aT24(3) = 1175, aT25(3) = 550, aT26(3) = 105aT30(3) = 40, aT31(3) = 510, aT32(3) = 2200, aT33(3) = 4500,aT34(3) = 4800, aT35(3) = 2590, aT36(3) = 560aT40(3) = 30, aT41(3) = 510, aT42(3) = 2750, aT43(3) = 6750,aT44(3) = 8400, aT45(3) = 5180, aT46(3) = 1260aT50(3) = 10, aT51(3) = 230, aT52(3) = 1550, aT53(3) = 4550,aT54(3) = 6580, aT55(3) = 5620, aT56(3) = 1260,aT60(3) = 1, aT61(3) = 36, aT62(3) = 315, aT63(3) = 1120,aT64(3) = 1890, aT65(3) = 1512, aT66(3) = 462.

    (A.19)

    References

    1. S. R. CLARKE and D. S. DYTE, Using official tennis ratings to estimate tournamentchances, preprint, 2002.

    2. R. T. STEFANI, Survey of the major world sports rating systems, J. Appl. Stat.24(6):635646 (1997).

    3. J. B. KELLER, Probability of a shutout in racquetball, SIAM Rev. 26:267268 (1984).

    4. J. RENICK, Optimal strategies at decision points in singles squash, Res. Quart. ExerciseSport 47:562568 (1976).

    5. J. B. KELLER, Tie point strategies in badminton, preprint, 2003.

    6. B. P. HSI and D. M. BURYCH, Games of two players, Appl. Stat.: J. R. Stat. Soc. C22(1):8692 (1971).

    7. W. H. CARTER and S. L. CREWS, An analysis of the game of tennis, Am. Stat.28(4):130134 (1974).

    8. G. H. POLLARD, An analysis of classical and tie-breaker tennis, Austr. J. Stat. 25:496505(1983).

    9. F. J. G. M. KLAASSEN and J. R. MAGNUS, Are points in tennis independent and identicallydistributed? Evidence from a dynamic binary panel data model, J. Am. Stat. Assoc.96(454):500509 (2001).

    10. S. L. GEORGE, Optimal strategy in tennis: A simple probabilistic model, Appl. Stat.: J. R.Stat. Soc. C 22(1):97104 (1973).

    11. R. E. MILES, Symmetric sequential analysis: The efficiencies of sports scoring systems(with particular reference to those of tennis), J. R. Stat. Soc. B 46(1):93108 (1984).

  • Probability of Winning at Tennis 269

    12. C. MORRIS, The most important points in tennis, in Optimal Strategies in Sport (S. P.Ladany and R. E. Machol, Eds.), pp. 131140, Amsterdam; North-Holland, 1977.

    13. J. R. MAGNUS and F. J. G. M. KLAASSEN, The effect of new balls in tennis: Four years atWimbledon, The Statistician 48:239246 (1999).

    14. F. J. G. M. KLAASSEN and J. R. MAGNUS, How to reduce the service dominance intennis? Empirical results from four years at Wimbledon, preprint, 2003.

    15. J. R. MAGNUS and F. J. G. M. KLAASSEN, The final set in a tennis match: Four years atWimbledon, J. Appl. Stat. 26(4):461468 (1999).

    16. J. G. KINGSTON, Comparison of scoring systems in two-sided competitions, J. Comb.Theory A 20:357362 (1976).

    17. C. L. ANDERSON, Note on the advantage of first serve, J. Comb. Theory A 23:363 (1977).

    18. P. K. NEWTON and G. H. POLLARD, Service neutral scoring strategies for tennis, inProceedings of the Seventh Autralasian Conference on Mathematics and Computers inSport, 2004.

    19. F. J. G. M. KLAASSEN and J. R. MAGNUS, Forecasting in tennis, preprint, 2003.

    20. J. R. MAGNUS and F. J. G. M. KLAASSEN, On the advantage of serving first in a tennisset: Four years at Wimbledon, The Statistician 48:247256 (1999).

    21. D. JACKSON and K. MOSURSKI, Heavy defeats in tennis: Psychological momentum orrandom effects, Chance 10:2734 (1997).

    UNIVERSITY OF SOUTHERN CALIFORNIASTANFORD UNIVERSITY

    (Received July 21, 2004)