Alberto Palacios Pawlovsky SPM
1 Introduction
Soccer is one of the most popular sports in the world and, together with baseball, one of the two most popular sports in Japan. In soccer, two teams of eleven players each try to put a ball into the opposing goal, which is defended by a goalkeeper, the only player who may touch the ball with the hands, and only within a restricted zone of the playing area. The other players may only kick or head the ball, anywhere in the playing area. In almost all soccer tournaments a team is awarded three points for a win and zero points for a defeat, and each team is given one point if the match ends in a tie (Brillinger (2010)).
The distribution of goals in soccer has long been a focus of research. Reep, Pollard, and Benjamin (1971) showed that the number of goals scored by a team follows a Negative Binomial distribution. Maher (1982) contested this finding, used a Poisson distribution instead, and defined the mean of the goals scored by a team as the product of its attack strength and the opposing team’s defense weakness. His model can also be used to predict scores. For game outcome prediction, we can also use ranking.
Ranking systems are used in some sports to select or seed teams for pre- or post-season tournaments (Harville (2003)). Ranking teams before a match is also of value to managers, because it can help them choose defensive and offensive strategies or the starting players for a game. The problem of rating teams and forming a ranking has been studied for a long time, and people from diverse disciplines and backgrounds have proposed several methods.
The creation and open availability of databases for almost any popular sport has also fostered the development of many computer-based ranking systems. One that has attracted attention is the method proposed by Colley (2002), since it is one of the computer rankings used for college football’s Bowl Championship Series (BCS). His ranking system can also be used for prediction. Ingram’s (2007) work using linear algebra is the basis of the ODM (Offense Defense Model) of Govan, Langville, and Meyer (2009). ODM is a model based on defense and offense ratings that can be used for prediction too. Stefani (2008) developed a least squares approach to predict the scores in rugby and soccer games that, like Maher’s, uses defense and offense ratings of the teams.
Ranking is in itself a constantly evolving area. There are models in the area of information processing, like that of Callaghan, Mucha, and Porter (2007), that could also be applied to sports. For soccer, there is the work of Hallinan (2005), which ranks national soccer teams using a modified Bradley-Terry model (Bradley and Terry (1952)). There are even works that use computer voters to determine rating and rank (Gleich and Lim (2011)).
One characteristic common to almost all computer-based methods is that they use the information currently available on sports association sites. This paper introduces two metrics we have developed to rate soccer teams. We evaluated the quality of these metrics by rating and ranking Japanese university soccer teams and using the resulting rank to predict the outcome of soccer games in the first and second divisions of JUFA (Japanese University Football Association).
2 Scores and Points Metrics (SPM)
In the case of soccer, the usual and minimal information available on association sites for tournament games is the date of the matches and the corresponding scores. We have been studying several metrics based on this basic information in order to rate teams, rank them, and predict the outcome of future games. We propose two performance metrics and one way to rate a team. One of the metrics uses the goals scored by a team and the other the points that the team has earned up to a given match, so we will call them scores and points metrics (SPM) in what follows.
2.1 SPM and Rating
We will explain our metrics using two teams, i and a, that have faced each other on the k-th match day. We denote the points gained by team i before this game by $p_i^{k-1}$ and those of team a (the adversary) by $p_a^{k-1}$. Their initial values, before the start of the season, are $p_i^0 = 0$ and $p_a^0 = 0$.
We use the point rules of soccer tournaments, so if team i wins the k-th match it earns three points, and its points are given by $p_i^k = p_i^{k-1} + 3$. If the game ends in a tie, its points are given by $p_i^k = p_i^{k-1} + 1$, and if it loses, by $p_i^k = p_i^{k-1}$. The total number of points that any team could have earned up to the k-th match is given by equation (1).
$$p_{\mathrm{total}}^{k} = k \times 3 \tag{1}$$
One of our performance metrics evaluates the points gained by a team relative to all the points that it could have earned. We call it the points metric, and it is defined by the following equation.
$$p_{i,k} = \frac{p_i^k}{p_{\mathrm{total}}^k} \tag{2}$$
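As an illustration of equations (1) and (2), the following minimal Python sketch computes the points metric from a list of results. The function name, the 'W'/'D'/'L' encoding, and the value returned before any match has been played are assumptions made for this example, not part of the original programs.

```python
# Points awarded per result under the usual soccer rules (win 3, tie 1, loss 0).
POINTS = {"W": 3, "D": 1, "L": 0}

def points_metric(results):
    """Return p_{i,k} for one team, given its results in matches 1..k.

    `results` is a list of 'W', 'D' or 'L' (hypothetical encoding).
    """
    k = len(results)
    if k == 0:
        # Assumption: equation (2) is undefined before the first match,
        # so this sketch simply returns 0.
        return 0.0
    earned = sum(POINTS[r] for r in results)   # p_i^k
    possible = 3 * k                           # equation (1)
    return earned / possible                   # equation (2)

# A team with a win, a tie, a loss and a win has earned 7 of 12 possible points.
print(points_metric(["W", "D", "L", "W"]))
```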
We also measure the performance of a team by its goals. The goals of team i in the j-th game are denoted by $g_{i,j}$, and all its goals up to the k-th match are given by equation (3).
$$g_i^k = \sum_{j=1}^{k} g_{i,j} \tag{3}$$
In the same way, we denote the goals conceded by team i in the j-th game by $cg_{i,j}$ and their total up to the k-th game by equation (4).
$$cg_i^k = \sum_{j=1}^{k} cg_{i,j} \tag{4}$$
So the total number of goals scored and conceded by team i is given by the following equation. If $tg_i^k$ is 0, it is set to 0.1 to avoid division by zero in some special cases.
$$tg_i^k = g_i^k + cg_i^k \tag{5}$$
Our second metric measures the performance of team i, up to the k-th game, using what we call the scores metric (s) of a team, which is defined by equation (6).
$$s_{i,k} = \frac{g_i^k}{tg_i^k} \tag{6}$$
We use the above two metrics (equations (2) and (6)) to rate a team, up to the k-th game, according to the formula given by equation (7).
$$r_i^k = s_{i,k} \times p_{i,k} \tag{7}$$
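Equations (3) to (7) can be sketched in a few lines of Python. The function names and the list-based inputs are assumptions chosen for this example; only the formulas and the 0.1 fallback come from the text.

```python
def scores_metric(goals_for, goals_against):
    """Return s_{i,k} from per-game goals scored and conceded (equations (3)-(6))."""
    g = sum(goals_for)        # equation (3): goals scored up to game k
    cg = sum(goals_against)   # equation (4): goals conceded up to game k
    tg = g + cg               # equation (5)
    if tg == 0:
        tg = 0.1              # fallback stated in the text to avoid dividing by zero
    return g / tg             # equation (6)

def rating(s, p):
    """Return r_i^k as the product of the scores and points metrics (equation (7))."""
    return s * p

# Three games: 3 goals scored and 4 conceded give s = 3/7.
s = scores_metric([2, 0, 1], [1, 0, 3])
print(s, rating(s, 7 / 12))
```

A team whose games have all ended 0-0 gets s = 0 under this rule, since the numerator is still zero.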
We use the rating of a team to compare it to other teams and, if needed, to rank them.
We have also studied and evaluated the individual effects of the scores and points metrics on the rating. When only the scores metric is used, equation (7) becomes
$$r_i^k = s_{i,k} \tag{8}$$
and when only the points metric is used, the rating is given by
$$r_i^k = p_{i,k} \tag{9}$$
We evaluated our metrics by combining them, as in equation (7), but with a weighting factor (w) to measure their effect on the rating. For the evaluation, we used the following (modified) rating.
$$r_i^k = (s_{i,k})^{(1-w)} \times (p_{i,k})^{w} \tag{10}$$
When w is 0 we have equation (8) and when it is 1 we get equation (9). The evaluation was carried out by setting w to 11 values, from 0 to 1 in increments of 0.1, and using the ratings for game outcome prediction. The results are detailed in the following subsections.
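The weighted rating of equation (10) and the 11 weight settings can be sketched as follows. The names `weighted_rating` and `WEIGHTS` are assumptions for this example. Note that Python evaluates `0 ** 0` as 1, so at w = 0 a team with zero points still gets r = s, and at w = 1 a team with no goals still gets r = p, which matches equations (8) and (9).

```python
def weighted_rating(s, p, w):
    """Return r_i^k = s^(1-w) * p^w (equation (10)) for metrics s and p in [0, 1]."""
    return (s ** (1 - w)) * (p ** w)

# The 11 weight settings used in the evaluation: 0.0, 0.1, ..., 1.0.
WEIGHTS = [i / 10 for i in range(11)]

# Sweep the weights for one team with s = 0.6 and p = 0.5.
for w in WEIGHTS:
    print(w, round(weighted_rating(0.6, 0.5, w), 4))
```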
2.2 Weighted SPM Evaluation : Prediction
As indicated above, in the evaluation of our metrics we used the data of the last twelve years (1999-2010) of the first and second divisions of the Japanese University Football Association (JUFA, Kanto League). This league has the characteristic that almost all its games are played in neutral stadiums, with no or negligible home advantage. We used the public data on the site of JUFA (2011).
The rules governing JUFA have changed over the years, and the data collected have the following characteristics. The first and second divisions of JUFA had only 8 teams in 1999 and 2000, and in those seasons each team played every other team only once. From 2001 to 2004, the teams also played a return game and the season’s games were divided into two terms. From the 2005 season, the number of teams per division grew to 12. In all these seasons, all the teams had played the same number of games before confronting an adversary. The only exception in the data is the 2006 season of the second division. In that season, one team was suspended in the middle of the second term, so that year we have only 119 games in that division.
The data gathering process required parsing and processing all the corresponding match day pages. For this we used tailored programs written in Python.
Table 1: Prediction Results: Detailed Example (by match date, w = 0.5)

1999 Season : 1st Division
Match         w = 0.5
League 1 2    0/4     0.00%
League 1 3    2/4     50.00%
League 1 4    1/4     25.00%
League 1 5    1/4     25.00%
League 1 6    3/4     75.00%
League 1 7    2/4     50.00%
Total         9/24    37.50%
JUFA’s games are all scheduled weekly, so for all the weights (w) the scores and points metrics were computed using weekly results. We predicted only the results of the games after the first match day (from the second game onward). One sample of the detailed results, for one season, is shown in Table 1. Match days are represented in a “League x y” format, where x is the division and y is the match day.
We used equation (10) to rate each team before its k-th match using data up to its previous, (k-1)-th, match. We then used those ratings to determine the winners of the k-th match day. We used neither pre-season data nor any other data to improve the predictions.
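The foresight procedure described above can be sketched as a simple loop over match days. This is a minimal illustration and not the original evaluation program: the data layout (a list of match days, each a list of `(team_a, team_b, goals_a, goals_b)` tuples), all function names, and the rating of 0 for a team that has not yet played are assumptions. The tie rule for equal ratings is the one the paper states for rating-based methods.

```python
from collections import defaultdict

def spm_rating(stats, w):
    """Equation (10) from a [played, points, scored, conceded] record."""
    played, points, scored, conceded = stats
    if played == 0:
        return 0.0                       # assumption: no games yet means rating 0
    p = points / (3 * played)            # points metric
    total_goals = scored + conceded
    if total_goals == 0:
        total_goals = 0.1                # fallback from the text
    s = scored / total_goals             # scores metric
    return (s ** (1 - w)) * (p ** w)

def outcome(x, y):
    """'a' if the first value is larger, 'b' if smaller, 'tie' if equal."""
    return "a" if x > y else "b" if x < y else "tie"

def update(table, team_a, team_b, goals_a, goals_b):
    """Add one finished game to both teams' records."""
    for team, scored, conceded in ((team_a, goals_a, goals_b),
                                   (team_b, goals_b, goals_a)):
        record = table[team]
        record[0] += 1
        record[1] += 3 if scored > conceded else 1 if scored == conceded else 0
        record[2] += scored
        record[3] += conceded

def foresight_accuracy(match_days, w=0.5):
    """Predict each match day from earlier match days only; skip the first day."""
    table = defaultdict(lambda: [0, 0, 0, 0])
    correct = total = 0
    for day_index, games in enumerate(match_days):
        if day_index > 0:
            for team_a, team_b, goals_a, goals_b in games:
                predicted = outcome(spm_rating(table[team_a], w),
                                    spm_rating(table[team_b], w))
                correct += predicted == outcome(goals_a, goals_b)
                total += 1
        for game in games:               # ratings are updated only after predicting
            update(table, *game)
    return correct, total

# Toy season with four hypothetical teams and three match days.
season = [
    [("A", "B", 2, 0), ("C", "D", 1, 1)],
    [("A", "C", 1, 0), ("B", "D", 0, 2)],
    [("A", "D", 0, 0), ("B", "C", 1, 3)],
]
print(foresight_accuracy(season, w=0.5))
```

On this toy season, both games of the second match day and one game of the third are predicted correctly; the 0-0 tie is missed because the two ratings differ.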
The total results for the first division and all weights are shown in Figure 1. It shows that the best setting is w = 0.5 for the twelve-year span (an equal weight for the s and p metrics). However, if we look at the details of Table 2, for the 1999 and 2000 seasons all the weights between 0.1 and 0.6 give the same highest prediction percentage. Also, for the seasons between 2001 and 2004, the best weight is 0.9. Moreover, for the contemporary data (2005 onward) the best figures are obtained with w set to 0.0 (only using the scores metric) or 0.1.
Figure 1: Weighted SPM : Foresight Prediction Percentages (1st Division JUFA).
Figure 2 shows the results for the second division. Those results show a slightly different distribution. The best setting is w = 0.6 and the second best is w = 0.5. From the details of Table 3, we can determine that for the 1999 and 2000 seasons the best prediction percentages are obtained with values of w between 0.0 and 0.6. Also, for the seasons between 2001 and 2004 the highest prediction percentages are obtained with w set to 0.6 or 1.0 (the second best values are for 0.5 and 0.7). And if we limit the span to the seasons from 2005 onward, the best values are obtained with w set to 0.3, 0.5 and 0.6. The best weight values for both divisions hint that the best combination of the scores and points metrics is with an equal weight for both metrics.
Figure 2: Weighted SPM : Foresight Prediction Percentages (2nd Division JUFA).
Table 2 shows the details per season of the prediction results for the first division of JUFA. As already shown in Figure 1, over all twelve seasons the best total is obtained with w = 0.5. If we look at the best values per season (in boldface) in this table, the setting with the highest number of best-value seasons is w = 1. It has five such seasons (1999, 2001, 2003, 2005, and 2009), two of them in the contemporary range. The next best setting is w = 0.7, with three best-value seasons, all of them in the contemporary range. Three values of w share the third position: 0, 0.5 and 0.6. Each has three best-value seasons, two of them in the contemporary range.
Table 3 shows the details per season of the prediction results for the second division of JUFA. The best values per season are also highlighted in this table. Over the span of twelve seasons, the best setting is w = 0.6. It has six best-value seasons, three of them in the contemporary range (2005 onward). The next best setting for w is 0.7, with five best-value seasons, two of them contemporary. Third place goes to w = 0.3, with four best-value seasons, three of them contemporary. The values of w in fourth place are 0.5, 0.8 and 0.9, each having four best-value seasons with two of them from
Table 2: Prediction Results (1st Division JUFA): correctly predicted games/all games, and corresponding percentage.

Year   weight (w)
       0         0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9       1
1999   8/24      8/24      8/24      8/24      9/24      9/24      9/24      7/24      7/24      7/24      9/24
       33.33     33.33     33.33     33.33     37.50     37.50     37.50     29.17     29.17     29.17     37.50
2000   12/24     13/24     13/24     13/24     12/24     12/24     12/24     12/24     12/24     12/24     11/24
       50.00     54.17     54.17     54.17     50.00     50.00     50.00     50.00     50.00     50.00     45.83
2001   19/52     19/52     20/52     19/52     19/52     20/52     19/52     19/52     19/52     20/52     21/52
       36.54     36.54     38.46     36.54     36.54     38.46     36.54     36.54     36.54     38.46     40.38
2002   24/52     24/52     24/52     25/52     24/52     23/52     23/52     23/52     23/52     23/52     22/52
       46.15     46.15     46.15     48.08     46.15     44.23     44.23     44.23     44.23     44.23     42.31
2003   23/52     22/52     23/52     24/52     26/52     26/52     26/52     27/52     28/52     28/52     28/52
       44.23     42.31     44.23     46.15     50.00     50.00     50.00     51.92     53.85     53.85     53.85
2004   30/52     30/52     30/52     29/52     30/52     29/52     29/52     29/52     29/52     29/52     27/52
       57.69     57.69     57.69     55.77     57.69     55.77     55.77     55.77     55.77     55.77     51.92
2005   56/126    55/126    56/126    55/126    56/126    56/126    57/126    57/126    57/126    56/126    57/126
       44.44     43.65     44.44     43.65     44.44     44.44     45.24     45.24     45.24     44.44     45.24
2006   69/126    67/126    64/126    64/126    63/126    65/126    64/126    64/126    64/126    65/126    66/126
       54.76     53.17     50.79     50.79     50.00     51.59     50.79     50.79     50.79     51.59     52.38
2007   66/126    65/126    65/126    64/126    64/126    63/126    62/126    61/126    60/126    60/126    57/126
       52.38     51.59     51.59     50.79     50.79     50.00     49.21     48.41     47.62     47.62     45.24
2008   59/126    62/126    61/126    60/126    60/126    61/126    60/126    59/126    58/126    58/126    53/126
       46.83     49.21     48.41     47.62     47.62     48.41     47.62     46.83     46.03     46.03     42.06
2009   62/126    62/126    62/126    62/126    63/126    63/126    63/126    63/126    62/126    62/126    63/126
       49.21     49.21     49.21     49.21     50.00     50.00     50.00     50.00     49.21     49.21     50.00
2010   63/126    64/126    64/126    64/126    64/126    65/126    64/126    65/126    64/126    64/126    63/126
       50.00     50.79     50.79     50.79     50.79     51.59     50.79     51.59     50.79     50.79     50.00
Total  491/1012  491/1012  490/1012  487/1012  490/1012  492/1012  488/1012  486/1012  483/1012  484/1012  477/1012
       48.52     48.52     48.42     48.12     48.42     48.61     48.22     48.02     47.73     47.83     47.13
Table 3: Prediction Results (2nd Division JUFA): correctly predicted games/all games, and corresponding percentage.

Year   weight (w)
       0         0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9       1
1999   12/24     13/24     13/24     13/24     13/24     13/24     13/24     13/24     13/24     13/24     12/24
       50.00     54.17     54.17     54.17     54.17     54.17     54.17     54.17     54.17     54.17     50.00
2000   13/24     12/24     12/24     12/24     12/24     12/24     12/24     11/24     11/24     11/24     10/24
       54.17     50.00     50.00     50.00     50.00     50.00     50.00     45.83     45.83     45.83     41.67
2001   20/52     21/52     21/52     20/52     20/52     20/52     20/52     19/52     20/52     20/52     22/52
       38.46     40.38     40.38     38.46     38.46     38.46     38.46     36.54     38.46     38.46     42.31
2002   27/52     28/52     28/52     28/52     28/52     29/52     29/52     29/52     27/52     27/52     25/52
       51.92     53.85     53.85     53.85     53.85     55.77     55.77     55.77     51.92     51.92     48.08
2003   21/52     22/52     22/52     21/52     22/52     22/52     22/52     22/52     22/52     22/52     24/52
       40.38     42.31     42.31     40.38     42.31     42.31     42.31     42.31     42.31     42.31     46.15
2004   21/52     20/52     20/52     21/52     22/52     22/52     23/52     23/52     23/52     23/52     23/52
       40.38     38.46     38.46     40.38     42.31     42.31     44.23     44.23     44.23     44.23     44.23
2005   63/126    64/126    64/126    64/126    64/126    64/126    64/126    63/126    63/126    63/126    58/126
       50.00     50.79     50.79     50.79     50.79     50.79     50.79     50.00     50.00     50.00     46.03
2006   61/119    61/119    60/119    59/119    59/119    60/119    59/119    59/119    59/119    59/119    61/119
       51.26     51.26     50.42     49.58     49.58     50.42     49.58     49.58     49.58     49.58     51.26
2007   73/126    71/126    71/126    70/126    71/126    70/126    70/126    70/126    69/126    69/126    68/126
       57.94     56.35     56.35     55.56     56.35     55.56     55.56     55.56     54.76     54.76     53.97
2008   70/126    70/126    71/126    74/126    72/126    71/126    71/126    71/126    71/126    72/126    73/126
       55.56     55.56     56.35     58.73     57.14     56.35     56.35     56.35     56.35     57.14     57.94
2009   76/126    78/126    78/126    79/126    78/126    81/126    82/126    82/126    82/126    82/126    78/126
       60.32     61.90     61.90     62.70     61.90     64.29     65.08     65.08     65.08     65.08     61.90
2010   58/126    58/126    59/126    59/126    59/126    59/126    59/126    59/126    59/126    59/126    58/126
       46.03     46.03     46.83     46.83     46.83     46.83     46.83     46.83     46.83     46.83     46.03
Total  515/1005  518/1005  519/1005  520/1005  520/1005  523/1005  524/1005  521/1005  519/1005  520/1005  512/1005
       51.24     51.54     51.64     51.74     51.74     52.04     52.14     51.84     51.64     51.74     50.95
2005 onward. If we limit the analysis to the contemporary data, the best settings of w are 0.3 and 0.6. All other settings except w = 1 come second.
Based on all the above, and taking as reference the number of games correctly predicted for contemporary data in both divisions, we chose w = 0.5 as the best setting. For this value of w we have 373 games (of 756) correctly predicted in the first division. The highest number is obtained with w = 0 or w = 0.1, but the difference is only two games. For the second division, the number of games correctly predicted with this setting is 405 (of 749). The same number is obtained with w = 0.6, but that setting loses three games in the first division (w = 0.6 has only 370 games correctly predicted).
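The contemporary totals quoted above can be re-derived from the per-season counts of Tables 2 and 3. The short Python check below copies the 2005 to 2010 counts for the weights discussed; the variable names are assumptions for this example.

```python
# Correctly predicted games per season, 2005..2010, taken from Tables 2 and 3.
first_division = {
    0.0: [56, 69, 66, 59, 62, 63],
    0.5: [56, 65, 63, 61, 63, 65],
    0.6: [57, 64, 62, 60, 63, 64],
}
second_division = {
    0.5: [64, 60, 70, 71, 81, 59],
    0.6: [64, 59, 70, 71, 82, 59],
}
games_first = 6 * 126          # 756 predicted games
games_second = 5 * 126 + 119   # 749 predicted games (2006 had only 119)

for w, counts in first_division.items():
    print("1st", w, sum(counts), round(100 * sum(counts) / games_first, 2))
for w, counts in second_division.items():
    print("2nd", w, sum(counts), round(100 * sum(counts) / games_second, 2))
```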
Setting w to 0.5 means that the scores and points metrics have the same weight. In other words, if we combine them for rating teams and use this rating for ranking or prediction, we should use equation (7). It also seems to be the best tradeoff for the whole span of data and for both divisions. Of course, another possible choice would be to use different weight settings for each division.
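The step from w = 0.5 to equation (7) can be spelled out as follows. With w = 0.5, equation (10) gives the square root of the product in equation (7), and since both metrics are non-negative and the square root is strictly increasing, the order of any two teams i and a is the same under both ratings.

```latex
r_i^k\Big|_{w=0.5} = (s_{i,k})^{0.5} \times (p_{i,k})^{0.5} = \sqrt{s_{i,k}\, p_{i,k}},
\qquad
\sqrt{s_{i,k}\, p_{i,k}} > \sqrt{s_{a,k}\, p_{a,k}}
\iff
s_{i,k}\, p_{i,k} > s_{a,k}\, p_{a,k}
```

Rankings and predicted winners are therefore identical whether the rating is computed with equation (10) at w = 0.5 or with equation (7); only the numerical rating values differ.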
2.3 Weighted SPM Evaluation : Fitting
We also measured the fitting of the predictions obtained with our weighted metrics. Some authors call it hindsight prediction. It is a way of measuring how well the models used fitted the target data (the higher the hindsight prediction rate, the smaller the error).
Figure 3: Weighted SPM : Hindsight Prediction Percentages (1st Division JUFA).
For hindsight prediction, we could use the data (rating, ranking) at the end of the season to predict the outcomes of all its games, but we opted for measuring it incrementally, one match day at a time. We used data up to and including the k-th game to predict the results of that match day. The total results for all weights for the first division of JUFA are shown in Fig. 3, and those for the second division in Fig. 4. As expected, since the team with more points is the top team at the end of a season, or of a match day, the best fitting is obtained with big weights for the points metric. However, we can also see from both figures that the best weight is not equal to 1. For the first division, over the twelve-season span, the best fitting is obtained with w set to 0.9, and for the second division this value is 0.8.
Figure 4: Weighted SPM : Hindsight Prediction Percentages (2nd Division JUFA).
The best hindsight percentages for the contemporary data (2005 season onward) are also obtained with these (best) weight settings.
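The incremental hindsight measurement can be sketched in the same way as a foresight loop, with one change: the results of a match day are added to the records before that match day is "predicted". As before, the data layout and all names are assumptions for this example. Skipping the first match day is also an assumption, made here because the hindsight tables report the same game counts as the foresight tables.

```python
from collections import defaultdict

def spm_rating(record, w):
    """Equation (10) from a [played, points, scored, conceded] record."""
    played, points, scored, conceded = record
    if played == 0:
        return 0.0                          # assumption: no games yet means rating 0
    p = points / (3 * played)
    total_goals = (scored + conceded) or 0.1   # 0.1 fallback from the text
    return ((scored / total_goals) ** (1 - w)) * (p ** w)

def outcome(x, y):
    return "a" if x > y else "b" if x < y else "tie"

def hindsight_accuracy(match_days, w=0.5):
    """Rate teams with data up to and including match day k, then check day k."""
    table = defaultdict(lambda: [0, 0, 0, 0])
    correct = total = 0
    for day_index, games in enumerate(match_days):
        # First fold the current match day into every team's record.
        for team_a, team_b, goals_a, goals_b in games:
            for team, scored, conceded in ((team_a, goals_a, goals_b),
                                           (team_b, goals_b, goals_a)):
                record = table[team]
                record[0] += 1
                record[1] += 3 if scored > conceded else 1 if scored == conceded else 0
                record[2] += scored
                record[3] += conceded
        if day_index == 0:
            continue                        # assumption: first match day is not counted
        # Then compare the updated ratings with what actually happened.
        for team_a, team_b, goals_a, goals_b in games:
            predicted = outcome(spm_rating(table[team_a], w),
                                spm_rating(table[team_b], w))
            correct += predicted == outcome(goals_a, goals_b)
            total += 1
    return correct, total

# Toy season with four hypothetical teams and three match days.
season = [
    [("A", "B", 2, 0), ("C", "D", 1, 1)],
    [("A", "C", 1, 0), ("B", "D", 0, 2)],
    [("A", "D", 0, 1), ("B", "C", 1, 3)],
]
print(hindsight_accuracy(season, w=0.5))
```

In this toy season team D beats the previously unbeaten team A on the last match day. Because that result is already included in the ratings, D is rated above A and the game counts as correctly fitted.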
3 SPM Evaluation : Comparison to Other Methods
We also evaluated the performance of the ratings based on our metrics by comparing their prediction results to those we can derive using the methodology of Maher (1982), the score prediction method of Stefani (2008), the ranking method of Colley (2002), and the ODM of Govan et al. (2009).
Maher supported the theory that the number of goals in soccer follows a Poisson distribution, and defined parameters to represent the defensive and offensive characteristics of a team. His approach defines four parameters for each team (two at home and two when playing away). However, he studied the importance of these four values and found that two of them, the offensive strength and the defensive weakness, suffice to describe the quality of a team (without distinguishing between home and away games). Using his method we can calculate the mean of the goal distribution of each team and determine the number of goals most likely to be scored in a given game. Once we know the scores, we can predict the result of a game between any two teams.
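The last two steps of this procedure, going from Poisson means to a most likely score and then to a predicted result, can be sketched as below. The parameter values, dictionary layout and function names are hypothetical, and estimating the attack strengths and defense weaknesses from past scores is left out of this sketch.

```python
import math

def most_likely_goals(mean):
    """Return a most likely goal count for a Poisson variable with this mean.

    floor(mean) is always a mode of the Poisson distribution; when the mean is
    a positive integer, mean - 1 is equally likely.
    """
    return math.floor(mean)

def predict_result(attack, weakness, team, opponent):
    """Predict 'win', 'tie' or 'loss' for `team` from the two most likely scores."""
    # Mean goals of each side: own attack strength times opposing defense weakness.
    mean_for = attack[team] * weakness[opponent]
    mean_against = attack[opponent] * weakness[team]
    goals_for = most_likely_goals(mean_for)
    goals_against = most_likely_goals(mean_against)
    if goals_for > goals_against:
        return "win"
    if goals_for < goals_against:
        return "loss"
    return "tie"

# Hypothetical parameter values for two teams.
attack = {"A": 1.8, "B": 0.9}
weakness = {"A": 0.8, "B": 1.3}
print(predict_result(attack, weakness, "A", "B"))
```

With these values team A has a mean of 2.34 goals and team B a mean of 0.72, so the most likely score is 2-0.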
Colley proposed a method for ranking college (American) football teams that uses only the number of games won and the number of games played as input. In this method, we can calculate the ratings by an iterative scheme or by solving a system of linear equations in matrix form. The ratings can then be used to rank teams and predict the winner of a game. In our implementation of this method we used an initial rating of 0.5 for all teams.
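A minimal sketch of the iterative scheme for Colley's ratings is given below. Each rating is repeatedly set to (1 + (wins - losses)/2 + sum of the opponents' ratings) / (2 + games played), starting from 0.5 for every team. The function name and the `(winner, loser)` input format are assumptions for this example, and it covers decided games only, since the text does not say how ties were handled.

```python
def colley_ratings(teams, games, iterations=200):
    """Return Colley ratings for `teams` from a list of (winner, loser) pairs."""
    wins = {t: 0 for t in teams}
    losses = {t: 0 for t in teams}
    opponents = {t: [] for t in teams}
    for winner, loser in games:
        wins[winner] += 1
        losses[loser] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    ratings = {t: 0.5 for t in teams}       # initial rating of 0.5 for all teams
    for _ in range(iterations):
        for t in teams:
            played = wins[t] + losses[t]
            base = 1 + (wins[t] - losses[t]) / 2
            # Row t of the Colley system solved for its own rating.
            ratings[t] = (base + sum(ratings[o] for o in opponents[t])) / (2 + played)
    return ratings

# Two hypothetical teams, one game: A beat B.
print(colley_ratings(["A", "B"], [("A", "B")]))
```

For this single game the ratings converge to 0.625 for the winner and 0.375 for the loser, which is also the solution of the corresponding two-equation linear system.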
Stefani developed a least squares and an exponential smoothing method for predicting scores and applied it to the English Premier Soccer League and Super 12/14 rugby union competitions. His model predicts the scores of the home team and the away team using the offensive and defensive ratings of each team. In his method, these ratings have a smoothing factor that puts more weight on more recent game results. In our implementation of this method, we used 0.5 as the initial value of the offensive and defensive ratings for all teams.
The Offense-Defense Model (ODM) of Govan et al. (2009) uses a matrix to define the offensive and defensive ratings of a team. Our implementation follows the details given in that paper.
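As a rough illustration of the kind of computation ODM involves, the sketch below alternates between offensive and defensive ratings on a score matrix. It is written from the general form of the model (offense as goals scored weighted by the inverse of the opponents' defensive ratings, and the reverse for defense) and is not the implementation used in this paper. The function name, the small constant `eps` added to every entry to avoid division by zero, and the fixed iteration count are assumptions.

```python
def odm_ratings(goals, iterations=100, eps=1e-3):
    """Return overall ratings offense/defense for each team.

    goals[i][j] holds the goals team j scored against team i.
    A high offensive rating and a low defensive rating are both good.
    """
    n = len(goals)
    a = [[goals[i][j] + eps for j in range(n)] for i in range(n)]
    defense = [1.0] * n
    offense = [1.0] * n
    for _ in range(iterations):
        # Offense of j: goals it scored, weighted by how weak each defense is.
        offense = [sum(a[i][j] / defense[i] for i in range(n)) for j in range(n)]
        # Defense of i: goals it conceded, discounted by the scorers' offense.
        defense = [sum(a[i][j] / offense[j] for j in range(n)) for i in range(n)]
    return [offense[t] / defense[t] for t in range(n)]

# Three hypothetical teams: team 0 beat teams 1 and 2 by 3-0, team 1 beat team 2 by 1-0.
goals = [
    [0, 0, 0],   # goals conceded by team 0
    [3, 0, 0],   # goals conceded by team 1
    [3, 1, 0],   # goals conceded by team 2
]
print(odm_ratings(goals))
```

In this example team 0 ends with the highest overall rating and team 2 with the lowest, as expected from the scores.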
3.1 Foresight Prediction
We used our metrics and the rating based on them for foresight prediction of game outcomes. All the predictions, for all the methods, are based on the data available before the game to be predicted. All the methods used the same information. There is no home advantage in the games, so no method uses it.
Maher’s and Stefani’s methods predict the scores, while all other methods, including ours, compute ratings to compare teams and decide which one will win. If the ratings of both teams are equal, the game is predicted as a tie.
Table 4 and Table 5 show the prediction results for JUFA’s first and second divisions, respectively. From these tables, we can see that SPM gives the best percentages for both divisions over the whole twelve seasons.
Table 4: Foresight Prediction Results Comparison: JUFA’s 1st Division.

First Division, by method
Season  Maher      Stefani    ODM        Colley     SPM
1999    3/24       5/24       11/24      9/24       9/24
        12.50%     20.83%     45.83%     37.50%     37.50%
2000    8/24       8/24       12/24      11/24      12/24
        33.33%     33.33%     50.00%     45.83%     50.00%
2001    21/52      23/52      17/52      19/52      20/52
        40.38%     44.23%     32.69%     36.53%     38.46%
2002    23/52      21/52      22/52      23/52      23/52
        44.23%     40.38%     42.30%     44.23%     44.23%
2003    18/52      24/52      24/52      28/52      26/52
        34.61%     46.15%     46.15%     53.84%     50.00%
2004    26/52      24/52      28/52      29/52      29/52
        50.00%     46.15%     53.84%     55.76%     55.76%
2005    56/126     49/126     57/126     57/126     56/126
        44.44%     38.88%     45.23%     45.23%     44.44%
2006    60/126     53/126     67/126     62/126     65/126
        47.61%     42.06%     53.17%     49.20%     51.58%
2007    43/126     47/126     63/126     63/126     63/126
        34.12%     37.30%     50.00%     50.00%     50.00%
2008    47/126     48/126     63/126     52/126     61/126
        37.30%     38.09%     50.00%     41.26%     48.41%
2009    47/126     52/126     63/126     61/126     63/126
        37.30%     41.26%     50.00%     48.41%     50.00%
2010    50/126     52/126     62/126     61/126     65/126
        39.68%     41.26%     49.20%     48.41%     51.58%
Total   402/1012   406/1012   489/1012   475/1012   492/1012
        39.72%     40.11%     48.32%     46.93%     48.61%
However, if we look only at the first division table, ODM has seven seasons with best values, five of them in the contemporary data range. It is followed by SPM, with six seasons with best values, three of them in the contemporary seasons.
For the second division, Colley’s method has six seasons with best values, four of them in the contemporary years (2005 onward). Second place goes to SPM, with four seasons with best values, three of them in contemporary seasons.
Table 5: Foresight Prediction Results Comparison : JUFA’s 2nd Division.

Second Division, by method
Season  Maher      Stefani    ODM        Colley     SPM
1999    8/24       14/24      11/24      12/24      13/24
        33.33%     58.33%     45.83%     50.00%     54.16%
2000    9/24       13/24      10/24      9/24       12/24
        37.50%     54.16%     41.66%     37.50%     50.00%
2001    18/52      9/52       21/52      23/52      20/52
        34.61%     17.30%     40.38%     44.23%     38.46%
2002    23/52      23/52      28/52      25/52      29/52
        44.23%     44.23%     53.84%     48.07%     55.76%
2003    16/52      14/52      21/52      23/52      22/52
        30.76%     26.92%     40.38%     44.23%     42.30%
2004    13/52      20/52      24/52      19/52      22/52
        25.00%     38.46%     46.15%     36.53%     42.30%
2005    54/126     52/126     65/126     65/126     64/126
        42.85%     41.26%     51.58%     51.58%     50.79%
2006    45/119     56/119     58/119     59/119     60/119
        37.81%     47.05%     48.73%     49.57%     50.42%
2007    55/126     55/126     67/126     71/126     70/126
        43.65%     43.65%     53.17%     56.34%     55.55%
2008    59/126     58/126     71/126     73/126     71/126
        46.82%     46.03%     56.34%     57.93%     56.34%
2009    54/126     67/126     73/126     77/126     81/126
        42.85%     53.17%     57.93%     61.11%     64.28%
2010    46/126     49/126     53/126     60/126     59/126
        36.50%     38.88%     42.06%     47.61%     46.82%
Total   400/1005   430/1005   502/1005   516/1005   523/1005
        39.80%     42.78%     49.95%     51.34%     52.03%
From Figure 5 and Figure 6, we can see that no method reaches the 60% line for the first division or the 70% line for the second division.
For the first division in seasons 1999 and 2000, where the number of games is small, Maher’s and Stefani’s methods did not give good results. From 2001 to 2004, when the number of games doubled, these methods improved their predictions, but for contemporary data (2005 onward) they hardly reached the 40% line.
ODM, Colley’s method and the prediction based on SPM show the best results for almost all these seasons, the only exception being 2001, where Stefani’s method shows the highest prediction percentage.
Figure 5: Foresight Prediction Results Comparison (1st Division JUFA).
For the second division of JUFA in the 1999 and 2000 seasons, the predictions based on Stefani’s method show high figures, but its results for all other seasons are low. Maher’s results started with low values, but they seem to be more stable for seasons with a larger number of games. Its results for contemporary data average around the 40% line over all those seasons.
In the first division, for all seasons between 2000 and 2004, the predictions based on SPM are at least as good as those given by ODM (the two tie in 2000). For contemporary data, both methods alternate in giving the best results, without a clear difference between them. For the second division in the years 2001, 2004 and 2005, ODM gives better results, but for all other seasons SPM is at least as good (the two tie in 2008).
If we compare only Colley’s and SPM’s results for the first division, Colley’s method gives the best results for seasons 2003 and 2005, but SPM’s results are equal or better for all other seasons (Fig. 5). When comparing the results of the second division, we cannot see a clear predominance of one of these methods. SPM gives better results for seasons with a small number of games (1999 and 2000), but both methods alternate in giving the best results for almost all other seasons (Fig. 6).
Figure 6: Foresight Prediction Results Comparison (2nd Division JUFA).
3.2 Fitting : Hindsight Prediction
We also measured the fitting of all the methods we compared. The results obtained are shown in Figure 7 and Figure 8. They are also detailed in Table 6 and Table 7, respectively.
For the first division, Maher’s method gives the lowest results in almost all seasons, with one exception in the 2004 season, where it has one of the highest values. For the same data, Stefani’s method is almost stable and its results move around an average of 50% for all the contemporary seasons. These two methods show almost the same total prediction percentage. This time again, the ODM, Colley and SPM based methods stand above these methods, with little total difference between any two of them. ODM’s results stay around the 55% line, while Colley’s and SPM’s results move around the 60% line. Their hindsight prediction percentages are the best for the first division of JUFA (Fig. 7). If we look at the total figures of Table 6, these methods show a difference of almost 10 percentage points when compared to Maher’s or Stefani’s results. For the first division, Colley’s method has nine seasons with the best percentages, five of them in the contemporary data (2005 onward). It is followed by SPM’s method, which has five seasons with best values, two of them in the contemporary range.
Figure 7: Hindsight Prediction Results Comparison (1st Division JUFA).
Figure 8: Hindsight Prediction Results Comparison (2nd Division JUFA).
Table 6: Hindsight Prediction Results Comparison : JUFA’s 1st Division.

First Division, by method
Season  Maher      Stefani    ODM        Colley     SPM
1999    6/24       12/24      17/24      16/24      17/24
        25.00%     50.00%     70.83%     66.66%     70.83%
2000    15/24      15/24      15/24      17/24      17/24
        62.50%     62.50%     62.50%     70.83%     70.83%
2001    31/52      32/52      27/52      27/52      28/52
        59.61%     61.53%     51.92%     51.92%     53.84%
2002    28/52      26/52      28/52      31/52      30/52
        53.84%     50.00%     53.84%     59.61%     57.69%
2003    21/52      27/52      30/52      34/52      34/52
        40.38%     51.92%     57.69%     65.38%     65.38%
2004    38/52      33/52      35/52      39/52      38/52
        73.07%     63.46%     67.30%     75.00%     73.07%
2005    61/126     54/126     65/126     71/126     68/126
        48.41%     42.85%     51.58%     56.34%     53.96%
2006    73/126     68/126     77/126     84/126     81/126
        57.93%     53.96%     61.11%     66.66%     64.28%
2007    54/126     60/126     73/126     81/126     81/126
        42.85%     47.61%     57.93%     64.28%     64.28%
2008    58/126     60/126     70/126     79/126     77/126
        46.03%     47.61%     55.55%     62.69%     61.11%
2009    55/126     66/126     75/126     76/126     78/126
        43.65%     52.38%     59.52%     60.31%     61.90%
2010    61/126     61/126     77/126     84/126     82/126
        48.41%     48.41%     61.11%     66.66%     65.07%
Total   501/1012   514/1012   589/1012   639/1012   631/1012
        49.50%     50.79%     58.20%     63.14%     62.35%
For the second division (Fig. 8, Table 7), Maher’s and Stefani’s methods alternate in giving the lowest results for almost all seasons. For this division, these methods give total results in almost the same range. Above the 60% line are, again, ODM’s, Colley’s and SPM’s results. They are better than Maher’s and Stefani’s results by almost 10 percentage points.
For both divisions, Colley’s and SPM’s methods give very close results. One season to be noted is the 2000 season of the second division. In this season, Stefani’s method shows the best result, more than 10 percentage points ahead of any other method. Colley’s method also shows a similar value for the 2002 season.
Table 7: Hindsight Prediction Results Comparison : JUFA’s 2nd Division.

Second Division, by method
Season  Maher      Stefani    ODM        Colley     SPM
1999    13/24      17/24      16/24      17/24      17/24
        54.16%     70.83%     66.66%     70.83%     70.83%
2000    15/24      18/24      14/24      14/24      15/24
        62.50%     75.00%     58.33%     58.33%     62.50%
2001    25/52      21/52      35/52      32/52      31/52
        48.07%     40.38%     67.30%     61.53%     59.61%
2002    33/52      31/52      35/52      39/52      38/52
        63.46%     59.61%     67.30%     75.00%     73.07%
2003    24/52      24/52      30/52      29/52      30/52
        46.15%     46.15%     57.69%     55.76%     57.69%
2004    24/52      28/52      35/52      29/52      30/52
        46.15%     53.84%     61.53%     55.76%     57.69%
2005    69/126     58/126     75/126     81/126     80/126
        54.76%     46.03%     59.52%     64.28%     63.49%
2006    54/119     62/119     70/119     80/119     73/119
        45.37%     52.10%     58.82%     67.22%     61.34%
2007    63/126     73/126     80/126     80/126     82/126
        50.00%     57.93%     63.49%     63.49%     65.07%
2008    72/126     75/126     81/126     86/126     87/126
        57.14%     59.52%     64.28%     68.25%     69.04%
2009    68/126     77/126     86/126     88/126     88/126
        53.96%     61.11%     68.25%     69.84%     69.84%
2010    54/126     58/126     65/126     69/126     63/126
        42.85%     46.03%     51.58%     54.76%     50.00%
Total   514/1005   542/1005   622/1005   644/1005   634/1005
        51.14%     53.93%     61.89%     64.07%     63.08%
If we compare ODM’s and SPM’s results for the first division, SPM has a clear advantage over ODM. The fit of SPM’s predictions is at least as good as that of ODM in every season of this division. The differences between their results fall between 2.5% and 6%. However, this is not the case for the results of the second division (Fig. 8). For seasons with a small number of games (1999 and 2000) SPM’s results are better. For the seasons between 2001 and 2004 there is no clear difference, and for the contemporary data SPM is better for all but the last season (2010). For the seasons between 2005 and 2009, the difference in the results lies between 1.5% and 4%.
If we compare Colley’s and SPM’s results for the first division data, there is no clear predominance of one of these methods. However, we could say that Colley’s results are slightly better. Both methods give the same results for three seasons, and Colley’s results are better than those of SPM in six seasons (SPM’s results are better than Colley’s in only three seasons). Again, SPM’s results seem to be better for seasons with a small number of games (the 1999 and 2000 seasons). For the contemporary data, between 2005 and 2010, they alternate in giving the best results.
For the second division (Table 7), Colley's method has six seasons with the best values, four of which are in the contemporary range. It is followed by SPM, with five seasons with the best results, three of them in the contemporary seasons (2005 onward). Colley's results are better than those of SPM for three of the six years of contemporary data. Also, for 2006 and the last season, Colley's results fit better, with an improvement that ranges from 5% to 6%. For all other seasons, neither method clearly predominates and the differences between their results are small.
From the above, we can say that the SPM, ODM and Colley based methods have a slight advantage over the other methods we compared. One way of improving the overall prediction percentages, in both divisions, would be to combine these methods using rank aggregation (Govan et al. (2009)).
4 Conclusions
We have shown two metrics and one way of combining them for rating soccer teams. One of the metrics uses the goals scored by a team and the other the points earned by it. We evaluated the combined use of these metrics, using a weighted rating to rank the teams and predict the results of the games of the first and second division of the Japanese University Football Association (Kanto League). Based on the results of this evaluation, we determined that our metrics should be given the same weight when combined for rating. This rating seems to be the only one using these metrics combined in this way.
We also compared the game outcome prediction results of our metrics to those obtained with four other methods. The comparison shows that SPM could be a good alternative to the Colley based method or the Offense Defense Model (ODM) for ranking teams and for prediction.
Our metrics are easy to implement and can also be used in other sports. Rugby (targeted in Stefani's method) and basketball and football (targeted in ODM) could probably use them without major changes. There are works, like that of Pasteur (2010), that aim to improve prediction results. Similar and other approaches tailored to SPM could also be topics for further study.
Annex: Brief Description of Other Methods
In this paper we compare SPM's predictions to those we can derive using the methodology of Maher (1982), the score prediction method of Stefani (2008), the ranking method of Colley (2002), and the ODM method of Govan et al. (2009). We briefly describe these methods in the following subsections.
Maher (1982) Based Prediction
Maher supported the theory that the number of goals in soccer follows a Poisson distribution, and defined parameters to represent the defensive and offensive characteristics of a team. In his model, two teams $i$ (home team) and $j$ (away team) face each other in a game that ends with a score $(x_{ij}, y_{ij})$. He treats these scores as realizations of variables $X_{ij}$ and $Y_{ij}$ that have Poisson distributions with means $\alpha_i \beta_j$ and $\gamma_i \delta_j$, where $\alpha_i$ is the offensive strength of (home) team $i$, $\beta_j$ the defensive weakness of (away) team $j$, $\gamma_i$ the defensive weakness of team $i$, and $\delta_j$ the offensive strength of team $j$. Maximizing the log-likelihood of the scores gives the maximum likelihood estimators (MLE) in Equation (11) (the values of $\gamma$ and $\delta$ can be determined in the same way).

$$\hat{\alpha}_i = \frac{\sum_{j \neq i} x_{ij}}{\sum_{j \neq i} \hat{\beta}_j} \quad \text{and} \quad \hat{\beta}_j = \frac{\sum_{i \neq j} x_{ij}}{\sum_{i \neq j} \hat{\alpha}_i} \tag{11}$$
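Since $\hat{\alpha}$ depends on the values of $\hat{\beta}$ and vice versa, Maher suggests the following initial values.

$$\hat{\alpha}_i = \frac{\sum_{j \neq i} x_{ij}}{\sqrt{S_x}} \quad \text{and} \quad \hat{\beta}_j = \frac{\sum_{i \neq j} x_{ij}}{\sqrt{S_x}} \tag{12}$$

Here $S_x$, which appears under the square root in the denominators, is given by Equation (13) and is the number of goals scored by all teams.

$$S_x = \sum_{i} \sum_{j \neq i} x_{ij} \tag{13}$$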
He studied the importance of these values and found that two of them, the offensive strength and the defensive weakness, suffice to describe the quality of a team (without distinguishing between home and away games). Equations (11) to (13) can be applied to each match day $k$ based on the data up to match day $k-1$. After determining $\hat{\alpha}$ and $\hat{\beta}$ for each team, we can calculate the mean of its goal distribution and determine the number of goals most likely to be scored in game $k$. Knowing the scores, we can predict the result of a game between any two teams.
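The estimation and prediction steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: the function names, the fixed number of iterations, and the `goals[i][j]` layout (goals scored by team i against team j, with no home and away distinction) are our own choices, and every team is assumed to have scored and conceded at least one goal.

```python
import math


def fit_maher(goals, iterations=100):
    """Estimate offensive strengths (alpha) and defensive weaknesses (beta).

    goals[i][j] holds the goals scored by team i against team j
    (hypothetical layout; the diagonal is ignored).
    """
    n = len(goals)
    scored = [sum(goals[i][j] for j in range(n) if j != i) for i in range(n)]
    conceded = [sum(goals[i][j] for i in range(n) if i != j) for j in range(n)]
    total = sum(scored)  # S_x in Equation (13)

    # Initial values suggested by Maher, Equation (12).
    alpha = [s / math.sqrt(total) for s in scored]
    beta = [c / math.sqrt(total) for c in conceded]

    # Alternate the two updates of Equation (11) for a fixed number of rounds.
    for _ in range(iterations):
        alpha = [scored[i] / sum(beta[j] for j in range(n) if j != i)
                 for i in range(n)]
        beta = [conceded[j] / sum(alpha[i] for i in range(n) if i != j)
                for j in range(n)]
    return alpha, beta


def most_likely_goals(mean):
    """Mode of a Poisson distribution: floor(mean).

    For an integer mean, mean - 1 is equally likely.
    """
    return math.floor(mean)


def predict_result(alpha, beta, i, j):
    """Predict the outcome of team i (listed first) against team j."""
    goals_i = most_likely_goals(alpha[i] * beta[j])
    goals_j = most_likely_goals(alpha[j] * beta[i])
    if goals_i > goals_j:
        return "home"
    if goals_i < goals_j:
        return "away"
    return "tie"
```

Using the mode of each Poisson distribution as the predicted score is one simple reading of "the number of goals most likely to be scored"; other choices, such as comparing the two means directly, would be equally easy to plug in.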
Colley (2002) Based Prediction
Colley proposed a method for ranking college football teams that uses only the number of games won $n_w$ and the number of games played $n_{tot}$ as input. He uses the modified winning percentage shown in Equation (14) as the rating of a team.

$$r = \frac{1 + n_w}{2 + n_{tot}} \tag{14}$$
He also rewrites the number of wins as in Equation (15), where $n_l$ is the number of games lost.

$$n_w = \frac{(n_w - n_l)}{2} + \frac{n_{tot}}{2} = \frac{(n_w - n_l)}{2} + \sum_{j=1}^{n_{tot}} \frac{1}{2} \tag{15}$$
He then modifies the second term to define an adjustment for strength of schedule based on the ratings of the opponents of team $i$, as given by Equation (16).

$$n_w^{\mathrm{eff}} = \frac{(n_{w,i} - n_{l,i})}{2} + \sum_{j=1}^{n_{tot,i}} r_j^i \tag{16}$$
It gives the effective number of wins of team $i$. Here $r_j^i$ is the rating of the $j$th opponent of $i$. In this method we can calculate the ratings by an iterative scheme or by solving a system of linear equations. The ratings available before a given game can then be used to rank teams and predict the winner of that game.
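The iterative scheme mentioned above can be sketched as follows. This is a minimal sketch under our own assumptions rather than Colley's reference implementation: the function name, the `(winner, loser)` input format, and the fixed number of sweeps are hypothetical, and tied games are not handled. Each sweep substitutes the effective number of wins of Equation (16) into the rating of Equation (14).

```python
def colley_ratings(n_teams, results, sweeps=200):
    """Iteratively compute Colley-style ratings.

    results is a list of (winner, loser) team-index pairs
    (hypothetical input format; ties are not supported here).
    """
    wins = [0] * n_teams
    losses = [0] * n_teams
    opponents = [[] for _ in range(n_teams)]
    for winner, loser in results:
        wins[winner] += 1
        losses[loser] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    # Every team starts from the rating of a team with no games played.
    ratings = [0.5] * n_teams
    for _ in range(sweeps):
        updated = []
        for i in range(n_teams):
            # Effective wins, Equation (16): win/loss margin plus opponents' ratings.
            n_eff = (wins[i] - losses[i]) / 2 + sum(ratings[j] for j in opponents[i])
            # Modified winning percentage, Equation (14), using effective wins.
            updated.append((1 + n_eff) / (2 + len(opponents[i])))
        ratings = updated
    return ratings
```

To predict a game, the team with the higher rating would be taken as the expected winner. Solving the equivalent linear system directly would give the same ratings without iterating.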
Stefani (2008) Based Prediction
Stefani developed a least-squares and an exponential smoothing method for predicting scores and applied them to English Premier Soccer League and Super 12/14 rugby union competitions. His model predicts the score of home team $i$ ($s_{ij}$) and away team $j$ ($s_{ji}$) using the formulas in Equations (17) and (18).

$$s_{ij}^{P} = r_{oi} + r_{dj} \tag{17}$$

$$s_{ji}^{P} = r_{oj} + r_{di} \tag{18}$$
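Here $r_o$ and $r_d$ are the offensive and defensive ratings of each team. For team $i$ they are given by Equations (19) and (20) (the ratings of team $j$ are calculated in a similar way).

$$r_{oi}^{n} = r_{oi}^{n-1} + \left[\frac{m-1}{nm-1}\right]\left(s_{ij} - \left(r_{oi}^{n-1} + r_{dj}^{m-1}\right)\right) \tag{19}$$

$$r_{di}^{n} = r_{di}^{n-1} + \left[\frac{m-1}{nm-1}\right]\left(s_{ji} - \left(r_{oj}^{m-1} + r_{di}^{n-1}\right)\right) \tag{20}$$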
Here $n$ is the number of games of team $i$ and $m$ the number of games of team $j$. The fraction in the second term of these equations is the smoothing factor. We used 0.5 as the initial value of $r_o$ ($r_o^0$) and $r_d$ ($r_d^0$) for all teams to predict the scores of the first games when using this method.
ODM Based Prediction
The Offense-Defense Model (ODM) of Govan et al. (2009) uses a matrix $A = [a_{ij}]$, where $a_{ij}$ is the score of team $j$ against team $i$. It also defines two ratings. The offensive rating of team $j$ is given by the following equation.

$$o_j = a_{1j}\left(\frac{1}{d_1}\right) + \dots + a_{nj}\left(\frac{1}{d_n}\right) \tag{21}$$
The defensive rating of team $i$ is given by Equation (22).

$$d_i = a_{i1}\left(\frac{1}{o_1}\right) + \dots + a_{in}\left(\frac{1}{o_n}\right) \tag{22}$$
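For convergence they define a new matrix $P = A + \epsilon e e^{T}$, where $e$ is a vector of all ones that also serves as the initial value of the defensive ratings ($d^{(0)} = e$). This makes it possible to calculate $o$ (all the offensive ratings) as follows, where the reciprocal of a vector is taken element by element.

$$o^{(k)} = P^{T} \frac{1}{d^{(k-1)}} \tag{23}$$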
All the defensive ratings $d$ are then calculated in the same way.

$$d^{(k)} = P \frac{1}{o^{(k)}} \tag{24}$$
The overall rating of team $i$ is given by the following equation.

$$r_i = \frac{o_i}{d_i} \tag{25}$$
For prediction we used this overall rating to rank teams and determine the winner of a game.
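The alternating updates of Equations (23) and (24) can be sketched as below. This is a rough sketch under our own assumptions, not the code of Govan et al. (2009): the function name, the value of epsilon, and the fixed iteration count are hypothetical choices that the text above does not specify.

```python
def odm_ratings(A, eps=1e-4, iterations=100):
    """Compute ODM offensive, defensive and overall ratings.

    A[i][j] is the score of team j against team i.
    eps and iterations are assumed values, not taken from the paper.
    """
    n = len(A)
    # P = A + eps * e e^T: add eps to every entry so all entries are positive.
    P = [[A[i][j] + eps for j in range(n)] for i in range(n)]

    d = [1.0] * n  # d^(0) = e
    o = [1.0] * n
    for _ in range(iterations):
        # Equation (23): o = P^T (1/d), reciprocal taken element by element.
        o = [sum(P[i][j] / d[i] for i in range(n)) for j in range(n)]
        # Equation (24): d = P (1/o).
        d = [sum(P[i][j] / o[j] for j in range(n)) for i in range(n)]

    # Equation (25): overall rating of each team.
    r = [o[i] / d[i] for i in range(n)]
    return o, d, r
```

A large offensive rating and a small defensive rating both raise the overall rating, so for prediction the team with the larger overall rating would be picked as the winner.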
References
Bradley, R. A. and M. E. Terry (1952): "Rank Analysis of Incomplete Block Designs I: The Method of Paired Comparisons," Biometrika, 39, 324–345.
Brillinger, D. R. (2010): Wiley Encyclopedia of Operations Research and Management Science, John Wiley and Sons, Inc., chapter Soccer/World Football.
Callaghan, T., P. J. Mucha, and M. A. Porter (2007): "Random Walker Ranking for NCAA Division I-A Football," American Mathematical Monthly, 114, 761–777.
Colley, W. N. (2002): "Colley's bias free college football ranking method."
Gleich, D. F. and L.-H. Lim (2011): "Rank Aggregation via Nuclear Norm Minimization," in Proceedings of the Conference on Knowledge Discovery and Data Mining, ACM, KDD 11, 60–68.
Govan, A. Y., A. N. Langville, and C. D. Meyer (2009): "Offense-Defense Approach to Ranking Team Sports," Journal of Quantitative Analysis in Sports, 5, 1–17.
Hallinan, S. E. (2005): "Paired Comparison Models for Ranking National Soccer Teams," Technical report, Worcester Polytechnic Institute.
Harville, D. A. (2003): "The Selection or Seeding of College Basketball or Football Teams for Postseason Competition," Journal of the American Statistical Association, 98, 17–27.
Ingram, L. C. (2007): Ranking NCAA Sports Teams with Linear Algebra, Master's thesis, The Graduate School of the College of Charleston.
JUFA (2011): http://www.jufa-kanto.jp/.
Maher, M. J. (1982): "Modelling Association Football Scores," Statistica Neerlandica, 36, 109–118.
Pasteur, R. D. (2010): Extending the Colley Method to Generate Predictive Football Rankings, number 43 in Dolciani Mathematical Expositions, Mathematical Association of America, chapter 10, 117–129.
Reep, C., R. Pollard, and B. Benjamin (1971): "Skill and Chance in Ball Games," Journal of the Royal Statistical Society. Series A, 134, 623–629.
Stefani, R. T. (2008): "Predicting Score Difference Versus Score Total in Rugby and Soccer," IMA Journal of Management Mathematics, 20, 147–158.