Probability in Sport
Part 2
Jocelyn Mara
Discipline of Sport and Exercise Science
Law of Total Probability
e.g. 2013 MLB St Louis Cardinals
Severini, 2015
                 Frequency   Percentage
Total wins       97          59.9
Total losses     65          40.1
Wins at home     54          66.7 (out of 81)
Wins away        43          53.1 (out of 81)
Total games      162         -
Law of Total Probability
P(W) = 0.599
P(W|H) = 0.667
P(W | Not H) = 0.531
Severini, 2015
Law of Total Probability
• The number of home games and away games is the same so…
• Their overall winning % is the same as the average of their home and
away winning %
0.599 = (0.667 + 0.531) / 2
Severini, 2015
Law of Total Probability
• In probability notation this is:
P(W) = P(H) P(W|H) + P(not H) P(W | not H)
0.599 = (0.5)(0.667) + (0.5)(0.531)
Severini, 2015
Law of Total Probability
• The unconditional probability of an event A (e.g. P(W)) can be expressed as a
weighted average of the conditional probabilities of A given B (e.g. P(W | H)) and A
given not B (e.g. P(W | not H))
• The weights are the probabilities of B and not B (e.g. P(H) and P(not H))
• The weights were equal in the Cardinals example…
• … but this isn’t always the case
Severini, 2015
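The Cardinals calculation above can be reproduced with a short R sketch. It uses only the counts from the table (81 home and 81 away games, 54 home wins, 43 away wins); the variable names are illustrative and not part of the original slides.

```r
# Law of total probability for the 2013 Cardinals (counts from the table above).
games <- c(home = 81, away = 81)   # games played at each venue
wins  <- c(home = 54, away = 43)   # wins at each venue

p_venue     <- games / sum(games)  # weights: P(H) and P(not H)
p_win_given <- wins / games        # P(W | H) and P(W | not H)

# P(W) = P(H) P(W | H) + P(not H) P(W | not H)
p_win <- sum(p_venue * p_win_given)

print(round(p_win_given, 3))
print(round(p_win, 3))
```

Because the weights come from the game counts, the same code also handles seasons where home and away games are not evenly split.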
Law of Total Probability
e.g. Pitchers Josh Beckett and Johan Santana in 2009
Severini, 2015
Batting Average Against (BAA)   Josh Beckett   Johan Santana
Overall                         0.2441         0.2438
Against right handers           0.226          0.235
Against left handers            0.258          0.267
Law of Total Probability
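• Beckett had a lower BAA than Santana against both right and left-handed batters,
yet a slightly higher overall BAA
• This can be explained by Beckett and Santana facing different proportions of
right and left-handed batters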
Severini, 2015
Law of Total Probability
P(H) = P(R) P(H | R) + P(L) P(H | L)
Where
H = pitcher allows a hit
R = batter is right handed
L = batter is left handed
Severini, 2015
Law of Total Probability
e.g. Josh Beckett
0.244 = (0.432) (0.226) + (0.568) (0.258)
0.432 = proportion of pitches against right handers
0.568 = proportion of pitches against left handers
Severini, 2015
Law of Total Probability
e.g. Johan Santana
0.244 = (0.719) (0.235) + (0.281) (0.267)
0.719 = proportion of pitches against right handers
0.281 = proportion of pitches against left handers
Severini, 2015
Law of Total Probability
• Beckett was better than Santana vs both right and left handed batters
• But Santana faced more right handed batters than Beckett
• Using conditional probabilities provides more information about relative
performance than unconditional probabilities
Severini, 2015
Adjusting Sport Statistics
• What would Beckett’s overall BAA have been if he faced the same
proportion of right hand batters as Santana?
BAA = (0.432) (0.226) + (0.568) (0.258) = 0.244
Adj. BAA = (0.719) (0.226) + (0.281) (0.258) = 0.235
Severini, 2015
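As a rough check of the adjustment above, the R sketch below recomputes Beckett's BAA under his own mix of batters and under Santana's mix. The proportions and conditional BAAs are the rounded values shown on the slides; the variable names are illustrative.

```r
# Beckett's BAA by batter handedness (rounded values from the slides).
baa_vs_hand <- c(right = 0.226, left = 0.258)

# Proportion of right/left-handed batters each pitcher faced.
w_beckett <- c(right = 0.432, left = 0.568)
w_santana <- c(right = 0.719, left = 0.281)

# Overall BAA is the weighted average of the conditional BAAs.
baa_actual   <- sum(w_beckett * baa_vs_hand)  # Beckett's own mix
baa_adjusted <- sum(w_santana * baa_vs_hand)  # Beckett's rates, Santana's mix

print(round(c(actual = baa_actual, adjusted = baa_adjusted), 3))
```

Only the weights change between the two lines; Beckett's conditional BAAs are held fixed, which is what makes the result an adjusted figure rather than an observed one.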
Adjusting Sport Statistics
• In 2009 MLB 56% of all at-bats were from right-handers
Santana’s BAA = (0.719) (0.235) + (0.281) (0.267) = 0.244
Adj. BAA = (0.56) (0.235) + (0.44) (0.267) = 0.249
Severini, 2015
Adjusting Sport Statistics
• LA Lakers 50 games into the 2017-2018 NBA season
                 Frequency   Percentage
Total wins       23          46.0
Total losses     27          54.0
Wins at home     11          50.0 (out of 22)
Wins away        12          42.9 (out of 28)
Total games      50          -
Severini, 2015
Adjusting Sport Statistics
P(W) = P(H) P(W | H) + P(A) P(W | A)
0.46 = (0.44) (0.50) + (0.56) (0.429)
Adj. P(W) = (0.50) (0.50) + (0.50) (0.429) = 0.465
Severini, 2015
Adjusting Sport Statistics
• Formally this is known as subclassification adjustment
• AKA direct adjustment
P(W) = P(H) P(W | H) + P(A) P(W | A)
• The subclasses are home games (H) and away games (A)
• The subclass weights are P(H) and P(A)
Severini, 2015
Adjusting Sport Statistics
• Subclassification adjustment is not restricted to probabilities
• It can be used whenever the measurement of interest can be calculated
as a weighted sum of the subclass measurements
Severini, 2015
Adjusting Sport Statistics
$Y = q_1 Y_1 + q_2 Y_2 + q_3 Y_3 + \dots + q_m Y_m$
$Y$ = measurement of interest
$q_x$ = subclass weights
$Y_x$ = subclass measurements
$m$ = number of subclass measurements
Severini, 2015
Adjusting Sport Statistics
$Y^* = p_1 Y_1 + p_2 Y_2 + p_3 Y_3 + \dots + p_m Y_m$
$Y^*$ = predicted measurement of interest if the weights were $p_x$
$p_x$ = adjusted weights
$Y_x$ = subclass measurements
$m$ = number of subclass measurements
Severini, 2015
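The general formula can be wrapped in a small helper. This is a minimal sketch: `direct_adjust()` is a hypothetical function name, not something from the slides or from an R package, and the example reuses the rounded Santana figures and the 56% league share of right-handed at-bats quoted earlier.

```r
# Direct (subclassification) adjustment: Y* = p_1 Y_1 + ... + p_m Y_m.
# `direct_adjust` is a hypothetical helper written for this example.
direct_adjust <- function(subclass_values, weights) {
  # One weight per subclass, non-negative, summing to 1.
  stopifnot(length(subclass_values) == length(weights))
  stopifnot(all(weights >= 0), isTRUE(all.equal(sum(weights), 1)))
  sum(weights * subclass_values)
}

# Santana's BAA by batter handedness, re-weighted to the 2009 league mix.
santana_baa    <- c(right = 0.235, left = 0.267)
league_weights <- c(right = 0.56, left = 0.44)

santana_adj <- direct_adjust(santana_baa, league_weights)
print(round(santana_adj, 3))
```

The weight check is a design choice for this sketch: it stops early if the standard weights do not form a proper set of proportions, which is an easy mistake when weights are typed in by hand.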
Adjusting Sport Statistics
• This type of adjustment is appropriate for rates or counts
• It is not appropriate for ratios of summary statistics
• Some metrics are already adjusted
• Choose standard weights objectively
• Adjusted values might not be realistic, as they represent a hypothetical scenario
Severini, 2015
Adjusting Sport Statistics
• e.g. Canberra United FC Goal Scoring in the W-League
           Shots   Goals   Goal Prob.
All        215     23      0.107
0-10m      46      12      0.261
11-20m     78      10      0.128
21-30m     80      0       0
>30m       11      1       0.091
Adjusting Sport Statistics
• e.g. Melbourne City Goal Scoring in the W-League
           Shots   Goals   Goal Prob.
All        192     30      0.156
0-10m      45      11      0.244
11-20m     92      17      0.185
21-30m     48      1       0.021
>30m       7       1       0.143
Adjusting Sport Statistics
$G = d_{10} G_{10} + d_{20} G_{20} + d_{30} G_{30} + d_{>30} G_{>30}$
$G$ = Goal Probability
$d_x$ = proportion of shots from each distance band
$G_x$ = Goal Probability for each distance band
Adjusting Sport Statistics
Canberra United
0.107 = (0.214)(0.261) + (0.363)(0.128) + (0.372)(0) + (0.051)(0.091)
Melbourne City
0.156 = (0.234)(0.244) + (0.479)(0.185) + (0.25)(0.021) + (0.036)(0.143)
Adjusting Sport Statistics
Canberra United adj. for Melbourne City Standard
0.126 = (0.234)(0.261) + (0.479)(0.128) + (0.25)(0) + (0.036)(0.091)
Melbourne City
0.156 = (0.234)(0.244) + (0.479)(0.185) + (0.25)(0.021) + (0.036)(0.143)
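The Canberra United adjustment can be sketched in R directly from the shot and goal counts in the two team tables. Working from counts rather than the rounded slide values gives the same answer to three decimal places; the variable names are illustrative.

```r
# Shot and goal counts by distance band (from the team tables above).
bands           <- c("0-10m", "11-20m", "21-30m", ">30m")
canberra_shots  <- c(46, 78, 80, 11)
canberra_goals  <- c(12, 10, 0, 1)
melbourne_shots <- c(45, 92, 48, 7)
melbourne_goals <- c(11, 17, 1, 1)

# Subclass measurements: Canberra's goal probability in each band.
canberra_rate <- canberra_goals / canberra_shots

# Standard weights: Melbourne City's proportion of shots in each band.
melbourne_mix <- melbourne_shots / sum(melbourne_shots)

# Canberra's goal probability if it had taken Melbourne's mix of shots.
canberra_adj <- sum(melbourne_mix * canberra_rate)

# Melbourne's observed goal probability, for comparison.
melbourne_actual <- sum(melbourne_goals) / sum(melbourne_shots)

print(round(setNames(melbourne_mix, bands), 3))
print(round(c(canberra_adj = canberra_adj, melbourne = melbourne_actual), 3))
```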
Adjusting Sport Statistics
• e.g. Goal Scoring in the W-League
           Shots   Goals   Prob.
All        921     90      0.098
0-10m      166     36      0.217
11-20m     390     44      0.113
21-30m     314     4       0.013
>30m       51      6       0.118
Adjusting Sport Statistics
$G = d_{10} G_{10} + d_{20} G_{20} + d_{30} G_{30} + d_{>30} G_{>30}$
0.098 = (0.180)(0.217) + (0.423)(0.113) + (0.341)(0.013) + (0.055)(0.118)
Adjusting Sport Statistics
Canberra United Adj. for League Standard
0.106 = (0.180)(0.261) + (0.423)(0.128) + (0.341)(0) + (0.055)(0.091)
Melbourne City Adj. for League Standard
0.137 = (0.180)(0.244) + (0.423)(0.185) + (0.341)(0.021) + (0.055)(0.143)
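Both league-standard adjustments can be computed with one helper. This is a sketch under stated assumptions: `adjust_to_standard()` is a hypothetical function name, and the counts are taken from the three tables above.

```r
# Goal probability a team would have had with the standard's mix of shot distances.
# `adjust_to_standard` is a hypothetical helper written for this example.
adjust_to_standard <- function(goals, shots, standard_shots) {
  stopifnot(length(goals) == length(shots), length(shots) == length(standard_shots))
  team_rate <- goals / shots                          # subclass measurements
  std_mix   <- standard_shots / sum(standard_shots)   # standard weights
  sum(std_mix * team_rate)
}

# Counts by distance band: 0-10m, 11-20m, 21-30m, >30m.
league_shots <- c(166, 390, 314, 51)

canberra_adj  <- adjust_to_standard(c(12, 10, 0, 1), c(46, 78, 80, 11), league_shots)
melbourne_adj <- adjust_to_standard(c(11, 17, 1, 1), c(45, 92, 48, 7), league_shots)

print(round(c(canberra = canberra_adj, melbourne = melbourne_adj), 3))
```

Using the league-wide mix as the standard puts both teams on the same footing, so the remaining gap reflects conversion rates at each distance rather than where the shots were taken.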
Z-scores
• Can be used to standardise and compare performances
• Expresses an individual performance value as the number of standard
deviations it is above or below the mean
Z-scores
$z = \frac{x - \mu}{\sigma}$
Where
$x$ = individual value
$\mu$ = mean
$\sigma$ = standard deviation
Z-scores
> (38.5 - mean(lakers$FG.)) / sd(lakers$FG.)
[1] -1.543456
> (53.0 - mean(lakers$FG.)) / sd(lakers$FG.)
[1] 1.304611
Z-scores
> scale(lakers$FG.)
[,1]
[1,] -0.8756333
[2,] -0.7774241
[3,] -1.5434559
[4,] 0.6171466
[5,] 1.3046111 ## first 5 rows of output ##
Z-scores
> library(dplyr)
> # scale() returns a one-column matrix, so convert it to a plain numeric vector
> lakers <- mutate(lakers, FG.z = as.numeric(scale(FG.)))
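The `lakers` data set is not included with the slides, so the sketch below uses a small made-up vector of field goal percentages (the values in `fg` are hypothetical) to show that the manual formula and `scale()` give the same z-scores.

```r
# Hypothetical field goal percentages for five games (illustrative values only).
fg <- c(38.5, 44.1, 46.0, 49.2, 53.0)

# Manual z-scores: distance from the mean in units of standard deviation.
z_manual <- (fg - mean(fg)) / sd(fg)

# scale() centres and scales by the sample SD; as.numeric() drops the matrix shape.
z_scale <- as.numeric(scale(fg))

print(round(z_manual, 3))
print(isTRUE(all.equal(z_manual, z_scale)))
```

By construction the z-scores have a mean of 0 and a standard deviation of 1, whatever the original units were.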
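Standard Normal Distribution
[Figure: standard normal curve]
• About 68% of values lie within 1 standard deviation of the mean
• About 95% of values lie within 2 standard deviations of the mean
• Over 99% of values lie within 3 standard deviations of the mean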
Standard Normal Distribution
a      P(-a < Z < a)
0.5    0.383
1      0.683
1.5    0.866
2      0.954
3      0.997
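The table values can be reproduced with `pnorm()`, the standard normal cumulative distribution function in base R. A minimal sketch:

```r
# P(-a < Z < a) for a standard normal variable Z.
a <- c(0.5, 1, 1.5, 2, 3)

# pnorm(a) is P(Z < a), so the central probability is the difference of two tails.
p_within <- pnorm(a) - pnorm(-a)

print(data.frame(a = a, p_within = round(p_within, 3)))
```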
Normal Distribution
> plot(density(lakers$FG.))
> mean(lakers$FG.)
[1] 46.358
> sd(lakers$FG.)
[1] 5.091172
[Figure: density plot of lakers$FG. with the central 68% of the distribution marked]
Comparing Performances
Top Receiving Yard Performances in Different NFL Eras
Player            Year   Receiving Yards
Calvin Johnson    2012   1964
Marvin Harrison   2002   1722
Jerry Rice        1995   1848
John Jefferson    1980   1340
Otis Taylor       1971   1110
Raymond Berry     1960   1289
Severini, 2015
Comparing Performances
• Direct comparison might be misleading as the passing game has changed
• We can account for these differences by comparing each receiver to other
receivers that played in that season
Severini, 2015
Comparing Performances
Mean and SD for all players in each year
Year   Mean    SD      Z-Score
2012   269.2   329.6   5.142
2002   291.2   325.8   4.392
1995   288.4   342.8   3.600
1980   280.5   278.4   3.806
1971   224.3   223.8   3.958
1960   234.0   263.9   4.032
Severini, 2015
Adjusting Sport Statistics
Player            Calvin Johnson   Marvin Harrison
Year              2012             2002
Receiving Yards   1964             1722
Mean              269.2            291.2
SD                329.6            325.8
Z-score           5.142            4.392
Severini, 2015
Adjusting Sport Statistics
How would Marvin Harrison have performed in 2012?
$z = \frac{x - 269.2}{329.6}$
269.2 = 2012 Mean
329.6 = 2012 SD
Substituting Harrison's 2002 Z-score (4.392) and solving for $x$:
$4.392 = \frac{x - 269.2}{329.6}$
$x = (4.392)(329.6) + 269.2$
$x = 1717$ (adj. yards for 2012)
Severini, 2015
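The Harrison calculation can be sketched in R as a two-step conversion: to a z-score using the 2002 mean and SD, then back to yards using the 2012 mean and SD. The helper names `to_z()` and `from_z()` are hypothetical and written only for this example.

```r
# Hypothetical helpers for converting between raw values and z-scores.
to_z   <- function(x, mean, sd) (x - mean) / sd
from_z <- function(z, mean, sd) z * sd + mean

# Step 1: Harrison's 2002 season relative to all 2002 receivers.
harrison_z <- to_z(1722, mean = 291.2, sd = 325.8)

# Step 2: the 2012 yardage that sits the same number of SDs above the 2012 mean.
harrison_2012 <- from_z(harrison_z, mean = 269.2, sd = 329.6)

print(round(harrison_z, 3))
print(round(harrison_2012))
```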
Adjusting Sport Statistics
Adjusted Receiving Yards
Player            Year   Yards   Adj. Yards
Calvin Johnson    2012   1964    1964
Marvin Harrison   2002   1722    1717
Jerry Rice        1995   1848    1456
John Jefferson    1980   1340    1524
Otis Taylor       1971   1110    1574
Raymond Berry     1960   1289    1598
Severini, 2015
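The adjusted yards column can be reproduced by converting each receiver's z-score back to yards with the 2012 mean and SD. The sketch below takes the z-scores exactly as printed in the earlier Z-Score table rather than recomputing them; the data frame and column names are illustrative.

```r
# Z-scores as printed in the Z-Score table above, one per receiver.
receivers <- data.frame(
  player = c("Calvin Johnson", "Marvin Harrison", "Jerry Rice",
             "John Jefferson", "Otis Taylor", "Raymond Berry"),
  year   = c(2012, 2002, 1995, 1980, 1971, 1960),
  yards  = c(1964, 1722, 1848, 1340, 1110, 1289),
  z      = c(5.142, 4.392, 3.600, 3.806, 3.958, 4.032)
)

# Reference season: 2012 mean and SD of receiving yards.
ref_mean <- 269.2
ref_sd   <- 329.6

# Adjusted yards = z-score re-expressed on the 2012 scale.
receivers$adj_yards <- round(receivers$z * ref_sd + ref_mean)

print(receivers)
```

Changing `ref_mean` and `ref_sd` to another season's values would re-express every performance on that season's scale instead.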
Summary
• Law of total probability can be used to adjust and predict sport statistics
based on a given scenario
• Z-scores can be used to compare performances to the mean performance
• Z-scores can be used to adjust sport statistics based on a given mean and
sd
• Consider what mean and sd you use as your reference