Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Using player abilities to predict football
Gavin Whitaker
ISBIS: Data Science in Industry, July 2018
Gavin Whitaker ISBIS: July 2018 1/39
Who are Stratagem?
B A data science and trading company - focus on sports
B Football/tennis/basketball
B Combine state-of-the-art analytics, data and expert analysis tofind cutting edge solutions
B Aim to utilise the full power of the modern AI toolkit
B Offer:
Betting insightsModellingBet pricesTrading services
Gavin Whitaker ISBIS: July 2018 2/39
The problem
Aim
B Wish to determine the ability of a given player in a specificevent, e.g. passing, scoring a goal etc
B Use these abilities to help predictions, Over/Under or Asianhandicap markets
Possible questions
B Can we rate/rank players on their ability in an event?
B How important is a player to a team?
B What happens if that player is missing from the team?
B What happens if you add a player to a team?
B What happens if you swap player x for player y?
Gavin Whitaker ISBIS: July 2018 3/39
Example fixture
Japan vs Poland - 28/6/18
Outcome Market Model
Home winAway winDraw
Gavin Whitaker ISBIS: July 2018 4/39
Example fixture
Japan vs Poland - 28/6/18
Outcome Market Model
Home win 25.7%Away win 43.7%Draw 30.6%
Gavin Whitaker ISBIS: July 2018 5/39
Example fixture
Japan vs Poland - 28/6/18
Outcome Market Model
Home win 37.5% 25.7%Away win 32.0% 43.7%Draw 30.5% 30.6%
Final score: Japan 0 - 1 Poland
Gavin Whitaker ISBIS: July 2018 6/39
Example fixture
Japan vs Poland - 28/6/18
Outcome Market Model
Home win 37.5% 25.7%Away win 32.0% 43.7%Draw 30.5% 30.6%
Final score: Japan 0 - 1 Poland
Gavin Whitaker ISBIS: July 2018 6/39
The data
B Touch data
B 2013/2014 - 2016/2017 Premier league seasons
B Roughly 2.4 million events in total
B ≈ 1600 events per game
Gavin Whitaker ISBIS: July 2018 7/39
Liverpool vs Stoke, 17th August 2013
Pas
s
Bal
lRec
over
y
Aer
ial
Cle
aran
ce
Tac
kle
Fou
l
Tak
eOn
Dis
poss
esse
d
Bal
lTou
ch
Cor
nerA
war
ded
Inte
rcep
tion
Sav
e
Sav
edS
hot
Cha
lleng
e
Mis
sedS
hots
Kee
perP
icku
p
Offs
ideG
iven
Offs
ideP
rovo
ked
Offs
ideP
ass
Cla
im
Err
or
Sho
tOnP
ost
Pun
ch
Cro
ssN
otC
laim
ed
Goa
l
Event type
0
200
400
600
800
1000
Fre
quen
cy
Gavin Whitaker ISBIS: July 2018 8/39
Liverpool vs Stoke, 17th August 2013 (pass removed)
Bal
lRec
over
y
Aer
ial
Cle
aran
ce
Tac
kle
Tak
eOn
Fou
l
Dis
poss
esse
d
Bal
lTou
ch
Cor
nerA
war
ded
Inte
rcep
tion
Sav
e
Sav
edS
hot
Cha
lleng
e
Mis
sedS
hots
Kee
perP
icku
p
Offs
ideP
ass
Offs
ideG
iven
Offs
ideP
rovo
ked
Sho
tOnP
ost
Err
or
Cla
im
Cro
ssN
otC
laim
ed
Pun
ch
Goa
l
Event type
0
20
40
60
80
100
Fre
quen
cy
Gavin Whitaker ISBIS: July 2018 9/39
Stop Control Disruption QuestionableCard Aerial BlockedPass CornerAwardedEnd BallRecovery Challenge CrossNotClaimed
FormationChange BallTouch Claim KeeperSweeperFormationSet ChanceMissed Clearance ShieldBallOppOffsideGiven Dispossessed InterceptionPenaltyFaced Error KeeperPickup
Start Foul OffsideProvokedSubstitutionOff Goal PunchSubstitutionOn GoodSkill Save
MissedShots SmotherOffsidePass Tackle
PassSavedShot
ShotOnPostTakeOn
Gavin Whitaker ISBIS: July 2018 10/39
The model
B K matches, numbered k = 1, . . . ,K
B Set of teams in fixture k is Tk, with THk and TAk ,Tk = {THk , TAk }
B P is the set of all players, P jk ∈ P is the subset of players whoplay for team j in fixture k
B Consider how players’ abilities over different events interact,group events to create meaningful interactions
B For simplicity lets consider 2 events, e.g. “Pass” and“AntiPass”
B Denote the set of events as E = {e1, e2}
Gavin Whitaker ISBIS: July 2018 11/39
B Xei,k as the number of occurrences of event e, by player i
(for team j), in match k
B We construct a Poisson process
Xei,k ∼ Pois
(ηei,kτi,k
)where
ηei,k = exp
∆ei + τi,k
λe1 ∑i′∈P j
k
∆ei′ − λe2
∑i′∈PTk\j
k
∆E\ei′
+(δTH
k ,j
)γe
Gavin Whitaker ISBIS: July 2018 12/39
B ∆ei is the latent ability for player i for event e
B δr,s is the Kronecker delta
B τi,k is the fraction of time player i spent on the pitch inmatch k, τi,k ∈ [0, 1]
B γe is the home effect for event e
B λe1 is the impact of a player’s own team
B λe2 describing the opposition’s ability to stop the player/otherteam
B We impose the constraint that the λs must be positive
Gavin Whitaker ISBIS: July 2018 13/39
∆e11 ∆e1
2. . .
∆e111 ∆e2
12 ∆e213
. . .∆e2
22
log (ηe11 )
λ1
11∑i=1
∆e1i −λ2
22∑i=12
∆e2i
γe1
For simplicity we assume only 11 players on each team and dropthe time dependence
Gavin Whitaker ISBIS: July 2018 14/39
The log-likelihood is
` =∑e∈E
K∑k=1
∑j∈Tk
∑i∈P j
k
Xei,k log
(ηei,kτi,k
)− ηei,kτi,k − log
(Xei,k !)
B Large number of players → large number of parameters(MCMC not feasible)
B Appeal to variational inference techniques combined withautomatic differentiation
B Can utilise a prior for those players with few datapoints/minutes played
Gavin Whitaker ISBIS: July 2018 15/39
Variational inference
B See Blei et al. (2017), Kucukelbir et al. (2016), Duvenaudand Adams (2015) and Chapter 19 of Goodfellow et al. (2016)
B Specify a variational family of densities over the latentvariables
B Latent variables - ν
B Aim to find best candidate approximation q(ν)
B Do this by maximising the evidence lower bound (ELBO)
B ELBO - close relations to the Kullback-Leibler divergence
ELBO (ν) = Eν [log {π (ν, x)}]− Eν [log {q (ν)}]
B We consider the mean-field variational family
B The latent variables are assumed to be mutually independent
Gavin Whitaker ISBIS: July 2018 16/39
B Let ν = ∆, and set
q (∆ei ) ∼ N
(µ∆e
i, σ2
∆ei
),
where
q (∆) =∏e∈E
∏j∈Tk
∏i∈P j
k
q (∆ei )
B Aim - find suitable candidate values for µ∆ei
and σ∆ei, ∀i,∀e.
These are the variational parameters
B Take (λe1, λe2, γ
e)T to be fixed parameters
B Prior - π(∆ei ) ∼ N(−2, 22)
B Fit using automatic differentiation - Python packageautograd (Maclaurin et al., 2015)
B Minimise (negative ELBO) using ADAM
Gavin Whitaker ISBIS: July 2018 17/39
Application - 2013/2014 season
B Initially look at the 2013/2014 English Premier Leaguek = 1, . . . , 380j ∈ Tk where Tk consists of a subset of {1, . . . , 20}i ∈ P j
k where P jk is a subset of P = {1, . . . , 544}
B The full model has 2182 parameters
B Compare “Goal” and “GoalStop”
Gavin Whitaker ISBIS: July 2018 18/39
Within sample predictive distributions
Goal
0 5 10 15 20 25Player
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
Occ
urre
nces
GoalStop
0 5 10 15 20 25Player
0
5
10
15
20
25
30
Occ
urre
nces
Red: model combinations of ηei,k. Blue: observed. The red dotted barsshow the 95% prediction interval for each ηei,k. The black line separatesthe players from the two teams
Gavin Whitaker ISBIS: July 2018 19/39
Goal - marginal posterior variational densities
Sturridge
6 4 2 0 2 4
q( Goali )
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
Den
sity
Reed
6 4 2 0 2 4
q( Goali )
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
Den
sity
Red-dotted: prior. Blue-solid: posterior
Gavin Whitaker ISBIS: July 2018 21/39
Goal - top 10
Rank Player2.5% Standard
ObservedObserved Rank
quantile deviation rank difference1 Suarez 0.508 0.184 31 1 02 Sturridge 0.176 0.225 21 2 03 Aguero 0.147 0.250 17 4 +14 Y. Toure -0.043 0.224 20 3 -15 Rooney -0.056 0.243 17 5 06 Dzeko -0.065 0.249 16 8 +27 van Persie -0.136 0.289 12 15 +88 Remy -0.230 0.271 14 11 +39 Bony -0.257 0.252 16 7 -2
10 Rodriguez -0.354 0.263 15 10 0
Gavin Whitaker ISBIS: July 2018 22/39
GoalStop - top 10
Rank Player2.5% Standard
ObservedObserved Rank
quantile deviation rank difference1 Mulumbu 2.575 0.040 631 1 02 Kallstrom 2.553 0.177 33 405 +4033 Mannone 2.528 0.044 508 12 +94 Yacob 2.510 0.053 359 43 +395 Tiote 2.474 0.044 517 8 +36 Lewis 2.446 0.213 23 436 +4307 Palacios 2.441 0.101 100 286 +2798 Jedinak 2.420 0.041 603 2 -69 Ruddy 2.411 0.041 600 3 -6
10 Arteta 2.409 0.048 431 21 +11
Gavin Whitaker ISBIS: July 2018 23/39
Application - past/future
B Now look at 2013/2014 and 2014/2015 English PremierLeague seasons
B Initial train using complete 2013/2014 season
B Introduce games in blocks of 80 games
B Fit on all the past to predict the future
13/14 14/15
... etcfit predict
Gavin Whitaker ISBIS: July 2018 24/39
Goal: 13/14 - 14/15
0 1 2 3 4Block
10
8
6
4
2
0
2
Goali
Costa
Sanchez
Aguero
Defoe
Kane
Austin
Gavin Whitaker ISBIS: July 2018 26/39
Prediction
B Use these ∆s as covariates in a hierarchical Bayesian model,comparing against a baseline model found in Baio andBlangiardo (2010)
µatt σatt µdef σdef
home atth defa atta defh
θh θa
yh ya
Gavin Whitaker ISBIS: July 2018 27/39
B Use these ∆s as covariates in a hierarchical Bayesian model,comparing against a baseline model found in Baio andBlangiardo (2010)
µatt σatt µdef σdef
home atth defa atta defh
f(∆)h f(∆)aθh θa
yh ya
Gavin Whitaker ISBIS: July 2018 28/39
B y ≡ (yh, ya) = (home goals, away goals)
yh|θh ∼ Pois (θh) ,
ya|θa ∼ Pois (θa) ,
log (θh) = home+ atth + defa,
log (θa) = atta + defh
B att∗ ∼ N(µa, σ
2a
)and def∗ ∼ N
(µd, σ
2d
)B Priors
(µa, µd) ∼ N(0, 102
), independently
(σa, σd) ∼ Inv-Gamma(0.1, 0.1), independently
home ∼ N(0, 102
)Gavin Whitaker ISBIS: July 2018 29/39
B Let p∗ be the players which start a game (predicted line up)
f(∆)h =∑i′∈p∗h
µ∆ei′−∑i′∈p∗a
µ∆
E\ei′
f(∆)a =∑i′∈p∗a
µ∆ei′−∑i′∈p∗h
µ∆
E\ei′
where p∗h is the home team and p∗a the away team
B Fit the model using STAN (HMC)
B Fit the model on the past, before predicting on the next set offixtures
B Use output from hierarchical Bayesian model (θ) to formpredictions, e.g. out-of-sample Pr(goals > 2.5)
Gavin Whitaker ISBIS: July 2018 30/39
B Use ∆s for the abilities
GoalShotsChained Event (our own devising)
B Use 2013/2014 for training only and predict from 2014/2015onwards using block structure (to end of 2016/2017 season)
B Only predict matches where we have already observed bothteams involved (only affects 1st block of each season)
B Bet £100 stake on each game
B Predict
Over/under (OU)Asian handicap (AH)
Gavin Whitaker ISBIS: July 2018 31/39
England - Premier League
OU
201410 201502 201506 201510 201602 201606 201610 201702 201706Time
0.2
0.0
0.2
0.4
0.6
0.8
Cum
ulat
ive
retu
rn
Baseline
AH
201410 201502 201506 201510 201602 201606 201610 201702 201706Time
0.2
0.0
0.2
0.4
0.6
Cum
ulat
ive
retu
rn
Baseline
OU, ∆: £6914.09, Baseline: £-1189.51AH, ∆: £5016.87, Baseline: £-3193.15
Gavin Whitaker ISBIS: July 2018 32/39
Germany - Bundesliga
OU
201410 201502 201506 201510 201602 201606 201610 201702 201706Time
0.4
0.3
0.2
0.1
0.0
0.1
0.2
0.3
Cum
ulat
ive
retu
rn
Baseline
AH
201410 201502 201506 201510 201602 201606 201610 201702 201706Time
0.1
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Cum
ulat
ive
retu
rn
Baseline
OU, ∆: £2502.33, Baseline: £-3335.97AH, ∆: £6565.97, Baseline: £5749.04
Gavin Whitaker ISBIS: July 2018 33/39
Spain - La Liga
OU
201410 201502 201506 201510 201602 201606 201610 201702 201706Time
0.0
0.1
0.2
0.3
Cum
ulat
ive
retu
rn
Baseline
AH
201410 201502 201506 201510 201602 201606 201610 201702 201706Time
0.1
0.0
0.1
0.2
0.3
Cum
ulat
ive
retu
rn
Baseline
OU, ∆: £859.31, Baseline: £2493.50AH, ∆: £1927.65, Baseline: £-1079.97
Gavin Whitaker ISBIS: July 2018 34/39
Future work
B See if we really are capturing the style of play in each leagueusing Goal, Shots and Chained Event
B What can we introduce to more accurately capture theleagues, e.g. other abilities?
B Look at down-weighting the past
B Possibilities towards a spatial model
B Hope to combine - finding the links between players and howthe abilities interact over the links
Gavin Whitaker ISBIS: July 2018 35/39
Eriksen assist locations under a Gaussian mixture model in the2016/2017 English Premier League, 1st 15 minutes of games
0 100 200 300y
100
0
100
x
0.00000
0.00001
0.00002
0.00003
0.00004
0.00005
0.00006
0.00007
0.00008
Gavin Whitaker ISBIS: July 2018 36/39
Assist location maps for 2 teams using data until 1st March in2016/2017 season
Chelsea
0 50 100 150 200 250 300y
100
50
0
50
100
x
Burnley
0 50 100 150 200 250 300y
100
50
0
50
100
x
0.00000 0.00001 0.00002 0.00003 0.00004 0.00005
Gavin Whitaker ISBIS: July 2018 37/39
References
B Baio, G. and Blangiardo, M. Bayesian hierarchical model for the predictionof football results. Journal of Applied Statistics, 37 (2) 253-264,2010
B Blei, D. M., Kucukelbir, A. and McAuliffe, J. D. Variational inference: Areview for statisticians. Journal of the American Statistical Association,2017
B Duvenaud, D. and Adams, R. P. Black-box stochastic variationalinference in five lines of python. NIPS Workshop on Black-box Learningand Inference, 2015
B Franks, A., Miller, A., Bornn, L. and Goldsberry, K. Characterizing thespatial structure of defensive skill in professional basketball. The Annalsof Applied Statistics, 9 (1) 94-121, 2015
B Goodfellow, I., Bengio, Y. and Courville, A. Deep Learning. MIT Press,2016.
B Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A. and Blei, D. M.Automatic differentiation variational inference. Journal of MachineLearning Research, 18 (14) 1-45, 2017
Gavin Whitaker ISBIS: July 2018 38/39
B Maclaurin, D., Duvenaud, D. and Adams, R. P. Autograd: Effortlessgradients in numpy. ICML 2015 AutoML Workshop, 2015
B Miller, A., Bornn, L., Adams, R. and Goldsberry, K. Factorized pointprocess intensities: A spatial analysis of professional basketball.Proceedings of the 31st International Conference on Machine Learning -Volume 32, 235-243, 2014
B Saul, L. and Jordan, M. I. Exploiting tractable substructures inintractable networks. Advances in Neural Information Processing Systems8, MIT Press, 486-492, 1996
B Whitaker, G. A., Silva, R., Edwards, D. A Bayesian inference approach fordetermining player abilities in soccer. arXiv preprint,https://arxiv.org/abs/1710.00001.pdf, 2017
B Whitaker, G. A., Silva, R., Edwards, D. Modeling goal chances in soccer:a Bayesian inference approach. arXiv preprint,https://arxiv.org/abs/1802.08664.pdf, 2018
Gavin Whitaker ISBIS: July 2018 39/39