Using player abilities to predict football

Gavin Whitaker

ISBIS: Data Science in Industry, July 2018

Gavin Whitaker ISBIS: July 2018 1/39

Who are Stratagem?

• A data science and trading company - focus on sports

• Football/tennis/basketball

• Combine state-of-the-art analytics, data and expert analysis to find cutting edge solutions

• Aim to utilise the full power of the modern AI toolkit

• Offer:

    • Betting insights
    • Modelling
    • Bet prices
    • Trading services

The problem

Aim

• Wish to determine the ability of a given player in a specific event, e.g. passing, scoring a goal etc.

• Use these abilities to help predictions in the Over/Under or Asian handicap markets

Possible questions

• Can we rate/rank players on their ability in an event?

• How important is a player to a team?

• What happens if that player is missing from the team?

• What happens if you add a player to a team?

• What happens if you swap player x for player y?

Example fixture

Japan vs Poland - 28/6/18

Outcome    Market   Model
Home win   37.5%    25.7%
Away win   32.0%    43.7%
Draw       30.5%    30.6%

Final score: Japan 0 - 1 Poland

The data

• Touch data

• 2013/2014 - 2016/2017 Premier League seasons

• Roughly 2.4 million events in total

• ≈ 1600 events per game

Liverpool vs Stoke, 17th August 2013

[Figure: bar chart of Frequency (0 to 1000) against Event type. Event types in plotted order: Pass, BallRecovery, Aerial, Clearance, Tackle, Foul, TakeOn, Dispossessed, BallTouch, CornerAwarded, Interception, Save, SavedShot, Challenge, MissedShots, KeeperPickup, OffsideGiven, OffsideProvoked, OffsidePass, Claim, Error, ShotOnPost, Punch, CrossNotClaimed, Goal]

Liverpool vs Stoke, 17th August 2013 (pass removed)

[Figure: bar chart of Frequency (0 to 100) against Event type. Event types in plotted order: BallRecovery, Aerial, Clearance, Tackle, TakeOn, Foul, Dispossessed, BallTouch, CornerAwarded, Interception, Save, SavedShot, Challenge, MissedShots, KeeperPickup, OffsidePass, OffsideGiven, OffsideProvoked, ShotOnPost, Error, Claim, CrossNotClaimed, Punch, Goal]

Event types by group

Stop: Card, End, FormationChange, FormationSet, OffsideGiven, PenaltyFaced, Start, SubstitutionOff, SubstitutionOn

Control: Aerial, BallRecovery, BallTouch, ChanceMissed, Dispossessed, Error, Foul, Goal, GoodSkill, MissedShots, OffsidePass, Pass, SavedShot, ShotOnPost, TakeOn

Disruption: BlockedPass, Challenge, Claim, Clearance, Interception, KeeperPickup, OffsideProvoked, Punch, Save, Smother, Tackle

Questionable: CornerAwarded, CrossNotClaimed, KeeperSweeper, ShieldBallOpp

The model

• $K$ matches, numbered $k = 1, \dots, K$

• The set of teams in fixture $k$ is $T_k = \{T^H_k, T^A_k\}$, with home team $T^H_k$ and away team $T^A_k$

• $P$ is the set of all players; $P^j_k \subseteq P$ is the subset of players who play for team $j$ in fixture $k$

• Consider how players' abilities over different events interact; group events to create meaningful interactions

• For simplicity, let's consider 2 events, e.g. "Pass" and "AntiPass"

• Denote the set of events as $E = \{e_1, e_2\}$

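• Let $X^e_{i,k}$ be the number of occurrences of event $e$ by player $i$ (for team $j$) in match $k$

• We construct a Poisson process

$$X^e_{i,k} \sim \mathrm{Pois}\left(\eta^e_{i,k}\,\tau_{i,k}\right)$$

where

$$\eta^e_{i,k} = \exp\left\{ \Delta^e_i + \tau_{i,k}\,\lambda^e_1 \sum_{i' \in P^j_k} \Delta^e_{i'} - \lambda^e_2 \sum_{i' \in P^{T_k \setminus j}_k} \Delta^{E \setminus e}_{i'} + \left(\delta_{T^H_k, j}\right)\gamma^e \right\}$$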

• $\Delta^e_i$ is the latent ability of player $i$ for event $e$

• $\delta_{r,s}$ is the Kronecker delta

• $\tau_{i,k}$ is the fraction of time player $i$ spent on the pitch in match $k$, $\tau_{i,k} \in [0, 1]$

• $\gamma^e$ is the home effect for event $e$

• $\lambda^e_1$ is the impact of a player's own team

• $\lambda^e_2$ describes the opposition's ability to stop the player/other team

• We impose the constraint that the $\lambda$s must be positive

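[Diagram: the own-team abilities $\Delta^{e_1}_1, \Delta^{e_1}_2, \dots, \Delta^{e_1}_{11}$ and the opposition abilities $\Delta^{e_2}_{12}, \Delta^{e_2}_{13}, \dots, \Delta^{e_2}_{22}$ feed into $\log(\eta^{e_1}_1)$ through the terms $\lambda_1 \sum_{i=1}^{11} \Delta^{e_1}_i$, $-\lambda_2 \sum_{i=12}^{22} \Delta^{e_2}_i$ and $\gamma^{e_1}$]

For simplicity we assume only 11 players on each team and drop the time dependence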
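As a minimal sketch of the simplified rate (11 players per team, no time dependence), the log-rate for one player could be computed as below. The function name, the dictionary layout and every numeric value are hypothetical and chosen only for illustration; the home effect is added only when the player's team is at home, mirroring the Kronecker delta.

```python
import math

def log_eta(player, own_team, opposition, delta_event, delta_anti,
            lam1, lam2, gamma, is_home):
    """Simplified log-rate: own ability + own-team term - opposition term + home effect."""
    own_sum = sum(delta_event[p] for p in own_team)    # sum of event abilities over own team
    opp_sum = sum(delta_anti[p] for p in opposition)   # sum of opposing-event abilities
    home = gamma if is_home else 0.0                   # Kronecker delta switch
    return delta_event[player] + lam1 * own_sum - lam2 * opp_sum + home

# Hypothetical abilities: players 1-11 are the home team, 12-22 the away team.
home_team = list(range(1, 12))
away_team = list(range(12, 23))
delta_pass = {p: -1.0 for p in home_team + away_team}
delta_antipass = {p: -2.0 for p in home_team + away_team}

value = log_eta(1, home_team, away_team, delta_pass, delta_antipass,
                lam1=0.05, lam2=0.02, gamma=0.1, is_home=True)
rate = math.exp(value)  # expected count for a player on the pitch all match
```

With both $\lambda$s positive, stronger team-mates raise the rate and a stronger opposition lowers it.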

The log-likelihood is

$$\ell = \sum_{e \in E} \sum_{k=1}^{K} \sum_{j \in T_k} \sum_{i \in P^j_k} \left\{ X^e_{i,k} \log\left(\eta^e_{i,k}\,\tau_{i,k}\right) - \eta^e_{i,k}\,\tau_{i,k} - \log\left(X^e_{i,k}!\right) \right\}$$

• Large number of players → large number of parameters (MCMC not feasible)

• Appeal to variational inference techniques combined with automatic differentiation

• Can utilise a prior for those players with few data points/minutes played
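The Poisson log-likelihood can be sketched as a sum over (count, rate, time fraction) records. This is an illustrative sketch only: the flat record layout, the function name and the numbers are assumptions, and skipping players with no pitch time is a choice made for this sketch rather than something stated in the talk.

```python
import math

def poisson_log_likelihood(records):
    """Sum of Poisson log-densities over (count, eta, tau) records."""
    total = 0.0
    for count, eta, tau in records:
        if tau == 0.0:
            continue  # sketch assumption: no pitch time means no contribution
        mean = eta * tau
        # X log(eta * tau) - eta * tau - log(X!), with log(X!) = lgamma(X + 1)
        total += count * math.log(mean) - mean - math.lgamma(count + 1)
    return total

# Hypothetical records: (observed count, eta, fraction of match played)
records = [(2, 1.5, 1.0), (0, 0.4, 0.5), (1, 0.8, 0.75)]
ll = poisson_log_likelihood(records)
```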

Variational inference

• See Blei et al. (2017), Kucukelbir et al. (2017), Duvenaud and Adams (2015) and Chapter 19 of Goodfellow et al. (2016)

• Specify a variational family of densities over the latent variables

• Latent variables - $\nu$

• Aim to find the best candidate approximation $q(\nu)$

• Do this by maximising the evidence lower bound (ELBO)

• ELBO - closely related to the Kullback-Leibler divergence: maximising the ELBO minimises the divergence between $q(\nu)$ and the posterior

$$\mathrm{ELBO}(\nu) = E_\nu\left[\log\{\pi(\nu, x)\}\right] - E_\nu\left[\log\{q(\nu)\}\right]$$

• We consider the mean-field variational family

• The latent variables are assumed to be mutually independent
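To make the bound concrete, here is a minimal single-player sketch: one count $x \sim \mathrm{Pois}(e^{\Delta}\tau)$, a Gaussian prior on $\Delta$ and a Gaussian candidate $q$. Under these assumptions the ELBO has a closed form, and it can be checked against a brute-force numerical evidence. The function names, the integration grid and the data value are hypothetical.

```python
import math

def elbo(mu, sigma, x, tau, prior_mean, prior_sd):
    """Closed-form ELBO for x ~ Pois(exp(d) * tau), d ~ N(prior_mean, prior_sd^2), q(d) = N(mu, sigma^2)."""
    expected_rate = tau * math.exp(mu + 0.5 * sigma ** 2)        # E_q[exp(d) * tau]
    exp_log_lik = x * (mu + math.log(tau)) - expected_rate - math.lgamma(x + 1)
    exp_log_prior = (-0.5 * math.log(2 * math.pi * prior_sd ** 2)
                     - ((mu - prior_mean) ** 2 + sigma ** 2) / (2 * prior_sd ** 2))
    entropy = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)  # -E_q[log q]
    return exp_log_lik + exp_log_prior + entropy

def log_evidence(x, tau, prior_mean, prior_sd, lo=-25.0, hi=10.0, n=70000):
    """log p(x), integrating the latent ability out with a simple Riemann sum."""
    step = (hi - lo) / n
    total = 0.0
    for s in range(n + 1):
        d = lo + s * step
        log_lik = x * (d + math.log(tau)) - tau * math.exp(d) - math.lgamma(x + 1)
        log_prior = (-0.5 * math.log(2 * math.pi * prior_sd ** 2)
                     - (d - prior_mean) ** 2 / (2 * prior_sd ** 2))
        total += math.exp(log_lik + log_prior) * step
    return math.log(total)

# Hypothetical data: 3 occurrences in a full match, prior N(-2, 2^2)
evidence = log_evidence(3, 1.0, -2.0, 2.0)
bound = elbo(0.0, 1.0, 3, 1.0, -2.0, 2.0)
```

Every choice of `mu` and `sigma` gives a value below `evidence`; the better the approximation, the smaller the gap.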

• Let $\nu = \Delta$, and set

$$q(\Delta^e_i) \sim N\left(\mu_{\Delta^e_i}, \sigma^2_{\Delta^e_i}\right),$$

where

$$q(\Delta) = \prod_{e \in E} \prod_{j \in T_k} \prod_{i \in P^j_k} q(\Delta^e_i)$$

• Aim - find suitable candidate values for $\mu_{\Delta^e_i}$ and $\sigma_{\Delta^e_i}$, $\forall i, \forall e$. These are the variational parameters

• Take $(\lambda^e_1, \lambda^e_2, \gamma^e)^T$ to be fixed parameters

• Prior - $\pi(\Delta^e_i) \sim N(-2, 2^2)$

• Fit using automatic differentiation - Python package autograd (Maclaurin et al., 2015)

• Minimise the negative ELBO using ADAM
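The fitting loop can be illustrated on a single player, under strong simplifications: one count, no team terms, and hand-derived gradients in place of autograd, so this is not the talk's implementation. The optimiser is a plain-Python Adam update on $(\mu, \log\sigma)$; the function names, step size, iteration count and data value are all assumptions.

```python
import math

def elbo_and_grad(mu, rho, x, tau, prior_mean, prior_sd):
    """ELBO and its gradient with respect to (mu, rho), where sigma = exp(rho)."""
    sigma2 = math.exp(2.0 * rho)
    rate = tau * math.exp(mu + 0.5 * sigma2)              # E_q[exp(d) * tau]
    value = (x * (mu + math.log(tau)) - rate - math.lgamma(x + 1)
             - 0.5 * math.log(2 * math.pi * prior_sd ** 2)
             - ((mu - prior_mean) ** 2 + sigma2) / (2 * prior_sd ** 2)
             + 0.5 * math.log(2 * math.pi * math.e * sigma2))
    grad_mu = x - rate - (mu - prior_mean) / prior_sd ** 2
    grad_rho = -sigma2 * rate - sigma2 / prior_sd ** 2 + 1.0
    return value, (grad_mu, grad_rho)

def fit_adam(x, tau, prior_mean=-2.0, prior_sd=2.0, steps=5000,
             lr=0.02, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise the negative ELBO with Adam, starting q at the prior."""
    params = [prior_mean, math.log(prior_sd)]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        _, grad = elbo_and_grad(params[0], params[1], x, tau, prior_mean, prior_sd)
        for j in range(2):
            g = -grad[j]                                   # gradient of the negative ELBO
            m[j] = beta1 * m[j] + (1 - beta1) * g
            v[j] = beta2 * v[j] + (1 - beta2) * g * g
            m_hat = m[j] / (1 - beta1 ** t)                # bias-corrected moments
            v_hat = v[j] / (1 - beta2 ** t)
            params[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params[0], math.exp(params[1])

# Hypothetical data: 3 occurrences in a full match
mu_hat, sigma_hat = fit_adam(x=3, tau=1.0)
```

Optimising $\log\sigma$ rather than $\sigma$ keeps the standard deviation positive without a constraint.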

Application - 2013/2014 season

• Initially look at the 2013/2014 English Premier League

    • $k = 1, \dots, 380$
    • $j \in T_k$, where $T_k$ consists of a subset of $\{1, \dots, 20\}$
    • $i \in P^j_k$, where $P^j_k$ is a subset of $P = \{1, \dots, 544\}$

• The full model has 2182 parameters

• Compare "Goal" and "GoalStop"
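One accounting that reproduces the count of 2182 parameters is given below. It assumes a variational mean and standard deviation per player per event, plus $\lambda^e_1$, $\lambda^e_2$ and $\gamma^e$ for each of the two events; this breakdown is an inference and is not stated on the slide.

```latex
\underbrace{544 \times 2 \times 2}_{\mu_{\Delta^e_i},\ \sigma_{\Delta^e_i}}
+ \underbrace{2 \times 3}_{\lambda^e_1,\ \lambda^e_2,\ \gamma^e}
= 2176 + 6 = 2182
```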

Within sample predictive distributions

[Figure: two panels, Goal and GoalStop, each plotting Occurrences against Player (0 to 25)]

Red: model combinations of $\eta^e_{i,k}$. Blue: observed. The red dotted bars show the 95% prediction interval for each $\eta^e_{i,k}$. The black line separates the players from the two teams

[Figure: Model and Observed panels for Goal and GoalStop]

Goal - marginal posterior variational densities

[Figure: two panels, Sturridge and Reed, each plotting Density against $q(\Delta^{Goal}_i)$]

Red-dotted: prior. Blue-solid: posterior

Goal - top 10

Rank  Player      2.5% quantile  Standard deviation  Observed  Observed rank  Rank difference
1     Suarez       0.508         0.184               31        1               0
2     Sturridge    0.176         0.225               21        2               0
3     Aguero       0.147         0.250               17        4              +1
4     Y. Toure    -0.043         0.224               20        3              -1
5     Rooney      -0.056         0.243               17        5               0
6     Dzeko       -0.065         0.249               16        8              +2
7     van Persie  -0.136         0.289               12        15             +8
8     Remy        -0.230         0.271               14        11             +3
9     Bony        -0.257         0.252               16        7              -2
10    Rodriguez   -0.354         0.263               15        10              0

GoalStop - top 10

Rank  Player     2.5% quantile  Standard deviation  Observed  Observed rank  Rank difference
1     Mulumbu    2.575          0.040               631       1                0
2     Kallstrom  2.553          0.177               33        405           +403
3     Mannone    2.528          0.044               508       12              +9
4     Yacob      2.510          0.053               359       43             +39
5     Tiote      2.474          0.044               517       8               +3
6     Lewis      2.446          0.213               23        436           +430
7     Palacios   2.441          0.101               100       286           +279
8     Jedinak    2.420          0.041               603       2               -6
9     Ruddy      2.411          0.041               600       3               -6
10    Arteta     2.409          0.048               431       21             +11
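The top-10 tables appear to be ordered by the 2.5% quantile of each player's Gaussian variational density. A minimal sketch of that ordering follows, assuming the quantile is taken from $N(\mu, \sigma^2)$; the player labels and values are hypothetical, not taken from the tables.

```python
from statistics import NormalDist

def lower_quantile(mean, sd, level=0.025):
    """Lower quantile of a Gaussian variational density N(mean, sd^2)."""
    return NormalDist(mean, sd).inv_cdf(level)

# Hypothetical (mean, standard deviation) pairs for three players
players = {"A": (0.9, 0.2), "B": (1.0, 0.6), "C": (0.5, 0.1)}

# Rank from the highest lower quantile to the lowest
ranking = sorted(players, key=lambda name: lower_quantile(*players[name]), reverse=True)
```

Here player B has the highest mean but also the widest density, so B ranks last: ranking on a lower quantile penalises players whose ability is poorly determined.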

Application - past/future

• Now look at the 2013/2014 and 2014/2015 English Premier League seasons

• Initially train using the complete 2013/2014 season

• Introduce games in blocks of 80 games

• Fit on all the past to predict the future

[Diagram: timeline across the 13/14 and 14/15 seasons; fit on everything up to the current block, predict the next block, and so on]
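The block scheme can be sketched as an expanding-window split, assuming matches are indexed in chronological order. The function name is hypothetical; the sizes used in the usage lines (760 matches over two seasons, 380 for the initial fit, blocks of 80) follow the figures quoted in the talk.

```python
def expanding_window_blocks(n_matches, initial_train, block_size):
    """Yield (train_indices, predict_indices) pairs, matches in chronological order."""
    start = initial_train
    while start < n_matches:
        end = min(start + block_size, n_matches)  # the last block may be shorter
        yield range(0, start), range(start, end)  # fit on all the past, predict the next block
        start = end

# Two seasons of 380 matches: fit on the first season, then predict in blocks of 80
splits = list(expanding_window_blocks(n_matches=760, initial_train=380, block_size=80))
```

This gives five prediction blocks, the last holding the remaining 60 matches.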

Goal: 13/14 - 14/15

[Figure: $\Delta^{Goal}_i$ against Block (0 to 4); the second build highlights Costa, Sanchez, Aguero, Defoe, Kane and Austin]

Prediction

• Use these $\Delta$s as covariates in a hierarchical Bayesian model, comparing against a baseline model found in Baio and Blangiardo (2010)

[Diagram: graphical model of the baseline. Hyperparameters $\mu_{att}$, $\sigma_{att}$, $\mu_{def}$, $\sigma_{def}$; effects $home$, $att_h$, $def_a$, $att_a$, $def_h$; scoring rates $\theta_h$, $\theta_a$; observed goals $y_h$, $y_a$. The second build adds $f(\Delta)_h$ and $f(\Delta)_a$ alongside $\theta_h$ and $\theta_a$]

• $y \equiv (y_h, y_a) = (\text{home goals}, \text{away goals})$

$$y_h \mid \theta_h \sim \mathrm{Pois}(\theta_h),$$

$$y_a \mid \theta_a \sim \mathrm{Pois}(\theta_a),$$

$$\log(\theta_h) = home + att_h + def_a,$$

$$\log(\theta_a) = att_a + def_h$$

• $att_* \sim N(\mu_a, \sigma^2_a)$ and $def_* \sim N(\mu_d, \sigma^2_d)$

• Priors

    • $(\mu_a, \mu_d) \sim N(0, 10^2)$, independently
    • $(\sigma_a, \sigma_d) \sim \text{Inv-Gamma}(0.1, 0.1)$, independently
    • $home \sim N(0, 10^2)$

• Let $p^*$ be the players who start a game (predicted line-up)

$$f(\Delta)_h = \sum_{i' \in p^*_h} \mu_{\Delta^e_{i'}} - \sum_{i' \in p^*_a} \mu_{\Delta^{E \setminus e}_{i'}}$$

$$f(\Delta)_a = \sum_{i' \in p^*_a} \mu_{\Delta^e_{i'}} - \sum_{i' \in p^*_h} \mu_{\Delta^{E \setminus e}_{i'}}$$

where $p^*_h$ is the home team and $p^*_a$ the away team

• Fit the model using Stan (HMC)

• Fit the model on the past, before predicting on the next set of fixtures

• Use output from the hierarchical Bayesian model ($\theta$) to form predictions, e.g. out-of-sample $\Pr(\text{goals} > 2.5)$
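A minimal sketch of the two ingredients: the line-up covariate $f(\Delta)$, and an over/under probability computed from a pair of scoring rates. It assumes home and away goals are independent Poisson counts given $(\theta_h, \theta_a)$, so the total is Poisson with rate $\theta_h + \theta_a$. The function names, player labels and rates are hypothetical.

```python
import math

def f_delta(lineup_own, lineup_opp, mu_event, mu_anti):
    """Own starters' mean abilities minus the opposition starters' opposing-event means."""
    return sum(mu_event[p] for p in lineup_own) - sum(mu_anti[p] for p in lineup_opp)

def prob_over(theta_h, theta_a, line=2.5):
    """Pr(total goals > line) for independent Poisson home and away goals."""
    total_rate = theta_h + theta_a              # a sum of independent Poissons is Poisson
    max_under = math.floor(line)                # largest total that does not beat the line
    p_under = sum(math.exp(-total_rate) * total_rate ** n / math.factorial(n)
                  for n in range(max_under + 1))
    return 1.0 - p_under

# Hypothetical variational means for a two-player home line-up and one away player
covariate = f_delta(["a", "b"], ["c"], {"a": 0.5, "b": -0.1}, {"c": 2.0})

# Hypothetical scoring rates for one fixture
p_over = prob_over(theta_h=1.5, theta_a=1.2)
```

With posterior draws of $(\theta_h, \theta_a)$, one would average `prob_over` across the draws rather than plug in a single pair.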

• Use $\Delta$s for the abilities

    • Goal
    • Shots
    • Chained Event (our own devising)

• Use 2013/2014 for training only and predict from 2014/2015 onwards using block structure (to end of 2016/2017 season)

• Only predict matches where we have already observed both teams involved (only affects 1st block of each season)

• Bet £100 stake on each game

• Predict

    • Over/under (OU)
    • Asian handicap (AH)
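A fixed £100 stake turns into cumulative profit as sketched below for the over/under 2.5 line. The talk does not state the odds source or how a side is chosen, so decimal odds, the tuple layout, the function name and the example bets are all assumptions; Asian handicap settlement is not covered.

```python
def settle_over_under(bets, stake=100.0):
    """Running profit for fixed-stake bets at decimal odds on the 2.5 goals line.

    Each bet is (side, decimal_odds, total_goals), with side 'over' or 'under'.
    """
    profit = 0.0
    history = []
    for side, odds, total_goals in bets:
        won = (total_goals > 2.5) if side == "over" else (total_goals < 2.5)
        # A winning bet returns stake * odds, so the profit is stake * (odds - 1)
        profit += stake * (odds - 1.0) if won else -stake
        history.append(profit)
    return history

# Hypothetical bets: one winner followed by two losers
history = settle_over_under([("over", 1.9, 3), ("under", 2.0, 4), ("over", 2.1, 1)])
```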

England - Premier League

[Figure: two panels, OU and AH, each plotting Cumulative return against Time (2014-10 to 2017-06), with the Baseline marked in the legend]

OU, Δ: £6914.09, Baseline: -£1189.51
AH, Δ: £5016.87, Baseline: -£3193.15

Germany - Bundesliga

[Figure: two panels, OU and AH, each plotting Cumulative return against Time (2014-10 to 2017-06), with the Baseline marked in the legend]

OU, Δ: £2502.33, Baseline: -£3335.97
AH, Δ: £6565.97, Baseline: £5749.04

Spain - La Liga

[Figure: two panels, OU and AH, each plotting Cumulative return against Time (2014-10 to 2017-06), with the Baseline marked in the legend]

OU, Δ: £859.31, Baseline: £2493.50
AH, Δ: £1927.65, Baseline: -£1079.97

Future work

• See if we really are capturing the style of play in each league using Goal, Shots and Chained Event

• What can we introduce to more accurately capture the leagues, e.g. other abilities?

• Look at down-weighting the past

• Possibilities towards a spatial model

• Hope to combine - finding the links between players and how the abilities interact over the links

Eriksen assist locations under a Gaussian mixture model in the 2016/2017 English Premier League, 1st 15 minutes of games

[Figure: density map over pitch coordinates x and y, colour scale from 0.00000 to 0.00008]

Assist location maps for 2 teams using data until 1st March in 2016/2017 season

[Figure: two density maps, Chelsea and Burnley, over pitch coordinates x and y, sharing a colour scale from 0.00000 to 0.00005]

References

• Baio, G. and Blangiardo, M. Bayesian hierarchical model for the prediction of football results. Journal of Applied Statistics, 37 (2) 253-264, 2010

• Blei, D. M., Kucukelbir, A. and McAuliffe, J. D. Variational inference: A review for statisticians. Journal of the American Statistical Association, 2017

• Duvenaud, D. and Adams, R. P. Black-box stochastic variational inference in five lines of Python. NIPS Workshop on Black-box Learning and Inference, 2015

• Franks, A., Miller, A., Bornn, L. and Goldsberry, K. Characterizing the spatial structure of defensive skill in professional basketball. The Annals of Applied Statistics, 9 (1) 94-121, 2015

• Goodfellow, I., Bengio, Y. and Courville, A. Deep Learning. MIT Press, 2016

• Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A. and Blei, D. M. Automatic differentiation variational inference. Journal of Machine Learning Research, 18 (14) 1-45, 2017

• Maclaurin, D., Duvenaud, D. and Adams, R. P. Autograd: Effortless gradients in numpy. ICML 2015 AutoML Workshop, 2015

• Miller, A., Bornn, L., Adams, R. and Goldsberry, K. Factorized point process intensities: A spatial analysis of professional basketball. Proceedings of the 31st International Conference on Machine Learning - Volume 32, 235-243, 2014

• Saul, L. and Jordan, M. I. Exploiting tractable substructures in intractable networks. Advances in Neural Information Processing Systems 8, MIT Press, 486-492, 1996

• Whitaker, G. A., Silva, R. and Edwards, D. A Bayesian inference approach for determining player abilities in soccer. arXiv preprint, https://arxiv.org/abs/1710.00001.pdf, 2017

• Whitaker, G. A., Silva, R. and Edwards, D. Modeling goal chances in soccer: a Bayesian inference approach. arXiv preprint, https://arxiv.org/abs/1802.08664.pdf, 2018