Insert here your thesis’ task. - core.ac.uk betting market ine ciencies with machine learning Bc

  • View
    212

  • Download
    0

Embed Size (px)

Text of Insert here your thesis’ task. - core.ac.uk betting market ine ciencies with...

Insert here your thesis task.

Czech Technical University in Prague

Faculty of Electrical Engineering

Department of Computer Science

Masters thesis

Exploiting betting market inefficiencies

with machine learning

Bc. Ondrej Hubacek

Supervisor: Ing. Gustav Sourek

7th January 2017

Acknowledgements

I would like to express my gratitude to my supervisor Gustav Sourek for his invaluableconsultations and persisting optimism.

In addition, I thank Assistant Professor Erik Strumbelj from the University of Ljubljana,for provided data.

Computational resources were provided by the CESNET LM2015042 and the CERITScientific Cloud LM2015085, provided under the programme Projects of Large Re-search, Development, and Innovations Infrastructures

Last but not least, I would like to thank all of my family for their continued supportduring my studies.

Declaration

I hereby declare that the presented thesis is my own work and that I have cited all sourcesof information in accordance with the Guideline for adhering to ethical principles whenelaborating an academic final thesis.

I acknowledge that my thesis is subject to the rights and obligations stipulated by theAct No. 121/2000 Coll., the Copyright Act, as amended, in particular that the CzechTechnical University in Prague has the right to conclude a license agreement on theutilization of this thesis as school work under the provisions of Article 60(1) of the Act.

In Prague on 7th January 2017 . . . . . . . . . . . . . . . . . . . . .

Czech Technical University in PragueFaculty of Electrical Engineeringc 2017 Ondrej Hubacek. All rights reserved.

This thesis is school work as defined by Copyright Act of the Czech Republic. It has beensubmitted at Czech Technical University in Prague, Faculty of Electrical Engineering.The thesis is protected by the Copyright Act and its usage without authors permissionis prohibited (with exceptions defined by the Copyright Act).

Citation of this thesis

Hubacek, Ondrej. Exploiting betting market inefficiencies with machine learning. Mas-ters thesis. Czech Technical University in Prague, Faculty of Electrical Engineering,2017.

Abstrakt

Clem nas prace bylo najt zpusob, jak profitovat na sazkarskem trhu pomoc stro-joveho ucen. Zatmco vedecke prace se v minulosti hojne venovaly otazce, zda lzevytvorit model, ktery bude presnejs nez bookmaker, ziskovosti modelu nebyla venovanadostatecna pozornost. Tvrdme, ze zisk nen dusledkem pouze samotne presnosti. Mstotoho navrhujeme dekorelaci od predikc bookmakera, stale vsak nezapomnaje na presnostpredikce. Dale predstavujeme inovativn prstup k agregaci statistik hracu pomockonvolucnch neuronovych st. V neposledn rade ukazujeme vyuzit Modern Port-folio Theory, matematickeho ramce pro optimalizaci portfolia, v kontextu sazen proporovnavan ruznych sazecch strategi. Nami navrzene modely byly v souhrnu 15 sezonNBA ziskove. Nen nam znamo, ze by doposud byla zverejnena prace podobneho rozsahu.

Klcova slova prediktivn modelovan, sportovn analyza, sazkarske trhy, neuronoveste, basketbal

ix

Abstract

The goal of our work was to find a way to profit from betting market using machinelearning. While a lot of research has been devoted to determining whether a bookmakercan be beaten in terms of accuracy, surprisingly little attention was paid to evaluatingthe profitability of statistical models over the bookmaker. We argue that the profit isnot solely affected by the models accuracy. Instead, we encourage decorrelation of themodels output from the bookmakers predictions, while keeping the accuracy reasonablyhigh. Moreover, we introduce a novel approach of aggregating player-level statisticsusing convolutional neural networks. Last but not least, we illustrate the use of ModernPortfolio Theory, a mathematical framework for portfolio optimization, in the context ofbetting for comparison of diverse betting strategies. Our proposed models were able toachieve a positive profit totaling over the course of 15 NBA seasons. To the best of ourknowledge, no work of similar scale has been done and made publicly available before.

Keywords predictive modeling, sports analytics, betting markets, neural networks,basketball

x

Contents

1 Introduction 1

1.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Betting Markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.3 National Basketball Association . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Predictive Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Literature Review 11

2.1 Betting Markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2 Human vs Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 Predictive Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 Analysis and Design 19

3.1 Gathering Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2 Building Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.3 Exploratory Data Analysis of Bookmakers Odds . . . . . . . . . . . . . . 20

3.4 Predictive Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.5 Betting Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4 Experiments 33

4.1 Simulation of the Betting Strategy . . . . . . . . . . . . . . . . . . . . . . 33

4.2 Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.3 Comparison with State-of-the-art . . . . . . . . . . . . . . . . . . . . . . . 47

5 Conclusion 49

5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Bibliography 51

A Features descriptions 55

xi

B Acronyms 61

C Contents of enclosed CD 63

xii

List of Figures

1.1 A schema of a neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2 A schema of an ANN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.3 Examples of activation functions . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.1 Distribution of odds set by the bookmaker for home team (left) and forvisiting team (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2 Margin distribution over the seasons . . . . . . . . . . . . . . . . . . . . . . . 223.3 Margin distribution conditioned by the bookmakers odds . . . . . . . . . . . 223.4 Estimates of P (win,COURT |ODDS) (left) and P (win|ODDS) (right). . . . 233.5 PDFs of P (BOOK,COURT ) and P (BOOK,COURT |win) . . . . . . . . . . 243.6 Estimate of P (win|ODDS) from data with margin and with stripped margin 253.7 Risk-expected return space with random strategies and efficient frontier . . . 303.8 Comparison of the betting strategies in the risk-return space . . . . . . . . . 31

4.1 Accuracy of the ANN and LR models over seasons 20002014 . . . . . . . . . 374.2 Correlation between the ANN model and bookmakers predictions . . . . . . 374.3 Comparison of models profitability using the proposed (left) and fixbet (right)

betting strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384.4 Analysis of predictions of models trained with sample weighting . . . . . . . . 404.5 Comparison of models profitability using proposed (left) and fixbet (right)

betting strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404.6 Analysis of predictions of the ANN model trained with the decorrelation term

in loss function for varying values of c . . . . . . . . . . . . . . . . . . . . . . 424.7 Distribution of P (win|away, c, F ) for different values of c . . . . . . . . . . . 434.8 Analysis of predictions of the thresholded models . . . . . . . . . . . . . . . . 444.9 Accuracy of the ANN and ANN player models in seasons 20002014 . . . . . 454.10 Correlation between the ANN player model and bookmakers predictions . . 464.11 Comparison of models profitability using proposed (left) and fixbet (right)

betting strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

xiii

List of Tables

3.1 Meta-parameters of the used Models . . . . . . . . . . . . . . . . . . . . . . . 263.2 Comparison of betting strategies on simulated betting opportunities. . . . . . 31

4.1 Effects of the ground truth and models output distributions variances on theevaluation metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2 Effects of different correlation levels between the true probabilities, the modeland the bookmaker on the evaluation metrics. . . . . . . . . . . . . . . . . . . 34

4.3 Number of games and bookmakers accuracy in each of the seasons . . . . . . 364.4 Mean results over the seasons of the LR and ANN model . . . . . . . . . . . 364.5 Results of ANN models trained with sample weighting . . . . . . . . . . . . . 394.6 Result of ANN model trained with the decorrelation term in loss function for

varying values of c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414.7 Results of confidence thresholding of the team-level model . . . . . . . . . . . 424.8 Results of ANN player model . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.9 Result of ANN player model trained with decorrelation term in loss function

with different values of c . . . . . . . . . . . . . . . . . . . . . . .