

  • The Betting Machine

    Using in-depth match statistics to compute future probabilities of football match outcomes using the Gibbs sampler

    Martin Belgau Ellefsrød

    Master of Science in Computer Science

    Supervisor: Helge Langseth, IDI

    Department of Computer and Information Science

    Submission date: June 2013

    Norwegian University of Science and Technology

  • Abstract

    Football is one of the most popular sports in the world, if not the most popular, played and watched by millions of people almost daily, and certainly weekly. Though most of those who place weekly bets on match outcomes have already made up their minds about the abilities of the competing teams, many have nevertheless attempted to assess the abilities of teams using different statistical approaches, assigning objective, quantitative values to each team. From that starting point, one can then try to predict the future results of games. This paper examines the existing methods used by Maher (1982) and Dixon & Coles (1997) for modeling team strengths, and how these models are used for prediction.

    The study then compares the two methods of Maher (1982) and Dixon & Coles (1997) by experimenting with the models, finding that the latter seems to provide the more promising results. Tests are run by constructing the models and collecting empirical evidence on the accuracy of the models when using them to bet on matches.

    We then continue by constructing our own model, which utilizes more detailed data from the current season’s football matches, retrieved from several football and betting sites on the internet, and compare our results with how the older models performed on the same season.

    Our study finds that the data we were able to retrieve does not significantly increase the return on investment when betting on matches over the course of a season. Though our model performs slightly better than the two methods of Maher (1982) and Dixon & Coles (1997), it is not able to outperform the bookmakers it is betting against.

    The study concludes with a section on what further work should be done to improve the models, focusing on more extensive match data that we did not manage to find, such as where on the pitch most passes were made, where shots were fired from, and whether important players were available.

  • Preface

    This project was done by a master's student at the Norwegian University of Science and Technology. In the latter stages of my studies, I selected Game Technology as my specialization, but I have also completed several courses required for Artificial Intelligence students, as I also have an interest in this field. I have always had a deep interest in football statistics and their application to betting strategies. As a consequence, I was drawn to this project, as it combines two fields I find very interesting. The project offered freedom with regard to how to attack the problem, but as my supervisor Helge Langseth had already constructed a framework for working with Gibbs sampling in Matlab, I chose to continue development of this framework, using this method.


  • Acknowledgments

    I would like to thank my supervisor Helge Langseth for his immense expertise in the fields of football betting, statistical analysis and AI methods; without his help over the course of this project, it would not be what it is today. I would also like to thank my fiancée, Nikoline, who has been so very supportive these past 7 years.


  • Contents

    1 Introduction
        1.1 Background and Motivation
        1.2 Research Questions
        1.3 Research Method

    2 Background Theory and Motivation
        2.1 Background Theory
            2.1.1 Markov Chains
            2.1.2 Gibbs Sampling
        2.2 Assessment of Maher, Dixon & Coles
            2.2.1 Using the Poisson Distribution
            2.2.2 Introducing the bivariate Poisson model
            2.2.3 The Home Ground Advantage and Varying Team Form
            2.2.4 Altering the Poisson Distribution by Inflation
            2.2.5 Comparison of Models
        2.3 Technologies and Workflow
            2.3.1 MATLAB
            2.3.2 JAGS
            2.3.3 HTML
            2.3.4 PHP
            2.3.5 WampServer
            2.3.6 Regular Expressions
            2.3.7 Workflow

    3 System Architecture
        3.1 The internet Crawler
            3.1.1 Important Data Files
            3.1.2 Important PHP-scripts
            3.1.3 Taking it step by step
        3.2 The Betting Simulator
            3.2.1 Game
            3.2.2 GameList
            3.2.3 Database
            3.2.4 Simulator
            3.2.5 Bookie
            3.2.6 Footy
            3.2.7 readData

    4 Comparing the Models
        4.1 Maher
        4.2 Dixon and Coles
        4.3 Experiments and Results
            4.3.1 Experimental Plan
            4.3.2 Betting Strategy
            4.3.3 Experimental Setup
            4.3.4 Experimental Results
        4.4 Evaluation

    5 Constructing a Model
        5.1 The statistics available
        5.2 Using the coefficient of determination
            5.2.1 The predictive nature of goals
            5.2.2 Using other variables
            5.2.3 Choosing variables
        5.3 The Model

    6 Experiments and Results
        6.1 Experimental Plan
            6.1.1 Fixed Bets
            6.1.2 Fixed Return
            6.1.3 Only Favourites
        6.2 Experimental Setup
        6.3 Experimental Results

    7 Evaluation and Conclusion
        7.1 Evaluation
        7.2 Discussion
            7.2.1 Useful in-depth data
            7.2.2 Improvement of model
        7.3 Contributions
        7.4 Continuation of Work

    A
        A.1 League Positions 2011/2012
        A.2 Maher model implementation
        A.3 Dixon and Coles model implementation
        A.4 Our model implementation