Master Thesis Predicting Young Soccer Players Peak Potential …essay.utwente.nl/74786/1/Tahir_MS_EEMCS.pdf · Master Thesis Predicting Young Soccer Players Peak Potential with Optimal

Master Thesis Predicting Young Soccer Players Peak Potential with Optimal Age

AbdulRehman Tahir March 2018

2

PredictingYoungSoccerPlayersPeakPotentialwithOptimalAge

MasterThesisBy

AbdulRehmanTahir

1770314

MScComputerScienceDataScienceandSmartServices

Enschede,March2018Prof.Dr.Ir.RaymondVeldhuisDr.Ir.L.J.SpreeuwersDr.JanVanHaren

3

4

5

Acknowledgments Iamgratefultomanypeoplefortheircontribution,support,andconfidenceonmyself, which made my masters journey rousing and an experience in itself.Notably, I would like to express heartfelt gratitude to my advisors Prof. Dr.RaymondVeldhuis, andDr. LuukSpreewers for their continuousguidanceandevaluativefeedbackthroughoutthethesis.TheyalwayswelcomedmewheneverIhadaquestionaboutmyresearch.Theirsupportandtrustnotjusteducatemeto have a deep understanding of applied Machine Learning but it alsoencouragedmetodeveloparesearchattitude.ThesearethebestmentorsIhaveeverworkedwith.Ipayspecial thanks tomy friendsAmirKhan,AqibSaeedandAli Imam foralltheirsupportandmotivation.Finally, I would like to express my deepest and very sincere gratitude to myparentsandsiblings, forprovidingmeunflaggingsupport,encouragement,andunfailing prays at every step. This achievementwould not have been possiblewithoutthem.Thankyou.

6

7

Abstract Potential is the individualability toperform.Tomeasure thepotential,expertssystematically use many intelligence, ability, and competence based tests andmakes a composite score from their results, which is considered to be anindividual Potential.Most of the psychologists agree on the fact that potentialincreases anddecreaseswith age. In this research,wedevelopedanalgorithmthat can be used to predict the peak potential of young soccer players withoptimal age. We used different machine learning techniques from traditionalmethodstodeeplearningmethodstodevelopanalgorithmthatcanpredictthepeakpotentialofyoungsoccerplayersusingtheirplayingdatabetweentheageof15till19.WeusedLassoregressionandFeedForwardneuralnetworksasourbaseline models. We considered this problem as a time-series forecastingproblem or sequence prediction problem. Our proposedmodel is a variant ofrecurrent neural networks– LSTMs.We have found that LSTMs outperformedbaselinemodelsandperformedwithzeropredictionerroronthetestsetwhenusedwithplayer-specificmodels.

8

Contents Acknowledgments……………………………………………………………………5 Abstract ............................................................................................................. 7 Contents ............................................................................................................ 8 List of Figures.....................................................................................................................................11 List of Tables......................................................................................................................................13 List of Acronyms...............................................................................................................................15

Introduction ..................................................................... 17

1.1 Potential ................................................................................................ 18

1.2 Measuring Potential ............................................................................... 20

1.3Statement of the Problem ...................................................................... 21

1.4Overview..................................................................................................................................21

Background Study and Related Work .............................. 23

2.1 Literature Review .................................................................................. 23

2.2 Limitations of Existing Approaches ........................................................ 25

2.3 Sequence Prediction and Deep Learning .............................................. 26

Data and Pre-Processing ................................................. 29

3.1 Data Collection and Analysis ................................................................. 29

9

3.2 Data Pre-processing ............................................................................... 30

Model Architectures .................................................................. 32

Problem Definition ...................................................................................... 32

4.1 Multivariate Linear Regression .............................................................. 33

4.2 Decision Tree ......................................................................................... 33

4.3 Time Series ............................................................................................ 35

4.4 Feed Forward Neural Networks ............................................................. 35

4.5 Long Short Term Memory (LSTM) ........................................................ 37

4.6 Training Method…………………………………………………………….40

4.6.1 Linear Regression Training…………………………………………40 4.6.2 FFN Training…………………………………………………………41 4.6.2 LSTMs Training………………………………………………………42

Empirical Analysis and Discussion .................................. 45

5.1 Evaluation Procedure ............................................................................ 45

5.2 Performance Metrics .............................................................................. 46

5.3 Baseline ................................................................................................. 46

5.4 Dataset Preparation ............................................................................... 47

5.4.1 Modified Dataset…………………………...………………………..47

5.5 Results ................................................................................................... 49

5.6 Discussion .............................................................................................. 72 Conclusion and Future Work .......................................... 75

Appendix ........................................................................ 77

A1 Hyperparameter Configuration using Lasso Regression ........................ 77

A2 Hyperparameters Configuration using Feed Forward Neural Network (SGD) .......................................................................................................... 77

A3 Hyperparameters Configuration using Feed Forward Neural Network (Adam) ........................................................................................................ 78

A4 Hyperparameters Configuration using LSTM ......................................... 78

Reference ........................................................................ 82

10

11

List of Figures 1.1 Dependenceofplayers’potentialondifferentfactors……………………….19

4.1 Baselineneuralnetworkarchitecture………………………………………………..36 4.2 LongShortTermMemory(LSTM)architecture……………………………….…384.3 RNNbackpropagationwithtime………………………………………………………..425.1 PlayerspecificmodelusingLassoregression.Playerid194……………….575.2 PlayerspecificmodelusingLassoregressionPlayerid1964………………575.3 PlayerspecificmodelusingFeedForwardNeuralNetworkwith

meanvaluedataset.Playerid194…………………………………….................605.4 PlayerspecificmodelusingFeedForwardNeuralNetworkwith meanvaluedataset.Playerid1964……………………………………………………615.5 PlayerspecificmodelusingFeedForwardNeuralNetworkwith

meanvaluedataset.Playerid5206……………………………………………………615.6 PlayerspecificmodelusingFeedForwardNeuralNetworkwith

addingbackprediction.Playerid194……………………………………………….625.7 PlayerspecificmodelusingFeedForwardNeuralNetworkwith Addingbackprediction.Playerid1964………………………………………………635.8 PlayerspecificmodelusingFeedForwardNeuralNetworkwith

addingbackprediction.Playerid5206………………………………………………635.9 PlayerspecificmodelusingFeedForwardNeuralNetworkwith

timelagdataset.Playerid194…………………………………………………………….65

12

5.10 PlayerspecificmodelusingFeedForwardNeuralNetworktimelagdataset.Playerid1964…………………………………………………………65

5.11 PlayerspecificmodelusingLSTMwith

timeLagdataset.Playerid194………………………………………………………….69

5.12 PlayerspecificmodelusingLSTMwithtimeLagdataset.Playerid1964………………………………………………………..69

5.13 PlayerspecificmodelusingLSTMwith

timeLagdataset.Playerid5206………………………………………………………..705.14 PlayerspecificmodelusingLSTMwith

timeLagdataset.Playerid2885………………………………………………………..70

13

List of Tables 3.1Datafeatureswithacronymsandmeanings……………………………………….….301 ResultsusingRidge/LassoRegressionmodelswithRawDataset…………….49

2 ResultsusingDecisionTreeRegressorwithRawDataset………………………..50

3 ResultsusingRidge/LassoRegressionmodelwithMeanValueDataset…..51

4 ResultsusingDecisionTreeRegressorwithMeanValueDataset…………….52

5 ResultsusingDecisionTreeRegressorwithTimeLagDataset…………………53

6 ResultsusingLassoRegressionwithTimeLagDataset……………………………53

7 ResultsusingLassoRegressionwithFixedSizeWindow………………………….55

8 MultiplayerModelwithWalkForwardValidationusingLassoRegression………………………………………………………………………………………………55

9 FirstBaseline-PlayerspecificmodelwithWalkForwardValidation…………56

10 MultiplayerModelwithWalkForwardValidationusingFFN(SGD)………………………………………………………………………………………………………58

11 MultiplayerModelwithWalkForwardValidationusingFFN(Adam)………….............................................................................................59

12 SecondBaseline-PlayerspecificmodelwithWalkForwardValidation…..60

13 PlayerspecificmodelusingFFNaddingbackprediction………………………….62

14 PlayerspecificmodelusingFFNwithTimelagdataset……………………………64

15 MultiplayerModelusingLSTMwithWalkForwardValidation……………………………………………………………………………………………....67

16 PlayerspecificmodelusingLSTMwithWalkForwardValidation……………67

17 PlayerspecificmodelusingLSTMwithaddingbackprediction………………68

14

15

List of Acronyms FFN FeedForwardNeuralNetworkRNN RecurrentNeuralNetworkLSTM LongShortTermMemorySGD StochasticGradientDescentSRNN SimpleRecurrentNeuralNetworkWFV WalkForwardValidationReLU RectifiedLinearUnitCV Cross-ValidationLOOCV LeaveOneOutCrossValidationPid PlayerIDDS DefensiveSkillOS OffensiveSkillSSN SciSkillNowRTCE ResistanceOSTM OffensiveshareplayerthismatchMid MatchIDMD MatchDateMP MinutesPlayedFPid FieldPositionIDTid TeamIDCid CompetitionIDSS SciSkillIndexOSP OffensiveSharePlayerDNN DeepNeuralNetworkRFE RecursiveFeatureEliminationMPPE Minimumpotentialpredictionerror

16

17

CHAPTER 1

IntroductionEstimating or predicting the performance of players and results of matchesrelatedtodifferentsportsisnowapartofmostteamrelatedsports.Becauseofthis,data-drivenanalysis iswidelyused insportsanalysis.Soccer isoneof themostpopularsportsintheworld.Insports,mostoftheanalyticsisrelatedtotheplayers’ future performance, which helps the management in team selection.Some of the analysis are based on the physical characteristics (height,matureness, etc.) and other based on skills, For example, soccer skills include(dribbling,offensiveskills,defensiveskill,etc.).Predictingfutureperformanceofsoccerplayersisparamount,asitisdirectlyrelatedtothecoachesandmanagersto make right decisions, which leads to team formation and help in playerstradingforleagues.Fromalongtimeinpastsince1920,Sportsandpsychologyareinterlinkedinthefield of research. Mostly sports psychology is used to determine the playerperformance related to mental approach. It is also a part of studies that arerelated to talent identification [24]. From the psychology point of view,manystudies have been done in finding the relation between personality traits andsportsbehavior.Almost from last two decades, a study under research is Perceptual CognitiveSkillswherethefocusremainsonhowquicklyandefficientlyplayersrespondinthecomplexsituations;thesestudiesaremostlydoneintheballsports[24].Inteamsports,likesoccerorfootballanintelligentteamplayerisconsideredtobetheonewhoperformswell,butbesidesthat,heshouldpossessotherimportantfeaturessuchasgoodmemory,multi-tasking,etc.Intelligentplayersshouldhavean adaptable personalitywith the situations.Most of such abilities are part ofgame intelligence in sports [24]. Inneuropsychology, theyare calledExecutiveFunctions[24].ExecutivefunctionsaremorecommonlycalledCognitiveProcess.Thecognitiveprocessstimulatesthethoughtprocessandactionwheneverthereis a non-routine situation occurs. Examples include problem-solving,multitaskingandmanymore.Thedevelopmentofcognitiveprocesstakesplacethroughoutthechildhoodtilladolescence.According,tomostoftheresearchersit isuntiltheageof19[24],butothersdoargueandrelateittopracticingandhardwork.Cognitivefunctionshaveareallyimportantroleinsportslikesoccerwheretheplayerhastodecide

18

in every new moment under time constraints. They must have to take quickactions andplan decisions according to the situation [24]. Cognitive functions,intelligence abilities, psychological factors, physical capacities together comeunder one umbrella, and it is Potential. Potential is humans’ ability. Potentialdepends on many factors. To measure human potential, psychologists andexperts use person-oriented approach also presented by (Bergman andMagnusson, 1997; Bergman et al., 2003). In person-oriented approach, weconsidereverysingleindividualandmakeanumberofmeasurementsrelatedtodifferenttraitsandthensimilarindividualsgroupedforfurtheranalysis[42].Inthenextsections,wewilldiscussbrieflypotentialandhowitgetsmeasuredespeciallyinsoccerplayers.1.1 Potential The termpotential iswidely used in different fields fromphysics to the socialsciences.Inhumans,werelatepotentialtoone’sability,andasmentionedearlierit depends on many factors like intelligence factors, psychological factors,cognitive functions, etc. One of the sports psychologist and mental trainingexpert Dr. Patrick Cohn defined the athlete potential, as “It's your capacity toperformattheuppermostrangeofyourability.”Hefurthersaidthatpotential issomethingoritsabilitywhichindividualpossesbuttheyhaveyettoachieve.Oneofthepsychologistsexplainedthepotentialas[32]

Performance=Potential–interference

Whereinterferenceincludescoaching,exercise,practice,etc.potential isabilityandperformanceishoweffectivelysomeoneusesitsabilities.

Followingmodelshowsthedependenceofpotentialondifferentfactors

19

Fig1.1:Modelshowingthedependenceofplayerspotentialondifferentfactors[46].Wearenotgoingintodetailofeachfactor;wearejustdefiningeachofthem.Anthropometry: It means scientific study of human body measurements(height,weight,organs,etc.)andproportions.Game Skills: In this research, we are dealing with soccer players. So we aregoing to mention some of the soccer skills which include 1) shooting underattacking skills, 2) defensive skills, 3) resistance shown by each player, 4)offensiveskills,5)Eachplayeroffensiveshare,6)Permatchplayer’soffensive,etc.thesearealltheskillswhichvaryfromplayertoplayer.Psychological Factors: There aremany psychological factors, which involvespressure, confidence, motivation, anxiety, mental preparation and manymore[27]. In this research, we are not considering the effect of these factors onpotential.Cognitive skills: Theseare theskills,whichrelateshowaplayer responds tothesituation.Itinvolveshowaplayertacklesorperceivesacertainsituation.Itinvolvesdecision-makingandmultitaskingtasksaswell.Influenceoftheseskillsonpotentialisnotconsideredinthisstudy.Physical Capacity: It involves player running speed (sprint), muscle power,bodybuild,andagility–it’sthecapabilitytoresponsethestimulusthatinvolvesrapidwholebodymovement.Inthisresearch,wearenotconsideringtheeffectofthesefactorsonpotential.

20

1.2 Measuring Potential Differentmethodsandtestswouldcarryouttomeasurepotential.Someofthemare intelligence and ability tests. Both of these tests are domain specific. Fewother testsused formeasuringpotential includecompetencyanddevelopmentassessment[41].

MeasurementofPotentialinSoccerPlayers

Measuringplayerpotential isanextensiveprocess,which involvesabatteryoftests and observations. The widely used tests for measuring the potential ofsoccerplayersincludejudgmentofcoachesonfivebasicskills:passing,catching,kicking,running,anddefense.Other testswhichareused formeasuringsoccerplayer potential includes coaches rating, personal history index –considersplayersage,weight,height,etc.,physicalindex–considersplayersten-yardstart,fifty- yard dash, pull-ups and several other. Psychological index –it includesseveral factors like intelligence, assertiveness, and self-sufficiency [38]. All oftheseandmanyothertestsarecarriedoutinmeasuringplayerpotential.Expertssystematically,useall the test resultsandassigna composite score,which isaplayer potential. The scale of this score mostly starts from zero and goesupwardsandgetscutoffunder100or200 inmostof theFIFArankings.Someuse the scale of 0-100 and others use it from 0-200. Players’ potentialassessment isacontinuousprocess,whichmostlycarriedoutonayearlybasisandalsodependsonwhenitisrequired.Sofarwehaveexplainedthegeneralidearelatedtopotential,factorsaffectingitand how to measure it. All these tests andassessments are time-consumingandalso require many other resources like experts rating and opinion, etc. In thisresearch, we are interested in using machine-learning approaches to predictplayerfuturepotentialbasedontheirpreviousyear’sdata.Bydoingthatwecanmake effective decisions which are mostly related to players selection andplayerstrading.Most of the previous work or early researches remained focused on theperformance of players, match outcomes, player performances in comingmatchesusingmachinelearningapproachesbutwhenitcomestothepotentialprediction of players especially in soccer sport it does not get that muchattention. Moreover, significant work has done in the potential prediction ofplayers related to sports like Basketball. Potential is players’ ability to dosomethingwhile performance is howheperforms in that specificmatchusingthoseabilities.

21

1.3 Statement of the Problem

The research question is “How precisely we can predict the peak potential ofyoungsoccerplayerswithoptimalage,usingmachinelearningtechniques.”1.4 Overview

As an overview,we are interested in developing amodel that can predict thepeak potential of young soccer players’ with the help of machine learningmethods.To train and test themodel,weused those specific players from thedatasetwhosepotentialisalreadymeasuredfromtheageof15till26.Byusingthose players’ data, we have ground truth-values of potential at all ages of aspecificplayer.Thedatawehaveconsideredforthisresearchhas16featuresforeachplayer;allof the featureshaveanumericalvalue.Westartedoffwith thetraditionalmachinelearningapproacheswherewehaveemployedRidge/Lassomodels and Decision trees. Later, we employed feedforward neural networkswithmultiple configurations.We considered Feedforward neural network andLassoregressionasourbaselinemodels.Weappliedtwodifferentapproaches1)Multiplayer model approach and 2) Player specific approach. We used Mean-squared-error (MSE) as an evaluation metrics as it is the most widely usedevaluation metrics for regression problems. We considered this problem as asequencepredictionproblemor time-series forecastingproblemandproposedtheuseofdeeplearningmethodtosolvethisproblem.The precise solution to this problem could helpmany soccer clubs. Especially,small soccer clubs canmake huge profits by having information about playerfuturepotential,smallclubscanrecruitthoseyoungplayersandcanmakehugeprofits from their trading when they reach their peak. The trading of playersbetweenclubsiscalledSportsTradeandplayers’areaprimaryassetinit.Ourmaincontributioninvolvestheuseofdeeplearningtechnique–especiallyavariant of the recurrent neural network (LSTMs)with only an auto-regressedtarget as a feature for the peak potential prediction of soccer players withoptimalage.Rest of the thesis is organized as follows: Chapter2describes thebackgroundstudyandrelatedwork.Chapter3containsallaboutthedataandpre-processingapplied.Chapter4isrelatedtothemodelarchitecturesandgeneralexplanationrelatedtomodelworking.Chapter5depictstheresultsanddiscussionofresults,obtainedfromtheperformedexperiments.Chapter6takesthethesistowardsaconclusionandFutureWork.

22

23

CHAPTER 2 BackgroundStudyandRelatedWorkInthissection,webeginwiththebackgroundstudyandrelatedworkspecificallywheremachine-learningmethods are used for predicting theplayers potential(especially insoccerplayersandsomeof therelevantwork fromothersports)and at what age most of the players reach their peak? And what are theimportantvariablesconsideredforpotentialprediction?

Afterthat,wementionedsomeoftherelatedworkregardingdeeplearningandthegrowinginterestofitsimplicationindifferentfields.2.1 Literature Review Brefeldetal. (2016) [4]presentedworkonsoccerplayer’sperformance in theupcomingmatch.TheyusedfiveseasonsdataofGermanfootball leaguenamedBundesliga.Theyusedsupportvectorregressionandmultitaskridgeregressionfor the performance prediction. For the relevant feature identification, theyemployedrecursivefeatureeliminationtechnique.Theyproposedplayerspecificmodels and general models. They concluded that SVR models are performingbetter than ridge regression, however; SVR-player specificmodels suffer fromsparsedata ifwedon’tuseawindowsizeof8matches.Playerspecificmodelswith multitask ridge regression and multitask support vector regressionoutperformedthetraditionalorsingletaskSVRandridgeregression.Theyusedas many as 20 features as predictors for the prediction of next matchperformance.Malinaetal.(2007)[21]performedananalysisofsoccerskillspredictionon69young players mainly on passing, shooting and dribbling skills using multiplelinearregression.Theirstudyfocusedonquantifyingskillinsoccerbasedonage,experience,growth,andfunctionalcapacities.Theyperformedsixsoccerrelatedtests (ball control with the body, ball control with head, passing, shootingaccuracy, dribbling with pass and dribbling speed) to quantify the skill andassign a composite score to each player. Then they used the playercharacteristics like age, training, height, body mass, etc. to find the relativecontributionofeachcharacteristic incompositeskill.They foundthatmaturitystatus was connected with higher composite skill. It also mentioned thatcompositeskillscore(whichbelongsGameskills–partofpotential)wasnotwellexplainedbythesepredictors.

24

Post et al. (2009) [39] proposed a detailed analysis of tactical skills and theirimprovementamid191youthsoccerplayers fromagesof14 through18 fromDutch clubs. All players get self-assessed using Tactical Skills Inventory forSports Test where players from three different field positions (attackers orforward,midfielders, defenders) are informed to rate themselves on a 6 pointLikerscale;eachonehastocomparethemselveswiththeelitesoccerplayerinthesameagecategory.Themultilevelregressionmodelisusedtocarryoutthisstudy.The study isused to analyzehow tactical skills get changedwith age ineach field position. The result shows that tactical skills get improve with agemoreinforwardsorattackerascomparedtomidfieldersanddefenders.AresearchworkpresentedbyKevinWheeler[40],wherehetriedtopredicttheNBA player performance using SVM, Naïve Bayes, and Linear regression. Theresultsmentionedaftertheresearchshedlightupontheinabilitytopredicttheplayer’sperformanceaccuratelyusingthesemethodsandmentionedsomeofthebasicchallenges.Shardaetal.(2009)[2]propoundedresearchrelatedtoperformancepredictionof cricketer’s using neural networks. They trained themodel for each playerwherefortraininganyspecificplayertheyuseditshistoricaldata.Thedatatheyhaveconsideredis from1985until2007.Aftertrainingthemodel, theyuseditfor the near term player performance prediction. They proposed playerselections in their national teams forWorld cup 2007 based on the results oftheirmodel.ModelevaluatedusingtheactualperformanceoftheplayersintheWorldcup.Theresultsshowedthatneuralnetworksperformedoutstandinglyindecision support mechanism for team selection. In some of the experiments,neuralnetworksperformedwithanaccuracyof87%.SeifeDendir(2016)[3]presentedananalysiswhattheoptimalageorpeakageof soccer players is. The data they considered is from the top four Europeansoccer leagues and is come fromWhoScored.com. The analysis also takes intoconsiderationtheplayerspositioninthefield.Theanalysishadbeenmadeusingsimpleagedistributionandbivariateapproaches(simpleplottingofdata).Theresultsshowedthatforwardreachtheirpeaksorpeakpotentialattheageof25while defenders peak potential or optimal age is 27 and mid-fielders lie inbetween25-27ageband.Itisalsomentionedthatpeakagereliesonthepotentialorability.AdetailedworkpresentedbyVisscher,Cetal.(2009)[22]wheretheyusedtheTactical Skills Inventory for Sports for youthplayers to findoutwhichplayerswill be professionals and which will be armature in their future. The studyinvolved 105 top youth soccer players from an age group of 17-18. A logisticregressiontechniquewasemployedtosingleoutthetacticalskillthatleadsthe

25

player to professional performance in adulthood. The results found that if aplayerattheageof17-18performswellinpositioninganddecidingtacticalskill,thentheplayerhasan80%chancetobecomeaprofessionalinadulthood.YiLietal.(2014)[1]presentedananalysisoftheperformancepredictionofNBAplayers.Theyhaveusedtheentirehistorydata.TheyproposedanewmetricfortheevaluationofNBAplayerperformance,whichbasedonfactorialanalysisandcurrently existing player performance metrics. Training set created by usingclusteringalgorithmbasedonplayer’sage.Topredictperformanceforaspecificplayer,alltherestoftheplayersinthatclusterwouldbeconsideredasatrainingset.Forprediction, theyusedSVMasa learningalgorithm.Theypresented thenewevaluationmetricisefficient.2.2 Limitations of Existing Approaches Most of the previous work done remained focus on the match outcomeprediction; players next match performance prediction and player’s futureperformanceprediction. Inmostofthestudies, theyreliedmoreonthetacticalskillsandalsopsychologicalfactors.Wedidnotfindsucharelevantworkwheremachine learning algorithms were employed to predict the soccer playerspotential precisely or a player specific model which accurately predicts orforecast theplayerpeakpotentialand theoptimalage (atwhatage theyreachthatpotential).Aftertheinsighttotherelatedwork,ontheperformanceorpotentialprediction[2,4,21,40] using machine-learning algorithms, we can easily categorize thealgorithmsused.InFirstCategory,wecanconsideralgorithmslikelinearmodelssuchaslinearregression,multiplelinearregression,andridgeregression.Thesemodelshavethelimitationsastheyrelyonasetoffeatures,whichareextractedusingdifferentfeatureextractiontechniques.However,theyarenotalwaysfullycapable of explaining the target. The SecondCategory includes algorithms likeSupport Vector Regressors, we have seen from the related work that theirperformanceismuchbetterthanlinearmodelsinmostofthecasesbutwasnotsatisfactory,andtheyarequitecomputationallyexpensiveasmentioned in [8].Allthesemodelsneedampleamountofhistoricaldatatogettrainedon.Toprevailoverall the limitations in theexistingapproaches,weproposed thedeployment of deep neural networks and deep sequence learning methodswherefeatureextractionishandledautomatically,whichusuallyrequiredomainknowledgewhenusedwithtraditionalmethods.Inparticular,wearelookingforamodelthatusesleastdatatogettrainonandmadeanaccurateprediction.Weareinterestedinthepredictionofyoungplayerspeakpotentialwithoptimalage(atwhatagetheyreachthatpotential).

26

2.3 Sequence Prediction using Deep Learning Thissectiongivesanoverviewofdeeplearningimplicationindiversefields. Inmanyofthedomainsespeciallyrelatedtotimeseriesproblems,itoutperformedmany other learning algorithms. Domains like speech recognition, machinetranslation, economic time series and text prediction its performance isconsideredasstandard.Weshedlightonsomeofitsprojects.Kale.C.Detal.(2017)[28]presentedworkonmedicaldataanalysis.Thedataisrelated to the patients in ICU (intensive care unit). It was amultivariate timeseriesobservationsdataforeachpatient,whichisbasedonsensordataandlabtests.TheyusedLSTMs forpattern recognition in clinicalmeasurements.Theytrained the model to classify 128 diagnoses, using 13 clinical measurements.They compared its performance withmulti-layer perceptron. They found thatLSTMperformanceoutperformedthemultilayerperceptron.Nichols,P.Eetal.(2016)[29]usedLSTMforthepredictionofvolatilityinstockmarket related to top 500 companies in the U.S. It called S&P 500. Theycompared the LSTM performance with linear Lasso/Ridge and autoregressiveGARCH. Themeanabsoluteerror24.2%iswhat theygot fromLSTM,and thisoutperformed the others by at least 31%. They also mentioned that deeplearning models could show promising predicting results in stock behaviorproblems.Li, R et al. (2017) [30] proposed work related to household load forecastingusing pooling based recurrent neural network. They aim to learn theuncertainties like load aggregation and spectral analysis, which are avoidedconsistently with the use of traditional approaches. They also explained thelimitation of feedforward neural network, which is over-fitting that we couldhave when we try to make it deep by adding a stack of hidden layers. Datacollected from 920 smart-metered customers from Ireland. The proposedmethodoutperformedtheARIMAby19.5%,SVRby13.1%andsimpleRNNby6.5%.AlltheevaluationsaremadeusingRMSE(rootmeansquarederror).Sick,Bet al. (2016) [31]presentedawork related to thepower forecastingofsolar power plants. They made energy output forecasting for 21 solar powerplantswiththehelpofdifferentdeeplearningalgorithms,whichincludesDeepBelief Networks, AutoEncoder and LSTM. The results indicated that deeplearning algorithms forecasting performance ismuch superior to the ArtificialNeuralNetworks.

27

Onlyafewofthedomainsorfewoftheworkhavebeenmentionedintheabovedeep learningrelatedwork,andthere isa lotmore, theseare justaglimpseofhavinganideahowtheinterestisgrowinginthisarea.Deeplearningmodelsarecapable of extracting features from thedatadirectlymeans there is aminimalneedfordatapre-processingandinsomeofthecasesnofeatureengineeringisrequired.Optimaldeeparchitecturedesignvarieswiththeproblemonhandandismostlydonebytrialanderror.Nextsectionsshedlightonthedataandpreprocessing,workingofalgorithmsindetailfollowedbytheresultssection.

28

29

Chapter 3 DataandPre-processingInthischapter,wedescribedthedataindetail.Wealsomentionedthesourceofdataandwhatfeaturesdoesthedatahave?Besidesthat,wealsomentionedpre-processingstepsthatwehaveapplied.3.1 Data Collection and Analysis The data we have considered for this research is collected by SciSports, andSciSports has extracted the data from Scoresway.com. SciSports is one of thefastestgrowingsportsanalyticscompaniesinNetherland.Theymainlyusedataintelligencetounderstandsoccer.SciSportshastransformedthedataaccordingto their business needs. In data transformations, they made these specificfeatures and their values are getmeasuredby Scisports custombuilt in-housealgorithm.Thedatahasaround224Kdistinctplayersfrommorethan75soccerleagues.Eachplayerinthedatasethas16features.Followingtable01,mentionsfeaturenames,theiracronyms,meanings,andrange.Featurename

Acronym Meaning Rangemin-max

Player_id Pid Assigned unique id to players byScoresway.com

Player_Number

SciskillIndex SS SSindexcomputedafterthematch.Scisportsin-housecreatedindex

0-200

Sciskill_now SSN Scisportsin-housecreatedafeature 0-200

Potential Potential Each player potential computed after thematch

0-200

Resistance Resistance Oppositionfactor 0-100

Defensice_Skill DS Playersdefensiveskill 0-10

OffensiveSkill OS Playersoffensiveskill 0-10

Matchdate MD DateofmatchusedbyScoresway.com dd/mm/yy

Offensive_share_player

OSP The weighted avg. offensive share of theplayer

0-1

Match_id MID TheuniqueidofmatchusedbyScoresway Matchid

Position_id Fpid Theuniqueidofthepositionoftheplayer Int0-65

MinutesPlayed MP The number ofminutes played by player inthematch

0-90

30

Competition_id CID Theunique idof thecompetitionthatmatchplayedin

id

Offensice_share_this_match

OSTM Offensive share of player in the specificmatch

0-1

Teamid Tid Theuniqueidofteamplayerplayedfor id

Age Age Playerage Mm/yy

Table3.1:DataFeatureswithacronyms,meaningandrange3.2 Data Pre-processing: In data pre-processing step,we have discarded some of the players’ data thatplayedonlyforayearandplayersthathaveonlyplayedundertheageof15.Wediscarded those records because if we want to predict those players futurepotential, we remain unable to verify our results, as we do not have groundtruth-valuesforthem.Furthermore,wealsodiscardedsomeoftheplayersdatathatstartedplaying laterthantheageof26,becausemostof theplayersreachtheirpeakpotentialbetween theageof25-27 [3]andweare interested in thepredictionofplayerspeakpotential.

31

32

Chapter 4 ModelArchitecturesIn this chapter, we have explained the approaches, which we have used topredict the peak potential of soccer players. There are different approaches,which have been applied for potential prediction. It ranges from regression-baaed approaches to time series, towards neural networks and deep learningmethods. Following is theoverviewof algorithms,whichwehaveused for thepredictionofpeakpotentialof soccerplayerswithoptimalage.Beforemovingtowardsmodel’sarchitecture,wefirstexplaintheresearchproblem.Problem Definition Tosolvetheresearchproblem“Howpreciselywecanpredictthepeakpotentialof young soccer playerswith optimal age, usingmachine learning techniques,”weusedthosespecificplayersfromthedatasetthatstartedplayingbetweentheageof15and18andhaveplayedsubsequentlytilltheageof26.Wetrainedthemodelontheplayers’datafromtheageof15till19andpredictedsameplayerspotentialattheageof19till26.Thereasonbehindusingthosespecificplayersistohaveagroundtruth-valueforplayers’potentialatallages.So,wecanevaluateourmodelperformancebycomparingpredictedpotentialatspecificagewiththegroundtruth-valueatthatage.

We propose two categories of models 1) multiplayer models and 2) playerspecificmodels.1)Multiplayermodel–Inthemultiplayermodel,wehaveusedmultipleplayersdata for model training and testing. Using this approach, we used multipleplayersdatathatstartedplayingbetweentheageof15till18intrainingsetandsame players data from the age of 19 till 26 in the test set. Usingmultiplayermodel,weconsideredthatmodelparametersarenotplayerdependentandwehaveusedsamemodelparameterstopredictmultipleplayersfuturepotential.2) Player specific model– In the player specific model, we have used only aplayerdataformodeltrainingandtesting.Moreformally,weusedplayer’sdatafromtheageof15till19intrainingsetandplayer’sdatafromtheageof19till26 in the test set.We trained themodel using training set andmade playerspotential prediction on the test set. In the player specific model, we build aseparatemodelforeachplayertopredicttheirfuturepotential,consideringthatmodelparameterswillvaryfromplayertoplayer.

33

Inthisthesis,wefirstemployedtraditionalmachinelearningmodelslikelinearmodels and feedforward neural networks for potential prediction and willconsider their best performance as our baseline.We propose a deep learningmethod especially a variant of the recurrent neural network–LSTMs. Weconsidered this problem as a time-series forecasting problem–that considerssequentialdata–whereeverycurrentvaluehassomedependencyontherecentpast. LSTMs have the intrinsic property of learning long dependencies insequentialdata.4.1 Multivariate Linear Regression The linear regression model is most commonly used regression technique inforecasting. Its implementation and performance evaluation is comparativelyeasy[8].Regressionmodelsareusedwherewehavetoexaminetherelationshipbetween one or more independent variables let’s say x1, x2, x3, ..., xk and acontinuous dependent variable lets say Y [9]. In regression analysis,we try topredict the value of the dependent variable as accurate as possible usingindependentvariablevalues[9].Inourcase,thepotentialisdependentvariableand features such as defensive skills, offensive skills are all independentvariablesorpredictors.Themodelcanbewrittenintheform.

Y=β0+β1X1+β2X2+….+βkXk(4.1)

Where Y is the target (potential),xi is the feature input and βi are called theregressioncoefficients(weightsorparameters). Tocalculatethecost,wehaveusedsquaredlossobjectivefunction,whichcanbeexpressedas

𝐶𝑜𝑠𝑡 = !!

(Ŷ − Y)!!!!!

(4.2)WhereŶisthepredictedvalue,andYistheactualvalue.ThedifferencebetweenthemisthecostorLoss.Themodeltrainingisexplainedindetailinsection4.6.1.4.2 Decision Tree Adecisiontreehasatreestructureorgraphicalrepresentation,whichiseasytounderstand as it resembles human reasoning and ismostly used to representtree models. Their illustration is one of the advantages they have over otherlearningalgorithms[14].

Intreearchitecturethetopnodeiscalledtherootnode–rootnodecontainsallthedataandnoedgesenteringit,andthereisonlyonerootnodeinadirectedor

34

rootedtree.Astreeshavegraphicalrepresentation,soGraphG=(V,E)whereVstands for nodes or vertices and E is for edges. A node with no child ordescendentiscalledleafnodeorterminal.Restofthenodes(exceptrootnode)iscalledinternalnodesorchildnodes[16].

Decision tree models are non-parametric. They made decision rules forprediction by using greedy top-down recursively splitting of data [12]. InRegressiontrees,predictorsorexplanatoryvariables(continuousorcategorical)areused topredict continuousornumericdependentvariable.Thesplittingofpredictors intonodes is basedon conditions like if-then-else.All thepredictorvariablesareconsideredasthecandidatesforchildnodepartitionandtheone,whichminimizesthevariance,wouldbeconsideredasthebestcandidateandgetselected for further splitting [13]. Each split partitioned the data intocontradictory (cannot be true both at a time) child nodes– which can givemaximum homogeneity at that split. Homogeneity depends on the type ofdependentvariable.Nodehomogeneityisexplainedbyimpurity.Ifthenodesarecompletely homogeneous, then impurity is zero. As homogeneity decreasesimpurity increases. In classification problems, we most commonly useinformation index (entropy), Gini index and Twoing index as measures ofimpurity(splittingstandard).

For regression tresswemostlyuseVarianceandSumofabsolutedeviationonmedian[15].Variancegivesaveragedistancesfrommean[17].

Var(x)=1/nΣin=1(xi-xi)2 (4.3)Whenweusevariancewithdecisiontrees,westilldogreedyminimizationbutuse variance for partition into child nodes [17]. In the case where we havenumericexplanatoryvariablesorpredictorsforsplittingweusegreaterthanorless thanvalues, for some selectedvalue.The splitting is applied to eachnodefurther. Splitting continues till large grown tree and then gets pruned to theappropriateorpropersize[15].Modelinghastwogoals1)description–simplyexplain the structure of data 2) prediction– how accurate it predicts theunobserveddata.Decisiontreesareexceptionallygoodinboth.Inourcase,eachoftheleafnodescontainsapartitionofaplayerspotential.Rootnodeandchildnodeshave the features (DS, SS,OS, etc.),whichare selectedusingbinary treesplitwiththevariancereductionmethod.Theplayerpotentialpredictionvalueisthe one inwhich leaf’s partition it falls. To find out towhich partition playerspotentialbelongstoortopredict theplayerspotentialweneedtotraversethetreeusingasequenceofquestionsateachfeaturelevelbasedontheanswersofthat questions we assign it a partition at leaf node which is the predictedpotentialaswell.

35

4.3 Time Series Timeseriesapproachesareoneoftheoldestmethodsusedforforecasting.Timeseries approaches include a range of models from simple ones like Auto-Regressive (AR) model, Moving Average (MA), Exponential Smoothing to thecomplex ones like ARMA (Auto-regressive moving average), ARIMA (auto-regressiveintegratedmovingaverage),etc.[8].Timeseriesmethodsareconstructedonthesuppositionthatthereareinternalstructuresinthedatasuchastrend,seasonalandauto-correlation,whichcouldbe,exploitedbyusingthesemodels.ARMAmodelsaremostlyusedforstationaryprocesswhereasARIMA is themodification of ARMA to cater the problems ofnon-stationaryprocesses[19].

Auto-regressive(AR)modelsusepastvaluesofthedependentvariable;weuselinearcombinationsofthosepastvaluestopredictthedependentvariable.Thetermauto-regressionmeansself-regressionofdependentvariable[18].Themathematicalformorfunctionoftheauto-regressivemodelcanbewrittenas

yt=c+θ1Yt-1+θ2Yt-2+……+θpYt-p+et(4.4)Where c is a constant and et is white noise [18]. Equation 4.4 resemblesmultivariate regression, but in multivariate regression, we use a linearcombinationofmultiplepredictorstopredictthedependentvariable,andinARmodelweuselaggedvaluesofYtaspredictors,Ytisatargetitself.ARmodelscanbewritten as AR(p)model, where p is an integer represents a number of lagvaluestouseinamodelandisresponsiblefordifferenttimeseriespatterns[18].WehaveusedAR(1)model,wehaveauto-regressed thepotential (target),andcreatedonly1-timelagfeatureofpotentialwhichislastyearpotentialandusedthatasapredictorwithlinearregressiontopredictplayersfuturepotential.4.4 Feed Forward Neural Networks Most commonly used neural networks areMultilayer perceptron also cited asMLPs.Itssimplestarchitectureconsistsofatleastthreelayers.1)Inputlayer2)hiddenlayer3)outputlayer.Eachlayerhasanumberofneuronsinit,andallarefullyinterconnectedinafeed-forwardway.TheyalsoreferredasFeedForwarddue to the architecture theyposses for information flow.Neurons in the inputlayerareconnectedtotheneuronsinthefirsthiddenlayer(ifwehavemultiplehiddenlayersthanneuronsinthefirsthiddenlayergetconnectedwiththenexthiddenlayerneurons)andhiddenlayerneuronsareconnectedwiththeoutputlayerneurons(inasinglehiddenlayerarchitecturemodelotherwiselasthiddenlayerneuronsareconnectedtotheoutputneuron)[2].

36

The classic architectureof theneural network is composedof differentnestedfunctionsthat’swhywecallthemnetworks.[10].Theinput layerneuronstakeinputthepredictorsorindependentvariables.Theoutputlayerhasthetargetordependentvariable(inourcaseitwillbetheplayerpotential).Atthefirsthiddenlayer, we apply the non-linear transformation with the help of activationfunction f. Themost commonlyused activations functions are (Rectified linearUnit (relu), Sigmoid and Tanh). The hidden layer output is in the form off(Σ(x;Θ)+β)whereΘistherandomlyassignedsmallweighttoalltheincominglinks of each neuron (neurons in hidden layer and output layer),x is the inputvariables,andβisthebias.

Inmost of our experiments, we have used Relu or Rectified linear Unit as anactivation function as it is recommended for modern feed-forward neuralnetworks[10].Thisfunctionisquiteclosetolinearfunctionandisconsideredasapiecewiselinearfunctionwithtwolinearpieces.

Thistypeofmodelsdoesnothavefeedbackconnectionslikeinrecurrentneuralnetworks [10]. If there arenohidden layers in thenetwork, itwill be called alinearnetwork,andituseslinearregressionforlearning[2].

Attheinputlayer,wefedthenetworkwithallthefeatures(SSN,DS,OS,OSTM,resistance,OSP,SS,Fpid),smallweightsget initializedbetweeninput layerandhidden layer,hidden layerandoutput layer.At thehidden layerwehaveusedRelu as an activation function, and at output layer, we have used a linearfunction. The predicted potential we get from the output layer, after eachforwardpass,wecomputethedifferencebetweenpredictedpotentialandactualpotentialwiththehelpofsquaredlossobjectivefunction.

Fig.4.1:BaselineFeed forwardNeuralNetworkGeneralArchitecture. Inall experimentsattheoutputwehaveusedlinearactivation,andforhiddenlayer,wehaveusedrelu,tanhandsigmoid activation functions with different experiments. It’s a general model with onehidden layer and two neurons whereas we have used multiple layers with a differentnumberofneurons.

37

4.5 Long Short-Term Memory (LSTM) The backbone of recurrent neural networks is their cyclic connections, whichmakesthemrobustandwidelyusedtoolespeciallytomodelsequenceproblems.Tasks related to sequence labeling and prediction like language modeling orhandwriting recognition, Simple RNN achieved unprecedented success. As adisparity with DNNs (Deep Neural Networks), SRNNs have a dynamicallychangingwindowconcerningthe inputsequenceascomparedtoFeed-forwardnetworkswherewehavestaticorfixedsizewindow.[34].SRNNs(SimpleRecurrentNeuralNetwork)haveacyclicarchitecturethatcatersforconsistentlyprovidingtheprevioustime-stepnetworkactivationsasinputtothecurrenttime-step.SRNNskeeptrackofactivationsforeachtime-stepwhichinreturnmakesthemquitedeepnetworks,astheirdepthincreasesitbecomeshardtotrainsuchmodelsusingbackpropagationthroughtime(BPTT)becauseof the exploding and vanishing gradient problems–which means that whengradientisbackpropagatedthroughtimeiteitherexplodes(growexponentially)ordecays(vanishes)[35].

Explodinggradientproblemscanbetackledbygradientclipping.Thevanishinggradientismoretaxing,andtherehasbeenalotofresearchdonetoresolvetheissue of vanishing gradient problem.One of the noticeableworks includes theuse of second-order optimization algorithms by (Martens, 2010; Martens &Sutskever,2011)but it increases thecomputationalcostsignificantly [36].OneoftheothertechniquestoresolvethevanishinggradientproblemisbytheuseofRNNsweights regularizationproposedby (Pascanu et al., 2012).The standardusedfordealingwithvanishinggradientproblemisLSTM(Longandshort-termmemory)developedandaddressedbyHochreiter&Schmidhuber(1997)[35].

The LSTM (Long short-termmemory) is a type of RNN and is easy to train ascompared toSimpleRNN [35].LSTMshavean intrinsicproperty to captureorstore long-term temporal dependences or information [37]. The optimizationproblems,which SimpleRNNs get, do not affect LSTMs. In the hidden layer ofLSTMarchitecture,itcontainsunitscalledmemoryblocks.

Memoryblockconsistsofmanycomponentsnamelymemorycellsandgates.Thememorycell isusedtostorethenetworktemporalstatewiththehelpofgates.These gates (input, output, forget)with the help of activation function controlinformation flow.Memory cells have self-connections like a carousel. Below isthedetaileddiagramofLSTMMemoryblockwithallitscomponents.

38

Fig.4.2:LSTMunitwithalldetails[43].Thefiguredepictstheinputstoeachgate,activationfunctionsappliedateachstepandmemorycarousal.ThediagramaboveisanLSTMMemoryblock.IthasthreeinputsCt-1,Ht-1,andXt.Ct-1isthememoryfromthepreviousLSTMblockandiscalledthememorycell.Ht-1istheoutputofthepreviousLSTMblock,andXtistheinputatcurrenttime-step.Theoldmemorypassesthroughthebitwisemultiplicationoperationwhereitgetsmultipliedbyavaluefromforgetgate,whichdecideswhichinformationtopassthroughandwhichonetoforget.Ifthevalueisclosetozero,thenitmeansforgetmostofthememory,andifitiscloseto1,itmeanstopassthroughalltheoldmemory.ForgetGateThe forget gate is a one layer neural network having inputs Xt–input for thecurrentblock,previousblockmemoryCt-1,previousblockoutputHt-1andbias.Thisonelayerneuralnetworkhasasigmoidactivationfunction,anditsoutputistheforgetvalue.Theweightscalculatedandoutputtedthroughforgetgateareasfollows. ft=σ(Wf.[ht-1,xt]+bf) (4.5)

39

InputGateThe second gate is input gate. Input gate has two simple one layer neuralnetworks.Bothhavesimilarinputs(bias,theoutputofthepreviouscell,currentinput) like forget gate and both outputs a value one with sigmoid activationfunctionand theotherwith tanhactivation function.Theoutput fromaneuralnetworkwithsigmoidisit,andtheoutputoftheotherneuralnetworkwithtanhis čt. The value from these neural networks gets multiplied and added to oldmemory cell Ct-1 and forms the new memory Ct. The weights calculated andoutputtedthroughinputgateareasfollowing[43]. it=σ(Wi.[ht-1,xt]+bi) (4.6) čt=tanh(Wc.[ht-1,xt]+bc) (4.7) Ct=ft*Ct-1+it*čt) (4.8) Afterthecreationofnewmemory,LSTMblockgeneratesitsoutputwiththehelpofoutputgate.OutputGateOutputgatetakesinputbias,previousblockoutputHt-1,currentblocksinputXtandalsothenewmemory.Outputgatedecideshowmuchnewmemoryshouldoutput to the next LSTM block or unit with the help of sigmoid activationfunction. The output of the current unit is Ht, which is formed by a bitwisemultiplication operation between output gate and current unit squashed newmemory Ct. Following are the weights calculated and outputted through theoutputgate[43].

ot=σ(Wo.[ht-1,xt]+bo) (4.9) ht=ot*tanh(Ct) (4.10)WehaveusedLSTMswithM -1 sequencepredictionmodel.Aspredictors,wehave used only a time lag feature of potential,which is last year potential. ByusingMto1model,wegivemanytimesequentialinputs(oneatatime)tothemodel. For example, to predict the player potential at the age of 21, wesequentially give inputs from the age of 17 till 21 and thenmake a potentialpredictionattheageof21.

40

4.6 Training MethodsSo far, we have explained the general architecture of models. Now we willexplainhowthedifferentmodelswillgettrain.4.6.1 MultivariateLinearRegressionTrainingUsingtheequation4.1asstatedbelowagain. Y=β0+β1X1+β2X2+….+βkXkThe inputof themodel is a vectorof samplesand features (DS,OS, resistance,etc.)andthetargetispotential.Themodeltrainingstartsbyrandomlyassigningsmall weights to the features and model get trained by iteratively optimizingtheseweightsbyusinglossfunctionandoptimizationalgorithm.Wehaveusedthesquaredlossasanobjectivefunctionasmentionedinequation4.2andlaterexplainedinsection5.4.Thecostcomputedthroughtheentiretrainingsetandgetminimizedusingoptimizationalgorithms.Moreformally,theweights(𝛽)ofthemodelget learnedoroptimizedusingiterativeoptimizationalgorithmslikegradientdecentmethodoritsvariantslikeSGD.Thegeneralformulaofgradientdescentisasfollows. 𝛽j:=𝛽j–∝ !

!!!Cost(𝛽!,𝛽!… . .𝛽!)j=0,1,…..n (4.11)

The optimization algorithmwehave used in experiments is SGD. SGDupdatestheweight(𝛽)similarly(4.11)butusingsmallbatchesascomparedtogradientdescent. For weight optimization using SGD we have used different learningrates (∝) (2.0, 0.5, 3.0). We have also used the regularization parameters toreduce theover-fittingproblem [10].WehaveusedL1orLasso regularizationand L2 or Ridge regularization. In L1 or Lasso regularization the unimportantfeaturesareconsideredneartozeroweightandisalsousedforfeatureselection.With regularization we modify our cost function by adding a penalty term(ℷ ∥ 𝛽 ∥!).Itcanbewrittenasfollows 𝐿𝑎𝑠𝑠𝑜 = 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒!∈ℝ!

!!∥ 𝑦 − 𝑋𝛽 ∥!!+ ℷ ∥ 𝛽 ∥!(4.12)

WhereybelongstoresponseandX∈ ℝn*pbeamatrixofpredictorsandℷ≥ 0isa tuning parameter. Similarly, In Ridge or L2 regularization, we penalize thecoefficients with large values more [11]. We can write L2 regularization asfollows

41

𝑅𝑖𝑑𝑔𝑒 = 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒!∈ℝ!!!∥ 𝑦 − 𝑋𝛽 ∥!!+ ℷ ∥ 𝛽 ∥!!(4.13)

Where y belongs to response or actual value and X∈ ℝn*p be a matrix ofpredictors andℷ≥ 0is a tuning parameter. Whenℷ = 0, it will be a linearregression,ℷ = ∞,wegetridgecoefficient(orpenalty(ℷ ∥ 𝛽 ∥!!))=0andforℷinbetween,isthecaseoffittingalinearmodelyonXandshrinkingthecoefficients.Theevaluationmetricusedformodelperformanceismeansquarederrorandisexplainedinsection5.2.

4.6.2FFNTrainingInFFN,theinputofthemodelisavectorofsamplesandfeatures(DS,OS,OSM,resistance,etc.).Themodelrandomlyassignssmallweightstotheinputfeatures.The learning of these weights and biases is called model training. In FeedForward Neural Network, the gradient is computed using Backpropagationalgorithm.Theerrorgetscomputedaftereachforwardpassthroughthenetworkthroughoutthemodeltraining.Thecomputederroristhenpassedbackwardstoupdatethemodelweights.Precisely,partialderivativesof the loss functionarecomputedw.r.t each layersweight using chain rulemechanism.Oncewe havethe gradient or (partial derivative w.r.t weight), we have used optimizationalgorithmSGDtoupdatethemodelweightsusingequation4.16.ForExample,ifxiistheithinputvaluetoneuron,wiistheweightgiventoxiandmisthenumberof inputs to the neuron, then the output of neuron using non-linear activationfunctioncanbeexpressedas: 𝑛 = 𝑤!!

! 𝑥! (4.14) z=𝜎(𝑛)Partialderivativeoflossfunctionw.r.tweightwiusingchainrulecanbewrittenas:

!"!!!

= !"!"

!"!"

!"!"#

(4.15)

Where L represents the loss, to compute the loss we have used squared lossobjective function as explained in section5.4.Due to largeparameter space indeep learning models, there are many local minima beside global minimummeans that the error surface is highly non-convex. To overcome the issue ofgettingstuckatthelocalminima,wehaveusediterativeoptimizationalgorithmStochasticGradientDescent (SGD) tominimize theLossThe evaluationmetricused formodelperformance ismeansquarederrorand isexplained insection5.2.

42

4.6.3 LSTMsTrainingForwardPassThe inputof themodel isa3d tensor (samples, timestamps,datadimensions).Themodel gets trained using the equations 4.5 – 4.10 in a feed-forward wayfrom time-step t-1 toTusingonly1 time-lag featureof potentialwhich is lastyear’s potential. We must have to mention here time-steps because LSTMsconsiderdependencybetweentheinputsovertime[47].BackwardPassTounderstand thebackwardpasswehave to first consider the followingRNNdiagram:

Fig.4.3.ItsanSRNN.TheleftistherecursivedescriptionandtherightistheunfoldformofSRNNmodelwithtime[47].We are just mentioning the summary of backward pass, how error getsbackpropagatedandhowweightsgetupdatedonlyatTimet.InsimpleRNNthelosscomputedattheoutputlayerisbackpropagatedthroughtime.Asshowninthediagram(redcycle),wefirsttakepartialderivativeoflossattheoutputlayerw.r.ttheweightWhz.Afterthat,wetakethepartialderivativeoflossw.r.tweightWhh(feedbackloop)atthehiddenlayer,andthenwetakethepartialderivativeoflossw.r.ttotheweightWxh.TounderstandthebackpropagationwithRNN,wehaveconsideredextremelysimpleRNNarchitecturewith1neuroninthehiddenlayerandexplainedthebackpropagationforasingletime-stept.Inthiscase,themodelparameters𝜃are{Wxh,Whh,Whz}.Thesearetheweightsexceptthebiasesthat’sgetupdatedusingSGDatsingletime-stept[47].SGDupdatestheweightusingthefollowingequation. 𝜃 = 𝜃 − 𝜂𝜕𝜃 (4.16)𝜂isthelearningrate.WhenweuseLSTMsascomparedtoSRNN,theextramodelparameterscomeinareatthehiddenlayer.TheweightWhhatahiddenlayerinSRNNisfurthergetssplit into the input gate weight, output gate weight and forget gate weight inLSTMs.UsingLSTMs,inthebackwardpass,wecomputepartialderivativeofloss

43

(loss -computed using Squared loss objective function at the model’s output)w.r.t all the weights associated with these gates beside the output and inputlayerweights.InLSTMsthemodelparameters𝜃are{Whz,Wxo,Wxi,Wxf,Who,Whi,Whf,Whc} except biases. For each time-step t during backpropagation,we takepartialderivativeoflossw.r.talltheseweights(𝜃).Aftertakingthederivativeweupdate the weights using SGD optimizer as shown in equation 4.16 [47]. Theevaluation metric used for model performance is mean squared error and isexplainedinsection5.2.

Theweightscanbeexplainedas:Whz=weighthiddentooutputlayerattimetWxo=WeightoutputgategetfromcurrentinputXt,attimetWxi=WeightinputgategetfromcurrentinputXt,attimetWxf=WeightforgetgategetfromcurrentinputXt,attimetWho=Weightoutputgategetfromhiddenlayer(feedbackloop),attimetWhi=Weightinputgategetfromhiddenlayer(feedbackloop),attimetWhf=Weightforgetgategetfromhiddenlayer(feedbackloop),attimetWhc=Weightmemorycellgetfromhiddenlayer(feedbackloop),attimet

Wehavediscussedthecoreideaofbackpropagationthroughtime(BPTT),justtodevelop the basic understanding of parameters learning in RNNs–LSTMs. Westronglyrecommendthereadertoconsult[47]forfurtherdetailsoncomputingpartial derivatives and for an in-depth understanding of backpropagationthroughTime(BPTT)withLSTMs.

44

45

Chapter 5

EmpiricalAnalysisandDiscussionIn this chapter, wewill discuss the performancemeasure, evaluation strategyandtheresultsfromalltheexperimentscarriedoutusingdifferentdatasets.5.1 Evaluation Procedure Therearemanydifferentmethods,whichcanbeemployedtoevaluatemachine-learningmodels;italsodependsontheproblemathand.Somecommonlyusedapproaches include K-fold Cross Validation, Leave One Out Cross Validation(LOOCV, suitable for small and midsize datasets as it is computationallyexpensive with large datasets [45]) and splitting data randomly into the test,trainandvalidationset. Inourcase,weused stratifiedK-foldcross-validationwith k= 10, for some of the experiments where we considered a multiplayermodel with traditional machine learning methods. In K-fold cross-validation,training data split into K smaller blocks or sets, where k-1 folds are used fortrainingand1leftoutblockorsetisusedforvalidation.Toensurethatmodelisgeneralized we used holdout test set. We also used Walk forward validationmethod. It is a model evaluation method used mostly with time series orsequential dataset [44]. In training set, it keeps consecutive previous years ormonthsdata(forexample inourcaseplayersdata fromtheageof15till18 intrainingset),and inthetestset, itcontainsthe followingyearsormonthsdata(players data from the age of 19-25). It helpswith sequential data becausebythis we train the model sequentially with time. For example, by using thistechnique,itisnotpossiblethatwetrainthemodelondatafromyears17or18andvalidateitonthedatafromtheage16,whichwouldbepossiblewithk-foldcross-validation.Itworksasfollows:

1) Selectthesequentialortemporaldataforsomespecificperiodintrainingset.Let'ssayinourtrainingset;weusedtheplayer’sdatafromtheageof15till19.

2) Fitthemodelusingatrainingset3) Makeapredictiononthetestset,whichcontainsdatafromthefollowing

year(Forexampleinourcaseplayer’sdatafromtheageof19.Foreverynextiterationitincreasesby1year)

46

4) Save prediction results, which can be used later with performancemetrics.

5) Increase thewindowsizeof the training set (For example, consider theplayer’sdatafromtheageof15till20intrainingset).

6) Repeatsteps(2)to(5)untilage26inourcase.

5.2 Performance Metrics We have used mean-squared-error metric (MSE) to assess the modelperformance on the test set, as it is one of the most widely used metrics forregression problems. It is considered as a standard metric for some of theresearchstudies.Inmodelcomparisonamongdifferentmodels,ametricthatcandiscriminatemoreamongmodelresultsisdesirable,andMSEisbetterinitasitgiveshighvaluestoadverseconditions[33].ForExample,studiesrelatedtoairquality,meteorology,etc.[33].Itgetscalculatedforthedatasetas

𝑀𝑆𝐸 =1𝑛 𝜀!!

!

!!!

5.1

Where𝜀!!is the square of the difference betweenmodel prediction and actualvalue. In our case it is the difference between the predicted value of potentialandground-truthvalueofthepotential.5.3 Baseline Wehaveproposedtwobaselinemodels.First–PlayerspecificmodelwithLassoregressionusingwalkforwardvalidationandSecond–PlayerspecificmodelwithFeedforwardneuralnetworkusingwalkforwardvalidation.Theperformanceofour baseline models prompted us to use recurrent neural networks becausebaseline models consider each input independently, these models do notconsider the sequential nature of data, whereas our problem is of time seriesforecasting (predicting young player potential with optimal age) where everynextvaluehas somedependencyon the recentdata.Tosolve thisproblemwemust have to cater the sequential nature of data, and that’s what recurrentneural networks LSTMs do. For the implementation of traditional machinelearningmethodswehaveusedScikitlearn [48]and for the implementationoffeedforward neural network and deep learning models–LSTMs we have usedTensorFlow because for implementing complex architectures, TensorFlowprovidesefficiencyandflexibility[49].

47

5.4 Dataset Preparation These are the steps, which remain same for all the experiments. For all theexperimentswe have used players’ data from the age of 15 till 26 because ofvalidgroundtruth-valuesforplayersatallages.Foralltheexperiments,wehaveusedz-scorenormalizationtorescalethedatafeaturesordimensions;itisusedwhenthefeaturevaluesareonthedifferentscale.Itcanbeexpressedas: 𝓏 = 𝓍!𝓊

! (5.2)

Where𝓊 is the mean and𝜎 is the Standard deviation. Same-scaled datadimensions helps gradient-based optimization algorithms to give equalimportance toeach featureandalsohelps inconvergence. Ifwedonotrescalethedatadimensions, thensomeof themodelparameterscanupdate faster.Tostandardize or normalize the dataset, we have used z-score normalization. Itrescalesthedatapointstotheorigin(orthefeaturesarecenteredaroundzerowith standard deviation of 1).We have used different splitting techniques fortraining and test set, which get explained later. For all experiments, we haveused Squared Loss as objective (loss or cost) function as it is themostwidelyused loss function with regression problems. Loss function evaluates thedifferencebetweenmodel’soutputandexpectedvalue.Itcanbewrittenas:

𝐿 = !!

(Ŷ − Y)!!!!! (5.3)

WhereŶis thepredictedvalue (modeloutput), andY is the expectedvalue.Mresemblesthenumberofsamples.5.4.1ModifiedDatasetFormostoftheexperiments,wehaveusedtheMeanvaluedataset.Wemodifiedtherawdatasetbecauseinouropinionithassomebasicproblems.ForExampleOneoftheproblemswehavefoundintherawdatasetisanirregularpatterninplayers’ data– means players are not playing monthly, quarterly or yearly(players’ data do not have specific intervals). We named this new dataset asMeanValueDataset.

1) Wehave transformedtheplayers’age toyearsonly,which is inmonthsand years in the raw dataset. By converting the age to years only, wemanaged to make regular intervals in players’ data. For Example: if aplayerhastheageof23.1,23.2,23.3wedowncastitto23(asitislessthan23.5otherwise24).Oneotherreason,wechangedtheplayers’agefrom

48

monthsandyears toyearsonlybecauseweare interested inpredictingplayers’futurepotentialperageyear.Forexamplepotentialatage20orpotentialatage21.

2) We selected only important features to predict potential from the rawdataset by using recursive feature elimination technique. RFE is astepwise backward feature elimination method [7]. It’s an elementaryapproach to figure out important features and get rid of irrelevantfeatures[4].Itranksfeaturesusingsomesortoftheirimportance.InRFE,we initiallyconsiderall the features topredictpotentialandrecursivelyremoves one by one based on their ranking and also on the significantimprovementinmodelperformance.Tostatisticallyverifythesignificantimprovement inmodelperformanceweused thepaired t-testwithap-value<0.05.Theimportantfeatureswehaveconsideredare(DS,OS,SSN,Resistance,OSTM,FPid,SS,OSP,Age,Pid).

3) We calculated the mean of potential and 7other features (DS, OS, SSN,Resistance,OSTM,SS,OSP)withrespecttospecificageforallplayers.Forexample,ifaplayerhasplayedthreematchesattheageof23.1,23.2,23.3andhavethevaluesforallthefeatures,Firstweroundedoffageto23(asitislessthan23.5otherwise24)afterthatwetookthemeanofeacheightfeatures. By taking the mean, we have a single value of features at aspecificageforeachplayer.Followingarethesamplesfrombothdatasetsthatwouldhelpthereaderinunderstandingpoint3.

Pid Age Ss Ssn Potential Resistance DS OS OSTM Fpid OSP

4 23.074 5.9 3.0 5.9 5.7 1.26 1.46 0.0 1 0.49

4..

23.115..

6.2..

3.1..

6.2..

5.8..

1.32..

1.64..

0.0..

1..

0.48..

Player’sdatavaluesintheRawdatasetPid Age Ss Ssn Potential Resistance DS OS OSTM Fpid OSP

4 23.0 6.83 3.43 11.09 5.91 0.99 1.65 0.0 1 0.46

Player’sdatavaluesinMeanValuedataset

49

5.5 Results 5.5.1ResultsfromtheexperimentsusingrawdatasetResult1:Table1depictstheresultsofplayers’futurepotentialpredictionerrorusingRidge/Lassoregressionwiththerawdataset.

For thisexperiment,wehaveusedarawdatasetwithall features(Pid,DS,OS,SSN,Resistance,OSTM,Mid,MD,MP,FPid,Tid,Cid,SS,Age,OSP,Potential).Weselectedthosespecificplayersfromthedatasetthatstartedplayingbetweentheageof15and25andplayedsubsequentlytilltheageof26.Afternormalizingthedata,we split it into test and training set. In training set,weused theplayers’datafromtheagegroupof15-25,andinthetestset,weusedthesameplayers’(as in training set) data from the age of 26. We have considered all the 15features as predictors or explanatory variables (Pid, DS, OS, SSN, Resistance,OSTM,Mid,MD,MP, FPid, Tid, Cid, SS, Age, OSP) and the target or dependentvariable is potential. We used this experiment as one-step forecasting. Thisexperiment is an attempt to test amultiplayermodel. For this experiment,wehave used multivariate Ridge and Lasso regression. The model training isexplained in detail in section 4.6.1.We tested themodel using 10- fold cross-validation.Weconfigured themodelusingdifferent learningrates (2.0,0.5,3.0).Theoptimallearningratewegotis3.0usingmultivariateLassoregression.Weget a mean potential prediction error of 37.07, using multivariate lassoregression with learning rate of 3.0. Multivariate Lasso regression modelsoutperformed theRidge, regressionmodel. In table1,we have onlymentionedthe best performance from the models obtained. Rest of the modelconfigurationsthatwehavetriedandresultscanbefoundinAppendixA1.

Table1:MultiplayerModel.UsingRidge/LassoRegression.Sameplayersintrainingandtestset.RawdatasetusedN=representsthenumberofplayersintrainingsetn=representsthenumberofplayersintestset

Dataset

Trainingset Testset Regularization

Learningrate

TestSet(MPPE).Playersage26

All features(rawdataset)

Same playersage=15-25

Sameplayers26

Lasso

3.0

N=36460pn=36460p37.07

Ridge 3.0

69.815

50

Result2:Table2depictstheresultsofplayers’futurepotentialpredictionerrorusingdecisiontreeregressorwiththerawdataset.

For thisexperiment,wehaveusedarawdatasetwithall features(Pid,DS,OS,SSN,resistance,OSTM,Mid,MD,MP,FPid,Tid,Cid,SS,Age,OSP,Potential).Weselectedthosespecificplayersfromthedatasetthatstartedplayingbetweentheageof15and25andplayedsubsequentlytilltheageof26.Afternormalizingthedata,we split it into test and training set. In training set,weused theplayers’datafromtheagegroupof15-25,andinthetestset,weusedthesameplayers’(asintrainingset)datafromtheageof26.Weconsideredallthe15featuresaspredictorsorexplanatoryvariables(Pid,DS,OS,SSN,RTCE,OSTM,Mid,MD,MP,FPid, Tid, Cid, SS, Age, OSP) and the target or dependent variable is potential.Thisexperimentisanattempttotestamultiplayermodel.Forthisexperiment,we have used decision tree regressor. We have used Variance reduction tomeasure the quality of a split. We tested the model using 10- fold cross-validation. We configured the model with max-depth=2, splitting criteria orfeatureselectionused isvariancereductionandminimumsamples requiredatleafissetto1.Theevaluationmetricusedforthemodelperformanceismean-squared-error.Themeanpotentialpredictionerrorusingdecisiontreeis76.22withtherawdataset.

Table2:MultiplayerModel.ResultsusingDecisiontreeregressorwithrawdatasetThe reason behind the use of multiple players’ data in training set with aminimumstartingageof15till25anddoastepforecastingoroneyearaheadpotentialpredictionisjusttohaveenoughdataformodeltrainingandevaluatemodel performance.We have seen from the results thatmodels performancesarenotsatisfactoryandhaveenormousmeanpotentialpredictionerrors.Inouropinion, Better performance of Lasso regression than Ridge regression, alsoindicatesthatallthefeaturesarenotimportant.5.5.2ResultsusingmeanvaluedatasetResult3:Table3depictstheresultsofplayers’futurepotentialpredictionerrorusingLassoregressionwithmeanvaluedataset.

Forthisexperiment,wehaveusedmeanvaluedatasetwith11features(DS,OS,SSN, Resistance, OSTM, FPid, SS, OSP, Potential, Age, Pid). We selected those

Dataset

Trainingset Testset Testset(meanp.predictionerror)1-yr26yr(MSE)

All features-rawdataset

Sameplayers15-25

Sameplayers26

N=36460n=3646076.228

51

specificplayersfromthedatasetthatstartedplayingbetweentheageof15and25andplayedsubsequentlytilltheageof26.Afternormalizingthedata,wesplititintotestandtrainingset.Intrainingset,weusedtheplayers’datafromtheagegroupof15-25,andinthetestset,weusedthesameplayers’(asintrainingset)data from the age of 26. We have considered eight features as predictors orexplanatory variables (DS, OS, SSN, Resistance, OSTM, FPid, SS, OSP) and thetargetordependentvariable ispotential.Weused thisexperimentasone-stepforecasting.Thisexperiment isanattempttotestamultiplayermodel.Forthisexperiment,wehaveusedmultivariateLassoregression.Themodel training isexplained insection4.6.1.We tested themodelusing10- foldcross-validation.We configured the model using different learning rates (0.5, 0.4, 3.0). Theoptimallearningratewegotis0.5usingmultivariateLassoregression.Wegetameanpotentialpredictionerrorof8.50,usingoptimal learningrateof0.5withmultivariateLassoregression.

Table 3:MultiplayerModel using Lasso Regression. Same players in training and test setusingmeanvaluedataset.

Result4:Table4depictstheresultsofplayers’futurepotentialpredictionerrorusingDecisionTreeregressorwithmeanvaluedataset.

Forthisexperiment,wehaveusedameanvaluedatasetwith11features(DS,OS,SSN, Resistance, OSTM, FPid, SS, OSP, Age, Pid, Potential). We selected thosespecificplayersfromthedatasetthatstartedplayingbetweentheageof15and25andplayedsubsequentlytilltheageof26.Afternormalizingthedata,wesplititintotestandtrainingset.Intrainingset,weusedtheplayers’datafromtheageof15till25,andinthetestset,weusedthesameplayers’(asintrainingset)datafromtheageof26.Weconsideredeight featuresaspredictorsor independentvariables (DS, OS, SSN, Resistance, OSTM, FPid, SS, OSP) and the target ordependent variable is potential. This experiment is an attempt to test amultiplayermodel. For this experiment,wehaveusedDecision tree regressor.WehaveusedVariancereductiontomeasurethequalityofasplit.Wetestedthemodel using 10- fold cross-validation. Themodel configuration is same asweusedforexperiment2.Theevaluationmetricusedforthemodelperformanceismean-squared-error. The mean potential prediction error using decision treewithmeanvaluedatasetis55.509.

Dataset

Trainingset Testset Learningrate

Testset(meanp.predictionerror)

Mean valuesbest features(8)


Sameplayers

N=40565n=40565

0.5-Lasso 8.502

52

Table4:MultiplayerModel.ResultsusingDecisionTreeRegressorwithmeanvaluedatasetResult 3 and Result 4 showed improvement in the model performance whenemployed with mean value dataset. We have used the same longer trainingwindow of 10 years as in experiment 1 and 2, is just to highlight the modelperformanceusingdifferentdatasets.Moreover, we created a new dataset where we used time lagged features ofpotentialaspredictorsandignoredallothersandnamedthedatasetasTimelagdataset.Newfeaturescreatedbyauto-regressingthepotential.Wecreatedthisdatasetbecauseafterconvertingtheplayers’agefromyearsandmonthstoyearsonly,wemanagedtocreateasequenceinplayers’datawithrespecttoage.Withhavingsequentialdata,wecananalyzethepatternsevolvingovertime,whichisalso calledTimeSeries analysis.Wehave created twoauto-regressed features,whicharelastyearpotentialandthedifferencebetweenacurrentyearandlastyearpotential.Result5:Table5depictstheresultsofplayers’futurepotentialpredictionerrorusingDecisionTreeregressorwithtimeseriesdataset.

Forthisexperiment,wehaveusedatimelagdatasetwithfivefeatures(Age,Pid,Potential, last year potential, difference b/w this year potential and last yearpotential). We selected those specific players from the dataset that startedplayingbetweentheageof15and25andplayedsubsequentlytilltheageof26.Afternormalizing thedata,wesplit it into testandtrainingset. In trainingset,weusedtheplayers’datafromtheageof15till25,andinthetestset,weusedthesameplayers’(asintrainingset)datafromtheageof26.Weused2time-lagfeatures as predictors or independent variables (last year potential, differenceb/w this year potential and last year potential) and the target or dependentvariableispotential.Thisexperimentisanattempttotestamultiplayermodel.Forthisexperiment,wehaveusedDecisionTreeregressor.Wetestedthemodelusing10-foldcross-validation.Themodelconfigurationissameasweusedforexperiment2.Theevaluationmetricused for themodelperformance ismean-squared-error.Themeanpotentialpredictionerrorusingdecisiontreewithtimelagdatasetis135.30.

Dataset

Trainingset Testset Testset(meanp.predictionerror)1-yr26 yrMSE

8-Best features(MeanValues)

Sameplayers15-25

Sameplayers15-25

N=40565n=4056555.509

53

Table5:MultiplayerModel.ResultsusingDecisionTreeregressorwithtimelagdatasetResult6:Table6depictstheresultsofplayers’futurepotentialpredictionerrorusingLassoregressionwithtimelagdataset.

Forthisexperiment,wehaveusedtimelagdatasetwithfivefeatures(Age,Pid,Potential, last year potential, difference b/w this year potential and last yearpotential). We selected those specific players from the dataset that startedplayingbetweentheageof15and25andplayedsubsequentlytilltheageof26.Afternormalizing thedata,wesplit it into testandtrainingset. In trainingset,weusedtheplayers’datafromtheageof15till25,andinthetestset,weusedthe sameplayers’ (as in training set)data from theageof26.Wehaveused2time-lag features as predictors or independent variables (last year potential,difference b/w this year potential and last year potential) and the target ordependent variable is potential. This experiment is an attempt to test amultiplayer model with time lag dataset. For this experiment, we have usedmultivariateLasso regression.Themodel training is explained in section4.6.1.We tested themodelusing10- fold cross-validation.Theoptimal learning ratewegotis2.5usingmultivariateLassoregression.Theevaluationmetricusedformodelperformance ismean-squared-error.Wegetameanpotentialpredictionerrorof55.450,usinglearningrateof2.5withmultivariateLassoregression.

Table6:MultiplayerModelLassoregressionwithtimelagdatasetFrom the experiments 5 and 6, we have seen that models performance getreducedwhen employedwith time lag dataset.We think that the featureswemade by auto-regressing the potential are not capable enough to explain thedependent variable,which is potential. Other than that, in time series analysiseverynext input isdependenton therecentpast,but thesemodelsconsideredeachinputindependently.

Inthenextexperiments,Inordertodevelopamodelthatcanbeusedtopredicttheyoungplayerspeakpotential,weusedtheplayers’databetweentheageof

Dataset

Trainingset Testset Testset(meanP.predictionerror)1-yr26yrMSE

Mean values(2-timelagfeatures)

Sameplayersage=15-25

Sameplayers26yrs

N=36348n=36348135.308

Dataset Trainingset Testset Learningrate Testset(MPPE)26yrs.N=36348n=36348

Mean values (2lagvalues)

Sameplayers Sameplayers26yrs

2.5-Lasso 55.450

54

15and18intrainingsetandsameplayers’(asintrainingset)datafromtheageof19till26inthetestset.Wehavepreferredtheageinterval15-19totrainthemodelbecausewewanttodevelopamodelthatcanpredictthepeakpotentialofyoungsoccerplayers.Andweare interestedinpredictingtheplayerspotentialtilltheageof26becausemostoftheplayerspeakpotentialreachbytheageof26 [3]. For next experiments, we also used Player specific model besidemultiplayermodel.Fornextexperiments,weusedmeanvaluedatasetandtimelagdataset.5.5.3ResultsusingmultivariateLassoRegression:Result7:Table7depictstheresultsofplayers’futurepotentialpredictionerrorusingLassoregressionwithmeanvaluedataset.Forthisexperiment,wehaveusedmeanvaluedatasetwith11features(DS,OS,SSN, Resistance, OSTM, FPid, SS, OSP, Potential, Age, Pid). We selected thosespecificplayersfromthedatasetthatstartedplayingbetweentheageof15and18andplayedsubsequentlytilltheageof26.Afternormalizingthedata,wesplititintotestandtrainingset.Intrainingset,weusedtheplayers’datafromtheageof15till19,andinthetestset,weusedthesameplayers(asintrainingset)datafrom the age of 19 till 22.We have considered eight features as predictors orindependent variables (DS, OS, SSN, Resistance, OSTM, FPid, SS, OSP) and thetarget or dependent variable is potential. This is an attemptmade to comeupwithamultiplayermodel toanalyzehow farwecanpredictwithaccuracy theplayerspotentialusingfixedsizewindow.Inthisexperiment,wehavenotusedwalkforwardvalidation.Wetrainedthemodelusingtrainingsetandpredictedthe test set age intervals one by one. In this experiment, we used the fixedwindow size for training (15 till 19). We firstly predicted the players’ futurepotentialfortheage19,andthenkeepingthetrainingsetsamewepredictedtheplayers futurepotential at the age of 20 and so on till the age of 22. For thisexperiment, we have used multivariate Lasso regression. The loss (cost orobjective) function we have used is Squared loss. We have used the SGD(stochasticgradientdescent)forweightoptimization.Wetestedthemodelusing10-foldcross-validationwithalearningrateof0.5.Theevaluationmetricusedforthemodelperformanceismean-squared-error.Theresultshowsthataswemove forward, our potential prediction error starts growing. It shows thatwecannot use fixed size training window to predict multistep ahead playerspotentialprecisely.

55

Model Training Testsetplayersage(meanP.predictionerror)LR-Lasso(0.5)

Allplayers(15-18)

19 20 21520.910

589.106

607.926

Table7:MultiplayermodelusingmeanvaluedatasetwithfixedsizedwindowandLassoregressionResult8:Table8showstheresultsofplayers’futurepotentialpredictionerrorfrom the experiment where we used Lasso regression with Walk Forwardvalidationmethodasmentionedinsection5.1.Forthisexperiment,wehaveusedmeanvaluedatasetwith11features(DS,OS,SSN, Resistance, OSTM, FPid, SS, OSP, Potential, Age, Pid). We selected thosespecificplayersfromthedatasetthatstartedplayingbetweentheageof15and18 and played subsequently till the age of 26. After normalizing the data,wesplit it intotestandtrainingset.Intrainingset,weusedtheplayers’datafromtheageof15till19,andinthetestset,weusedthesameplayers(asintrainingset) data from the age of 19 till 26. We have considered eight features aspredictors or independent variables (DS, OS, SSN, Resistance, OSTM, FPid, SS,OSP)andthetargetordependentvariableispotential.Thisisanattemptmadeto come upwith amultiplayermodel usingwalk forward validation. All stepsrelated to WFV explained in detail in section 5.1. We have used multivariateLassoregression.Themodeltrainingisexplainedinsection4.6.1.Wetrainedthemodelwitha learningrateof0.5.Fromtheresults,wecanseethat ifweuseawindow lengthof 8 years thanwe canbest predict thepotential of all playersusing multivariate lasso regression as it gives minimum mean potentialpredictionerrorwiththemultiplayermodel.Model Training

Testsetage(meanP.predictionerror)

Lr–(0.5)Lasso

Allplayers(15-18)

19 20 21 22 23 24 25520.910

269.293

137.870

60.608

18.027

6.784

7.341

Table8:MultiplayermodelusingLassoRegressionandwalkforwardvalidationwithmeanvaluedatasetResult9:Table9shows theresultsof specificplayerpotentialpredictionerrorwithagewhereweusedLassoregressionwithWalkForwardvalidationmethod.Itisourbaselinemodel1.Forthisexperiment,wehaveusedmeanvaluedatasetwith11features(DS,OS,SSN,Resistance,OSTM,FPid,SS,OSP,Potential,Age,Pid).Wehaveusedonlya

56

specificplayer.Afternormalizingthedata,wesplitthedataintotestandtrainingset.Intrainingset,weusedtheplayer’sdatafromtheageof15till19,andinthetestset,weusedtheplayer’sdatafromtheageof19till26.Wehaveconsideredeight features as predictors or independent variables (DS,OS, SSN,Resistance,OSTM,FPid,SS,OSP)andthetargetordependentvariableispotential.Thisisanattempt to testaplayerspecificmodelusingwalk forwardvalidation.Allstepsrelated to WFV explained in detail in section 5.1. We have used multivariateLasso regression. The model training is explained in section 4.6.1. From theresults,wecanseethattopredicteachplayer’sfuturepotentialwithminimumpotential prediction error we need to use different window size for modeltraining.We found thatmodel parameters vary player to player or are playerspecific.Ascomparedtotheresultsofthemultiplayermodelfromexperiment8we found that multiplayer model results are not reliable. The results of thisexperimentprovedthatmodelparametersareplayerspecificandalsotopredicteach player future potential with MPPE we need a different window size formodeltraining.Wealsofoundthatweneedadifferentlearningratetotrainthemodel for each player. For example, for player id 1964 we get the minimumpotentialpredictionerrorbyusingawindowsizeof4-yearsformodeltrainingwhereasforplayerid5206,wecannotpredictfuturepotentialprecesilyevenbyusingawindowsizeof9-yearsformodeltraining.Itisourbaselinemodel1. Model Trainin

gspecificplayer(15-18)

Testingplayers(potentialpredictionerror)

1 0.5L

P-194

19 20 21 22 23 24 2513763.920

15363.023

10297.565

141.276

1.872 4859.571

1428.410

2 7.0L

P-1964

2523.025

102.406

6.706

53.625

137.234

61.330

25.720

3 7.0L P-5206 1759.055 699.083 1033.902

2175.555

2926.033

3066.091

1877.914

Table9:BaselineModel1.PlayerSpecificmodelwithwalkforwardValidationFollowing graphs showing specific player potentialw.r.t age. The graph showsthepredictedvalueofpotentialwithanorangecurveandactualvalueofplayerspotentialinbluecurve.

57

Fig. 5.1: player specific model potential w.r.t age. The blue line shows the actualpotential,andorange-lineshowsthepredictedpotential.Thegraphisforplayer-id194

Fig. 5.2: player specific model. potential w.r.t age. The blue line shows the actualpotential, and orange-line shows the predicted potential. The graph is for player-id1964

5.5.4ResultsFromFeedForwardNeuralNetwork(FFN):UsingFeedForwardNeuralNetworkwe tried every experiment for8 times and takentheaverageofpotentialpredictionerrorataspecificage.Weusednumpy.seed(7)aswellforreproducibility.Result10&11:Table10and11showstheresultsofplayers’potentialpredictionerror fromtheexperimentwhereweusedFeedForwardNeuralNetworkwith

58

Walk Forward validation method. The only difference between table 10 andtable11isthattheformeroneiswithSGDoptimizerandthelatteroneiswithAdam optimizer. The results showed that model with SGD optimizeroutperformedtheotherone.Forthisexperiment,wehaveusedmeanvaluedatasetwith11features(DS,OS,SSN, Resistance, OSTM, FPid, SS, OSP, Potential, Age, Pid). We selected thosespecificplayersfromthedatasetthatstartedplayingbetweentheageof15and18andplayedsubsequentlytilltheageof26.Afternormalizingthedata,wesplititintotestandtrainingset.Intrainingset,weusedtheplayers’datafromtheageof15till19,andinthetestset,weusedthesameplayers(asintrainingset)datafrom the age of 19 till 26. We have used eight features as predictors orindependent variables (DS, OS, SSN, Resistance, OSTM, FPid, SS, OSP) and thetargetordependentvariableispotential.Thisisanattempttotestamultiplayermodelusingwalkforwardvalidation.AllstepsrelatedtoWFVexplainedindetailin section 5.1. In this experiment,we have used Feedforward neural network.The model training is explained in section 4.6.2. We trained the model withdifferent configurations (using different activation functions, optimizers,learningratesandwithmultiplelayers).Wehaveshownthebestresultsintable10 and 11, all the other results with different configurations can be found inAppendix A2 and A3. We trained the model with different learning rates(0.0001,0.001,0.003,0.0003), butweget thebestmodelperformancewithSGD(lr=0.001)withsixhidden layersandreluasanactivation functionusingnineyearswindowsizeformodeltraining.Fromtheresults,wecanseethatifweusea window size of 9 years formodel training thanwe can predict the players’future potential with minimum prediction error using Feedforward neuralnetworkwiththemultiplayermodel.Model Optimiz

ersLayers

Activation

Testplayerages(meanP.predictionerror)

FFN(Trainingallplayers(15-18)-

SGD0.0001

19 20 21 22 23 24 256 relu 603.7

2240.09 111.8

550.52

15.55

9.49 13.02

2 relu 513.186

202.205

128.264

44.461

17.55

10.65

13.87

4 relu 221.20

129.85 131.64

138.72

18.00

21.86

9.75

Sgd0.001

6 relu 504.90

266.154

107.58

49.57

15.05

8.73

8.42

Table10MultiplayerModelusingSgdoptimizerandwalkforwardvalidation

59

Model Optimizers

Layers

Activation

Testplayerages(meanP.predictionerror)


adam0.0001

19 20 21 22 23 24 256 Relu 3823.

51500.99

307.27

197.25

112.31

81.37

63.15

0.001 6 relu 522.32

181.59

94.46 50.05 28.48 21.05

24.70

Table11:MultiplayerModelusingAdamoptimizerwithwalkforwardvalidationResult12:Table 12 results are from the experimentwherewe tested a playerspecific model with Feed Forward Neural Network using Walk ForwardValidation.Forthisexperiment,wehaveusedmeanvaluedatasetwith11features(DS,OS,SSN,Resistance,OSTM,FPid,SS,OSP,Potential,Age,Pid).Afternormalizingthedata,we split it into test and training set. In training set,weused theplayer’sdatafromtheageof15till19,andinthetestset,weusedtheplayer’sdatafromthe age of 19 till 26. We have considered eight features as predictors orindependent variables (DS, OS, SSN, Resistance, OSTM, FPid, SS, OSP) and thetarget or dependent variable is potential. We have used Feedforward neuralnetworktotrainthemodel.Themodeltrainingisexplainedinsection4.6.2.Wetrainedthemodelwithdifferentconfigurations(usingactivationfunctions(relu,sigmoid, tanh) with different combinations of them, optimizers (sgd, adam),learning rates (0.001,0.5, 1,3,7) and with multiple layers (1 till 8)). We haveshown best results in table 12. In this experiment, we have used Adam as anoptimizertominimizetheloss.Wehaveusedfivehiddenlayers.Foreachhiddenlayer,wealternativelyusedsigmoidandtanhactivationfunction.Intable12,wehave mentioned the number of neurons in each hidden layer. The player’sselectionisentirelyrandom.Wecanseefromtheresultsthattopredictfuturepotentialprecisely, forplayerid1964,weneed8yearsofwindowsizetotrainthemodel.Forplayerid194and5206weneedawindowsizeof4yearsand3yearsrespectively, totrainthemodel. Inouropinion, thedifference inwindowsize isbecauseof thepotentialbehaviorofplayers.The formerone (id -1964)hasnotrendinpotentialwithage–meansitspotentialisnotsurgingorplungingwith age, but for the latter ones with player id – 194 and 5206, potential isincreasingsteadilywithage.Fromtheresults,itisclearthatweneedasmallerwindow size to predict player’s future potential with minimum potentialpredictionerrorthathavesometrendinpotentialwithage.Weconsideredthismodelasoursecondbaselinemodel.Bythecomparisonofresults10,11and12it again proved that we cannot use the multiplayer model to predict playersfuturepotentialbecausemodelparametersareplayer specific.Topredicteachplayer future potentialwithMPPEweneed a differentwindow size formodeltraining.

60

Player

Optimizers

Layers

Activation

Testplayerages(potentialpredictionerror)

1964

Adam5.0L.Rate

19 20 21 22 23 24 255 Sig,Tanh,

Sig1,1,1,1,1

133.32

509.06

1.08 70.37 178.27

11.29 0.39

194 Adam7.0l.r

5 Sig,Tanh,Sig

5488.20

201.40

1.35 17.65 69.50 834.57

1174.81

5206

Adam7.0lr

5 Sig,Tanh,Sig

7.72 1.06 175.78

1761.58

2659.62

2562.64

4084.35

Table12:SecondBaselinemodel.PlayerspecificmodelusingFeedForwardNeuralNetworkwithwalkforwardvalidation.MeanValuedatasetused.Following graphs showing the specific player potential w.r.t age usingFeedforward neural network with mean value dataset and walk forwardvalidation.Thegraphsshowedthepredictedvalueofpotentialintheorangeandactualvalueofplayerpotentialinblue.

Fig. 5.3: player specific model. potential w.r.t age. The blue line shows the actualpotential and orange- line shows the predicted potential. The graph is for player-id194

61

Fig. 5.4: player specificmodel. potential w.r.t age. The blue line shows the actualpotential,andorange-lineshowsthepredictedpotential.Thegraphisforplayer-id1964

Fig. 5.5: player specific model. potential w.r.t age. The blue line shows the actualpotential, and orange-line shows the predicted potential. The graph is for player-id5206

Result13:Table13presentstheresultsoftheexperimentwherewerecursivelyadded back the player’s predicted potential in the training set before nextpredictionmade.Inthisexperiment,allthemodelconfigurationsremainedsameasinpreviousexperiment(no.12).Thisexperimentisanattempttotestplayer’sspecific model. In this experiment, we recursively added back the player’s

62

predictedpotential at the specific age to the training set beforewemadenextpotentialprediction.Aswe can see from the results,potentialpredictionerrorincreaseswithage.Itisbecausepotentialpredictionerrorforeveryspecificagegetsaccumulatedwiththenext.Player

Optimizers

Layers

Activation


1964

Adam5.0

19 20 21 22 23 24 255 Sig,Tanh

,Sig1,1,1,1,1

171.78

53.81 98.96 8.49 104.08

2.13 2.89

194 Adam7.0

5 Sig,Tanh,Sig

3410.13

2625.52

3598.94

3643.02

4349.47

4823.84

4642.35

5206

Adam7.0

5 Sig,Tanh,Sig

0.834 2.303 104.985

1951.126

4680.30

5853.16

5782.34

Table13:PlayerspecificmodelusingFeedForwardNeuralNetworkwithaddingbackprediction.Following graphs showing the specific player potential w.r.t age usingFeedforwardneuralnetworkwithmeanvaluedatasetandwherewerecursivelyadded back the predicted potential to the training set. The graph shows thepredictedvalueofpotentialinorangeandactualvalue(G.T)ofplayerpotentialinblue.

Fig. 5.6: player specificmodel. potential w.r.t age. The blue line shows the actualpotential,andorange-lineshowsthepredictedpotential.Thegraphisforplayer-id194usingFFNwithaddingbackprediction

63

Fig. 5.7: player specificmodel. potential w.r.t age. The blue line shows the actualpotential,andorange-lineshowsthepredictedpotential.Thegraphisforplayer-id1964usingFFNwithaddingbackprediction

Fig. 5.8: player specific model. potential w.r.t age. The blue line shows the actualpotential, and orange-line shows the predicted potential. The graph is for player-id5206usingFFNwithaddingbackprediction

Result 14:Table14depicts theresults of player potential prediction errorwithage from the experiment where we have used Feed Forward Neural Networkwithtimelagdataset.Thedatasetweused for this experimenthas four features (Pid,Age, potential,last year potential (time lag feature)). For this experiment,we created only 1-

64

time lag feature (last year potential) by auto-regressing the target, which ispotential. After normalizing the data, we split it into test and training set. Intrainingset,weusedonlytheplayer’sdatafromtheageof15till19,andinthetestset,weusedthesameplayer'sdatafromtheageof19till26.Wehaveused1time lag feature as a predictor or independent variable, which is last yearpotential and dropped all other features. The target or dependent variable ispotential.We have used a player specificmodelwithwalk forward validation.For this experiment, the model architecture of feed-forward neural networkconsists of 2 hidden layers with sigmoid and tanh as an activation functionrespectively.Wealsomentionedthenumberofneuronsineachhiddenlayerintable14.Themodeltrainingisexplainedinsection4.6.2.Wefoundthatresultsobtainedusing time lagdataset (table14) for someplayershave lesspotentialpredictionerroratspecificageascomparedtothe(table12)resultswhereweused multiple independent variables as predictors. It means that time lagfeaturesofpotentialaremorecapableofpredictingpotentialthanotherfeatures(DS,OS,SSN,Resistance,OSTM,FPid,SS,OSP)usingFFN.Wealsonoticedthatforeachplayer,themodelneedsadifferentlearningrateforoptimalresults.Italsoshowsthatmodelparametersareplayerdependent.Player

Optimizers

Layers

Activation


1964

Adam3.0

19 20 21 22 23 24 252 Sig4N,

Tanh1N714.28

24.29

0.07 9.65 162.08

21.23 16.9

194 Adam3.0

2 Sig4N,Tanh1N

104.24

112.79

313.83

167.91

150.92

316.98

166.92

5206 Adam7.0

2 Sig4N,Tanh1N

2.67 197.78

313.15

1147.40

840.99

1621.56

1092.77

Table14:APlayerspecificmodelwithFeedForwardNeuralNetworkusingtimelagdatasetandwalkforwardvalidation.Following graphs showing the specific player potential w.r.t age using FeedForwardNeuralNetworkwithtimelagdatasetandwalkforwardvalidation.Thegraphshows thepredictedvalueofpotential in theorangeandactualvalueofplayerpotentialinblue.

65

Fig. 5.9: player specific model. potential w.r.t age. The blue line shows the actualpotential, and orange-line shows the predicted potential. The graph is for player-id194.FFNwithtimelagdataset

Fig.5.10:playerspecificmodelwithFFN.potentialw.r.tage.Theblue lineshowstheactual potential, and orange-line shows the predicted potential. The graph is forplayer-id1964.

66

5.5.5ResultsusingLSTMWithLSTMsweonlyusedtime lagdataset.AsLSTMsarecapableofextractingfeatures automatically,which are effective as compared to themanual featureextractionprocess,whichneedsdomainknowledgeandiscumbersome.LSTMswork on sequential data. So, to use a dataset with LSTMs, we used twoapproaches.

1) Window size approach–wherewemade sequences usingwindow size,windowsizetellsthathowmuchpreviousyeardatashouldbeincludedinonesequence.Itmadesmallsequences(equaltothesizeofthewindow)ofdataandusedthatsequencesfortraining.Therearesomearchitecturaldrawbackswith thisapproachdue to consideringmultipleplayers foramultiplayermodel.Whenusedwithmultipleplayersitremainsunabletodistinguish where the new player data is going to start. So the lastsequenceofplayerxgetsmergedwiththefirstsequenceofplayerx+1.

2) The second approach we used is with time lag dataset– we created asingle time lag feature(lastyearpotential) fromthegivenpotential.Weusedthisapproachforallofourexperiments.

Result 15:Likewise, previous experiments with other learning algorithms, wetrained amultiplayermodel using LSTM (Long short-termmemory). Table 15resultsshowtheplayerspotentialpredictionerrorusingLSTMs.The dataset we considered for this experiment has four features (Pid, Age,potential,lastyearpotential(timelagfeature)).Forthisexperiment,wecreatedonly1time-lagfeature(lastyearpotential)byauto-regressingthetarget,whichispotential.Afternormalizing thedata,wesplit it into testand trainingset. Intrainingset,weusedplayers’datafromtheageof15till19,andinthetestset,weconsideredthesameplayers(as intrainingset)data fromtheageof19till26. We have used 1 time-lag feature as a predictor or independent variable,which is last year potential and dropped all other features. The target ordependent variable is potential. We made this attempt to test a multiplayermodelusingLSTMs.Thenetworkarchitecturewehaveusedforthisexperimenthasonehiddenlayerwith32LSTMscellsorunits.Thetrainingofthemodel isexplainedindetailinsection4.6.3.Theloss(costorobjective)functionwehaveused is Squared loss. In this experiment,we have used Adam as an optimizerwithalearningrateof0.001tominimizethelossasitoutperformedSGD.Fromtheresults,wecanseethatLSTMsremainedunabletomakeasequencefromthedataandthat’swhytheirmeanpotentialpredictionerrorishigh.

67

Model Optimizers Layers Testplayerages(meanp.predictionerror)LSTM(Trainingallplayer

adam0.001

19 20 21 22 23 24 251 163.88 111.56 91.53 67.55 54.55 55.52 57.33

Table15:MultiplayerModel.walkforwardtechniqueusingLSTMsResult16:Table16depictstheresultsofplayerfuturepotentialpredictionerrorfromtheexperimentwherewetrainedaplayerspecificmodelandusedthewalkforwardvalidationwithLSTMs.Thedatasetweused for this experimenthas four features (Pid,Age, potential,last year potential (time lag feature)). For this experiment, we created only 1time-lag feature (last year potential) by auto-regressing the target, which ispotential. After normalizing the data, we split it into test and training set. Intrainingset,weusedonlysingleplayer’sdatafromtheageof15till19,andinthetestset,weusedsameplayer’sdatafromtheageof19till26.Wehaveused1time-lag feature as a predictor or independent variable, which is last yearpotential and the target or dependent variable is potential. We have used aplayerspecificmodelwithwalkforwardvalidation.Forthisexperiment,weusedrecurrent neural network variant LSTMs. The network architecture we haveusedforthisexperimenthasonehiddenlayerwith10LSTMscellsorunits.Wealso tried itwith1,32 and64units.Weusedmultiple optimizers (SGD, adam)with different learning rates (0.001, 0.01) to train themodel as shown in thetable. Themodeltrainingisexplainedinsection4.6.3.TheresultsshowedthatLSTMs learns the sequencewhen applied to the player specificmodel.We getzero prediction error with this model, so we can use this model to find outplayerspeakpotentialwithoptimalage.Player Optimizers Layers LSTM

Block orunits


1964

SGD(0.01)

19 20 21 22 23 24 251

1,10,32,64

0.0 0.0 0.0 0.0 0.0 0.0 0.0Adam(0.001)

194 SGD(0.01) 1 1,10,32,64 0.0 0.0 0.0 0.0 0.0 0.0 0.0Adam(0.001)

Table16:LSTMPlayerspecificmodelwithwalkforwardvalidation.Result17:Table17resultsarefromtheexperimentwherewerecursivelyaddedthe player’s predicted potential in training set before wemade next potentialprediction.

68

Thedatasetweused for this experimenthas four features (Pid,Age, potential,last year potential (time lag feature). For this experiment, we created only 1time-lag feature (last year potential) by auto-regressing the target, which ispotential. After normalizing the data, we split it into test and training set. Intrainingset,weconsideredonlysingleplayerdatafromtheageof15till19,andin the test set,we considered sameplayer data from the age of 19 till 26.Wehaveconsidered1time-lagfeatureasapredictororindependentvariable,whichislastyearpotentialandthetargetordependentvariableispotential.Themodeltraining is explained in section4.6.3. In this experimentwe first predicted theplayerpotentialattheageof19then,forthepotentialpredictionatage20,weusedthepreviouslypredictedpotentialvaluefor19andaddeditinourtrainingsetandthenmadethepotentialpredictionfortheage20andwecontinuedittilltheageof26.We tested25differentplayers someof the resultsare shown intable 20, rest of the results and player potential graphs are in Appendix A4.Playerswetestedmostlyhaveoneyearoftrainingdataastheystartedplayingattheageof18(accordingtoourdataset),someofthemhavetwoyearsoftrainingdata,andonlyaplayerwithid154236hasadatastartingfrom16andlasttill22.Weusedallthesedifferentscenariostoverifythemodelperformance.Player Optimizers LSTM

UnitsorBlocks

Layers Testplayerages(potentialpredictionerror)

1964

SGD(0.01)

1,10,64

19 20 21 22 23 24 25 261

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

194 SGD(0.01) 1,10,64

1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

2885 SGD(0.01) 64 1 0.0 0.0 0.0 0.0 0.0 0.0 0.962 0.05206 SGD(0.01) 64 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.06408 SGD(0.01) 64 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.033340 SGD(0.01) 64 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0Table17:playerspecificmodeusingLSTM.Withrecursivelyaddingpredictedvaluesbackintrainingset.Followinggraphsshowingspecificplayerpotentialw.r.tage.Thegraphhastwocurves,blueisforactualpotentialataspecificage(groundtruth-value),andtheorange line is the predicted value of potential using LSTMsmodel. The graphshowsone single curve,whichmeans thatbothvaluesofpotential (actual andpredicted)areoverlappingandhave0predictionerror.

69

Fig.5.11:playerspecificmodel.ShowstheoverlappingofpredictedandactualValuesofpotentialw.r.tageusingLSTMs.player-id194


70



71

72

5.6 Discussion Inthisthesis,weintensivelyassessedtheuseofdeepneuralnetworksfortimeseriesforecastingproblemorthesequencepredictionproblem,whichisyoungsoccerplayer’speakpotentialpredictionwithoptimalage. Thisstudyexhibitsthat recurrent neural network LSTMs performed exceptionally well, with thesequencepredictionproblemsevenwiththesmalltrainingset.Indata-preprocessing,wediscardedsomeoftheplayersdata.Wefoundthatbyusing mean value dataset, our model performance improved as compared tomodel performance with the raw dataset. We also noticed that z-scorenormalization has the great impact on the model performance, especially ondeeplearningmethods.Fromtheresults,wefoundthatmultiplayermodelresultsusinglassoregression,FFNandLSTMarenotdependableastheyshowthatwecanpredictallplayersfuture potentialwithMPPE by using samewindow size andwith samemodelparameters. It provedwrongwhenwehaveused a player specificmodelwithlassoregressionFFNandLSTMs.Wefoundthattopredictthefuturepotentialofevery specific player; we need a different learning rate and different modelparameters.Furthermore,wefoundthattopredicteachplayerfuturepotentialfromtheageof19till26withMPPEweneedadifferentwindowsizefortrainingthemodel.Fromtheresults,we found thatwecannotuseLassoregressionevenwith theplayer specific model for precisely predicting young players future potentialbecauseitdoesnotgettrainonsmallwindowsizespecificallybyusingonlytheplayer’sdatabetweentheageof15till19andthusneeds longerwindowsize.Moreover, we have seen that we remained unable to predict future potentialpreciselyforplayerid5206,evenafterusingawindowsizeof9yearsformodeltraining. Likewise, Using FFN with the player-specific model, we remainedunable tomadeprecise futurepotentialpredictionsofyoungsoccerplayersbyusing only the player’s data between the age of 15 till 19 for model training.However,FFNperformedmuchbetterthanthebaselinemodel1.Ascomparedtobaselinemodel1,forsomeplayers, itpredictedthefuturepotentialwithMPPEby using small trainingwindow and also players potential prediction error atspecificagearemuchlesserthanbaselinemodel1.Wehavenoticedthatwecanpredictplayer’sfuturepotentialwithlesspotentialpredictionerroratspecificagebyusingonlyauto-regressedtimelagfeaturesofpotentialaspredictorswithFFN.Inouropinion,auto-regressedtimelagfeaturesare more capable of explaining the potential as compared to other extractedfeatures.

73

Fromtheexperiments,wehaveseenthatLSTMsperformedaccuratelyandhadgivenzeropotentialpredictionerror.LSTMsperformed insuchawaybecausethey catered the sequential nature in data and have considered somedependency between the recent inputs with time. They have the property ofdynamicallychangingwindowsizemeanshiddenlayerdoesnotonlydependsonthe current input but also depends on the previous hidden state. By thistechnique, they learned the entire sequence. We have used M-1 sequenceprediction model with LSTMs. In many to one model, we train the model bygivingmultipleinputswithconsecutivetime-stepsandthenmadeaprediction.Ifwegatheredallthemodelpredictions,theybecomeasequence,andwecancallitsequenceprediction.

74

75

Chapter 6 ConclusionandFutureWorkTheaimofthisthesisistousedeeplearningmethodforthesequenceprediction.Inourcase,it’saboutthepredictionofyoungsoccerplayerspeakpotentialwithoptimalageandalsotofindoutwhichvariablesorpredictorsareimportanttopredictthepotentialprecisely.Afterexperimentsandresultswecomeupwithaconclusionthatfeatureextractionandfeatureengineeringhasanimportantrolein training traditionalmachine learningalgorithms,butwhen it comes todeeplearningmethods,theydon’trelymuchonhandcraftedfeatures,theyarecapableoffindinghiddenpatternsinthedatawithtime.Featureextractionandfeatureengineering is a time-consuming process which also depends on researchersability and requiresdomainknowledge.After all this, it is not guaranteed thatextracted features are fully capable of explaining the target.We also concludethat baseline models performance improved significantly when we have usedthemwithregular intervalormeanvaluedatasetwhichwehavecreatedusingthemeanvaluesoffeaturesataspecificage.

From the results,weconcluded that it isnotpossible topredictyoungplayersfuturepotentialprecisely,usingonlytheir initialyearsofdataspecificallyfromtheageof15till19withthemultiplayermodel.

We can also conclude that it is not possible to predict young players futurepotentialprecisely,usingonlytheirinitialyearsofdataspecificallyfromtheageof 15 till 19 with lasso regression and FFN even with player-specific modelbecausethesemodelconsidersevery input independentlyanddonotcater thesequentialnatureofdataandalsorequiresmoretrainingdata.

From the results, the performance of recurrent neural networks LSTMs isobvious.Itprevailedoverallthelimitationswhichbaselinesmodelsgettangledwith. LSTMsusedonly one auto-regressed time lag feature and got trainedonverysmalldata.WethinkthattheoutstandingperformanceofLSTMsisduetotheir intrinsicabilityofmemory,byusingthattheyconsidersomedependencybetween recent inputs with time.We concluded that we could use LSTMs forpreciselypredictingtheyoungplayerspeakpotentialwithoptimalage.One of themajor limitations of ourproposedmodel is thatwehave to train aseparatemodel for each player, which is time-consuming as well. As a futurework, we suggest splitting the data by field positions and then applying thetraditionalmachinelearningmethods,wethinkthatbysplittingdatasetinsucha

76

way can improve the performance of multiplayer model as well. Anotherimportant future work would be the use of clustering by players’ age; eachclustercontainsplayersofsameageorclustercanbemadeonthefieldpositionsas well. As we converted the dataset into regular interval by using mean offeatures at specific age for each player, related to this, one of the futureworkcould be done with data generation, up-sampling or downsampling method,wherewegenerate thedata for themissingperiods andbydoing thatwe cancome up with a dataset that have regular intervals. One of the future workswould be the use of decision tree andArimamethodswith the player-specificmodel.Inouropinion, forecastingproblems like thisoneshouldnotrelyon thesetoffeaturesotherthantimelagfeaturesbecauseinforecastingproblemswepredictfuture,andwedonothavethevaluesforthosevariablesorpredictorsinfuture.Forsuchproblems(whereweconsidersetof features)wefirsthavetopredictthe features and thenuse them for furtherpredictionor forecasting.We thinksuchanapproachcanleadtomoreunreliablepredictionascomparedtotheonewhereweusedtimelagfeatures.

77

AppendixA1Hyper-parameters

TableA1:HyperparameterConfigurationusingLsssoRegression.Model Optimiz

ersLayers

Activation

Testplayerages(MSE)


SGD0.0001

19 20 21 22 23 24 252 relu 616.5

9287.78

149.16

62.61 20.96

10.16

12.84

4 relu 529.81

268.80

131.33

56.16 16.35

9.11 12.40

6 relu 603.72

240.09

111.85

50.52

15.55

9.49 13.02

Sgd0.001

2

Tanh 651.47

262.91

114.01

66.02 30.16

17.37

17.34

relu 513.186

202.205

128.264

44.461

17.55

10.65

13.87

Sigmoid

640.80

326.01

172.12

77.29 33.28

18.85

19.07

4 Tanh 526.83

232.48

65.16 29.78 23.02

15.62

43.01

Sigmoid

746.68

509.11

411.20

366.26

339.31

278.19

332.72

relu 221.20

129.85

131.64

138.72

18.00

21.86

9.75

6 relu 504.90

266.154

107.58

49.57

15.05

8.73 8.42

TableA2.Hyperparameterconfiguration(usingdifferentlearningrates,hiddenlayers,activationfunctions)usingFeedFrowardNeuralNetworkusingwalkforwardValidation.Trainingplayers(15till19)andTestsetsameplayers.

Dataset

Trainingset Testset Learningrate

TestSet(MSE)26yr MSE

Allfeatures Same playersage=15-25

Same players

N=36460pn=36460p

3.0-L 37.0692.0L 39.6373.0-R 69.8150.05-R 69.815

Mean valuesbest features(8)


Sameplayers

N=40565n=40565

0.5-L 8.502

0.4-L 8.667

3.0-L 15.941

78

Referredfromtable13usingFFNModel Optimiz

ersLayers

Activation

Testplayerages


adam0.0001

19 20 21 22 23 24 252 relu 4149.

513426.23

2594.30

1828.66

958.77

453.66

293.11

4 relu 3895.39

1714.19

423.91

263.40

148.86

121.15

91.70

6 Relu 3823.51

500.99

307.27

197.25

112.31

81.37

63.15

adam0.001

2 relu 707.09

359.67

170.23

60.74 25.02 22.90

26.57

4 relu 590.85

240.91

105.77

53.40 21.88 19.61

25.56

6 relu 522.32

181.59

94.46 50.05 28.48

21.05

24.70

TableA3.Hyperparameterconfiguration(usingdifferenthiddenlayersandlearningrates)usingFeedFrowardNeuralNetworkusingwalkforwardValidation.Trainingplayers(15till19)andTestsetsameplayersPlayer Optimizers LSTM

UnitsorBlocks

Layers Testplayerages(MSE)

1964

SGD(0.01)

1,10,64

19 20 21 22 23 24 25 261

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

194 SGD(0.01) 1,10,64

1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

2885 SGD(0.01) 64 1 0.0 0.0 0.0 0.0 0.0 0.0 0.962 0.05206 SGD(0.01) 64 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.06408 SGD(0.01) 64 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.033340 SGD(0.01) 64 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.04719 SGD(0.01) 10 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.09883 SGD(0.01) 1 1 0 0 0 0 0 0 0 017459 SGD(0.01) 10 1 0 0 0 0 0 0 0 020020 SGD(0.01) 1 1 0 0 0 0 0 0 0 0154236 SGD(0.01) 10 1 0 0 0 0 N/a n/a n/a N/a31061 SGD(0.01) 10 1 0 0 0 0 0 0 0 015482 SGD(0.01) 10 1 0 0 0 0 0 0 0 018029 SGD(0.01) 10 1 0 0 0 0 0 0 0 018418 SGD(0.01) 10 1 0 0 0 0 0 0 0 018985 SGD(0.01) 10 1 0 0 0 0 0 0 0 036596 SGD(0.01) 1 1 0 0 0 0 0 0 0 033337 SGD(0.01) 1 1 0 0 0 0 0 0 0 033193 SGD(0.01) 1 1 0 0 0 0 0 0 0 022855 SGD(0.01) 1 1 0 0 0 0 0 0 0 0Table A4 Hyperparameter Configuration using LSTMs with recursively adding predictedvaluesbacktotrainingsetFollowing graphs showing specific player potential w.r.t age. The graph has 2curves, blue is for actualpotential at specific age (ground truth-value) and theorangelineisforthepredictedvalueofpotentialusingLSTMsmodel.Thegraph

79

shows 1 single curve, which means that both values of potential (actual andpredicted)areoverlappingandhave0predictionerror.

Fig.18:playerspecificmodel.ShowstheoverlappingofpredictedandactualValuesofpotentialw.r.tageusingLSTMs.player-id15482


80



Fig. 22: player specific model. Shows the overlapping of predicted and actual Values ofpotentialw.r.tageusingLSTMs.player-id36596

81


82

References [1]Li,Yi&Ma,Lingxiao.(2015).PredictNBAplayers’careerperformancebasedonSVM.157-161.[2]RamaIyer,Subramanian&Sharda,Ramesh.(2009).Predictionofathletesperformanceusingneuralnetworks:Anapplicationincricketteamselection.ExpertSyst.Appl..36.5510-5522.[3]DendirandSeife.(2016),Whendosoccerplayerspeak?Anote.JournalofSportsAnalytics.2(2),89-105.[4]Arndt,C.andBrefeld,U.(2016),Predictingthefutureperformanceofsoccerplayers.StatisticalAnalysisandDataMining:TheASADataScienceJournal,9:373–382.[5]ToeringTT,Elferink-GemserMT,JordetG,VisscherC.(2009).Selfregulationandperformancelevelofeliteandnoneliteyouthsoccerplayers.JSportsSci,27(14),1509-17[6]FigueiredoAJ,CoelhoeSilvaMJ,MalinaRM.(2011).Predictorsoffunctionalcapacityandskillinyouthsoccerplayers.JMedSciSports,21(3),446-54.[7]Yang,K.,Yoon,H.andShahabi,C.(2005).Asupervisedfeaturesubsetselectiontechniqueformultivariatetimeseries.ProceedingsoftheWorkshoponFeatureSelectionforDataMining:InterfacingMachineLearningwithStatistics,92-101.[8]Hahn,H.,Meyer-Neiberg,S.,Stefan,P.(2009)Electricloadforecastingmethods:Toolsfordecisionmaking.EuropeanJournalofOperationalResearch.199(3),902-907.[9]Amral,N.,Ozveren,CS.,King,D.(2007).Shorttermloadforecastingusingmultiplelinearregression.UniversitiespowerengineeringConference.1192-1198.[10]YoushuaBengio,IanGoodfellowandAaronCourville.(2016).DeepLearning.MITPress.[11]CorralBobadillaM.,FernandezMartinezR.,LostadoLorzaR.,SomovillaGomezF.,VergaraGonzalezE.P.(2017)CyclonePerformancePredictionUsingLinearRegressionTechniques.In:GrañaM.,López-GuedeJ.,EtxanizO.,HerreroÁ.,QuintiánH.,CorchadoE.(eds)InternationalJointConferenceSOCO’16-CISIS’16-ICEUTE’16.ICEUTE2016,SOCO2016,CISIS2016.vol527.[12]Tso,Geoffrey&K.W.Yau,Kelvin.(2007).Predictingelectricityenergyconsumption:Acomparisonofregressionanalysis,decisiontreeandneuralnetworks.Energy.32.1761-1768.[13]MuhammadA.RaziandKuriakoseAthappilly.2005.Acomparativepredictiveanalysisofneuralnetworks(NNs),nonlinearregressionandclassificationandregressiontree(CART)models.ExpertSyst.Appl.29(1),65-74.[14]R.C.Barros,M.P.BasgaluppandAlexA.Freitas.(2012).Asurveyofevolutionaryalgorithmsfordecisiontreeinduction.IEEETransactionsonSystems,Man,andCybernetics.42(3).291-312.

83

[15]GlennDe’athandKatharinaE.Fabricus.(2000).ClassificationandRegressionTrees:Apowerfulyetsimpletechniqueforecologicaldataanalysis.EcologicalSocietyofAmerica.81(11).3278-3192.[16]S.RasoulSafavianandDavidLandgrebe.(1991).ASurveyofDecisionTreeClassifierMethodology.IEEETransactionsonSystems,Man,andCybernetics.21(3).660-674.[17]BrendanKitts.RegressionTrees.[18]RobJHyndman,GeorgeAthanasopoulos.Forecasting:principlesandpractice.[19]FeinbergE.A.,GenethliouD.(2005)LoadForecasting.In:ChowJ.H.,WuF.F.,MomohJ.(eds)AppliedMathematicsforRestructuredElectricPowerSystems.PowerElectronicsandPowerSystems.269-285.[20]Ogutu,J.O.,Piepho,H.,&Schulz-Streeck,T.(2011).Acomparisonofrandomforests,boostingandsupportvectormachinesforgenomicselection.BMCproceedings.[21]Malina,R.M.,Ribeiro,B.,Aroso,J.,&Cumming,S.P.(2007).Characteristicsofyouthsoccerplayersaged13–15yearsclassifiedbyskilllevel.BritishJournalofSportsMedicine,41(5),290–295.[22]KannekensR,Elferink-GemserMT,VisscherC.(2011).Positioninganddeciding:keyfactorsfortalentdevelopmentinsoccer.JMedSciSports,21(6),846-52.[23]FigueiredoAJ,CoelhoeSilvaMJ,MalinaRM.(2011).Predictorsoffunctionalcapacityandskillinyouthsoccerplayers.JMedSciSports,21(3),446-54.[24]VestbergT,GustafsonR,MaurexL,IngvarM,PetrovicP.(2012).Executivefunctionspredictthesuccessoftopsoccerplayers.[25]Zuber,C.,Zibung,M.,&Conzelmann,A.(2016).HolisticPatternsasanInstrumentforPredictingthePerformanceofPromisingYoungSoccerPlayers–A3-YearsLongitudinalStudy.FrontiersinPsychology,7,1088.[26]websitelink(http://www.peaksports.com/sports-psychology-blog/8-tips-to-unlock-potential-in-sports/)[27]Abdullah,MohamadRazali;Musa,RabiuMuazu;Maliki,AhmadBisyriHusinMusawiB.;Kosni,NorlailaAzura;Suppiah,PathmanathanK.(2016).Roleofpsychologicalfactorsontheperformanceofelitesoccerplayers.JournalofPhysicalEducationandSport.16(1).[28]Lipton,C.Z,Kale,C.D ,ElkanC,Wetzel,R.(2015).LearningtodiagnosewithLSTMrecurrentneuralnetwork.eprintarXiv:1511.03677.[29]Xiong,R,Nicols,P.E,Shen,Y.(2015).DeeplearningStockVolatilitywithGoogleDomestictrends.eprintarXiv:1512.04916

84

[30] Shi, H., Xu, Minghao and Li, R. (2017). Deep learning for household loadForecasting.AnovelpoolingdeepRNN.IEEETransactionsonSmartGrid.PP(99).1-1.[31]Gensler,A.,Henze,J.,Sick,B.Raabe,N.(2016).Deeplearningforsolarpowerforecasting.Anapproach-usingautoencoderandLSTMNeuralNetworks.Systems,Man,andCybernetics(SMC).[32]SimonHartley.(2011).Peakperformanceeverytime.[33]T.Chai,R.R.Draxler.(2014).Rootmeansquareerrorormeanabsoluteerror?ArgumentsagainstavoidingRMSEintheliterature.GeoScientificModelDevelopment.1247-1250.[34]Sak,H.,Andrew,S.,Beaufays,F.(2014).LongShort-TermMemoryBasedRecurrentNeuralNetworkArchitecturesForLargeScaleAcousticModeling.FifteenthAnnualConferenceoftheInternationalSpeechCommunicationAssociation.[35]Jozefowicz,R.,Zaremba,W.,andSutskever,I.(2015).Anempiricalexplorationofrecurrentnetworkarchitectures.InProceedingsofthe32ndInternationalConferenceonInternationalConferenceonMachineLearning-Volume37(ICML'15),FrancisBachandDavidBlei(Eds.),Vol.37.JMLR.org2342-2350. [36]Martin,S.,M.,Schlüter,R.,andNey,H.(2012).LSTMNeuralNetworksForLanguageModeling.Interspeech. [37]K.Greff,R.K.Srivastava,J.Koutník,B.R.SteunebrinkandJ.Schmidhuber.(2017).LSTM:ASearchSpaceOdyssey.IEEETransactionsonNeuralNetworksandLearningSystems,28(10),pp.2222-2232.[38]Larry,D.M.,AstudyofthemeasurementofpotentialFootballabilityinhighschoolplayers.[39] Kannekens R,Elferink-Gemser MT,Visscher C.,Post .J .W(2009). Self-AssessedTacticalSkillsinEliteYouthSoccerPlayers:ALongitudinalStudy.109(2),459-472.[40]KevinWheeler.PredictingNBAplayerperformance.[41]Freifeld,L.(2012).AssessingHumanPotentialToPredictFutureSuccess.TrainingMag.https://trainingmag.com/content/assessing-human-potential-predict-future-success[42]Bergman,R.L.,Wangby,M.(2014).Theperson-orientedapproach:Ashorttheoreticalandpracticalguide.2(1),pp.29-49.[43]Olah,C.(2015).UnderstandingLSTMNetworks.http://colah.github.io/posts/2015-08-Understanding-LSTMs/

85

[44]Stein,M.R.(2007).Benchmarkingdefaultpredictionmodels:pitfallsandremediesinmodelvalidation.JournalofRiskModelValidation.1(1).Pp.77-113.[45]Cheng,H.,Garrick,D.J.,&Fernando,R.L.(2017).Efficientstrategiesforleave-one-outcrossvalidationforgenomicbestlinearunbiasedprediction.JournalofAnimalScienceandBiotechnology.8(38) [46]Bramley,J.W.(2006).Therelationshipbetweenstrength,powerandspeedmeasuresandplayingabilityinpremierlevelcompetitionrugbyforwards.MasterThesis.QueenslandUniversityofTechnology,Australia.[47]Chen,G.(2018).AgentletutorialofRecurrentNeuralNetworkwithErrorBackpropagation.[48]Pedregosaetal.(2011).Scikit-learn:MachineLearninginPython.JMLR12,pp.2825-2830[49]Abadi,M.,Barham,P.,Chen,J.,Chen,Z.,Davis,A.,Dean,J.,Devin,M.,Ghemawat,S.,Irving,.G.,Isard,M.,Kudlur,M.,Levenberg,J.,Monga,R.,Moore,S.,Murray,G.D.,Steiner,B.,Tucker,P.,Vasudevan,V.,Warden,P.,Wicke,M.,Yu,Y.,Zheng,X.(2016).TensorFlow:Asystemforlargescalemachinelearning.

Documents

Master Thesis Predicting Young Soccer Players Peak Potential …essay.utwente.nl/74786/1/Tahir_MS_EEMCS.pdf · Master Thesis Predicting Young Soccer Players Peak Potential with Optimal