
Forecasting Player Behavioral Data and Simulating in-Game Events

Anna Guitart, Pei Pei Chen, Paul Bertens and África Periáñez
Game Data Science Department

Silicon Studio, 1-21-3 Ebisu, Shibuya-ku, Tokyo, Japan

{anna.guitart, peipei.chen, paul.bertens, africa.perianez}@siliconstudio.co.jp

Abstract—Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help to ensure successful game development. In particular, game developers need to evaluate beforehand the impact of in-game events. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though the performance of traditional approaches such as ARIMA is still better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning emerges as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.

Index Terms—social games; time series; forecasting; sequential analysis; deep learning; ARIMA models; gradient boosting

I. INTRODUCTION

In the last few years, we have witnessed a genuine paradigm change in the development of video games [1], [2]. Nowadays, petabytes of player data are available, as every action, in-app purchase, guild conversation or in-game social interaction performed by the players is recorded. This provides data scientists and researchers with plenty of possibilities to construct sophisticated and reliable models to understand and predict player behavior and game dynamics. Game data are time-dependent observations, as players are constantly interacting with the game. Therefore, it is paramount to understand and model player actions taking into account this temporal dimension.

In-game events are drivers of player engagement. They influence player behavior due to their limited duration, strongly contributing to in-game monetization (for instance, an event that offers a unique reward could serve to trigger in-app purchases). These events break the monotony of the game, and thus it is essential to have a variety of types, such as battles, rewards, polls, etc. Anticipating the most adequate sequence of events and the right time to present them is a determining factor in improving monetization and player engagement.

Forecasting time series data is a challenge common to multiple domains, e.g. weather or stock market prediction.

Time series research has received significant contributions to improve the accuracy and time horizon of the forecasts, and an assortment of statistical learning and machine learning techniques have been developed or adapted to perform robust time series forecasting [3], [4].

Time series analysis focuses on studying the structure of the relationships between time-dependent features, aiming to find mathematical expressions to represent these relationships. Based on them, several outcomes can be forecast [5].

Stochastic simulation [6] optimization consists in finding a local optimum of a response function whose values are not known analytically but can be inferred. The analysis of what-if scenarios (or simply simulation optimization), in turn, is the process of finding, among all possibilities, the inputs that maximize an outcome [7].

The aim of this work is twofold: on the one hand, to accurately forecast time series of in-game sales and playtime; on the other, to simulate events in order to find the best combination of in-game events and the optimal time to publish them. To achieve these goals, we performed an experimental analysis utilizing several techniques, such as ARIMA (dynamic regression), gradient boosting, generalized additive models and deep neural networks [8]–[11].

Fig. 1. Screenshot of an in-game gacha event in Grand Sphere, developed by Silicon Studio. The left panel shows an item that players can purchase, which on opening reveals an in-game card (shown in the right panel), in this case having 3 stars. The card obtained is random, and cards with more stars are more valuable and also rarer. Different in-game events can modify the probability of getting cards with more stars or the types of cards that can be obtained.



Pioneering studies on game data science in the field of video games, such as [12]–[15], concentrated on churn prediction. Other related articles that analyze temporal data in the game domain [16]–[20] focused on unsupervised clustering, not on supervised time series forecasting. To the best of our knowledge, this is the first work in which forecasts of time series of game data and simulations of in-game events are performed.

II. MODEL DESCRIPTION

A. Autoregressive Integrated Moving Average (ARIMA)

ARIMA was first introduced by Box and Jenkins [21] in 1976, in a book that had a tremendous impact on the forecasting community. Since then, this method has been applied in a large variety of fields, and it remains a robust model, used in multiple operational forecast systems [22]. ARIMA characterizes time series focusing on three aspects: 1) the autoregressive (AR) terms model the past information of the time series; 2) the integrated (I) terms model the differencing needed for the time series to become stationary, e.g. the trend of the time series; 3) the moving average (MA) terms control the past noise of the original time series.

Specifically, the AR part represents the time series as a function of $p$ past observations,

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p}, \quad (1)$$

with $\phi_1, \dots, \phi_p$ the AR coefficients and $p$ the number of past observations needed to perform a forecast at the current time. The MA component, rather than focusing on past observations, uses a moving average of past error values in a regression model,

$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \quad (2)$$

where $q$ is the number of moving average lags, $\theta_1, \dots, \theta_q$ are the MA coefficients and $\varepsilon_t, \dots, \varepsilon_{t-q}$ the errors. Finally, a parameter $d$ is used to model the order of differencing, i.e. the I term of ARIMA:

$$y^*_t = y_t - y_{t-1} - \cdots - y_{t-d}. \quad (3)$$

Taking $d = 1$ is normally enough in most cases [5]. If $d = 0$, the model reduces to an ARMA$(p, q)$ model.

The ARIMA model can only analyze stationary time series; however, most stochastic processes exhibit non-stationary behavior. Only through the differencing operation, or by applying an additional prior transformation to the time series (e.g. a log or Box–Cox [23] transformation), can we convert the time series into a stationary one.

We can write the ARIMA model as

$$\phi(L)(1 - L)^d y_t = \theta(L)\varepsilon_t, \quad (4)$$

where $L$ is the lag operator, i.e. $L y_t = y_{t-1}$. Therefore, we can express the ARIMA model in terms of the $p$, $d$ and $q$ parameters as

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d y_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t. \quad (5)$$

Box and Jenkins also generalized the model to recognize the seasonality pattern of the time series by using $y_t = \mathrm{ARIMA}(p, d, q)(P, D, Q)_m$ [21]. The parameters $P$, $D$, $Q$ represent the number of seasonal autoregressive, differencing and moving average terms. The order of the seasonality (e.g. weekly, monthly, etc.) is set by $m$. In order to determine the ARIMA parameters, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are analyzed [21]. Once the parameters are fixed, the model is fitted by maximum likelihood and selected (among the different estimated models) based on the Akaike (AIC) [24] and Schwarz Bayesian (BIC) [25] information criteria.

Dynamic Regression: To include the serial dependency of external covariates, a multivariate extension of Equation (5) is proposed, the so-called dynamic regression (DR). The relationship between the output, its lags and the variables is linear. DR takes the form [26]

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d y_t = \sum_{r=1}^{k} \gamma_r \left(1 - \sum_{l=1}^{p} \phi_l L^l\right)(1 - L)^d x_{rt} + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t. \quad (6)$$

The first term represents the AR and I parts, the last corresponds to the MA component, and the middle term is where the $k$ external variables $x_t$ are included. The $\gamma_r$ are the corresponding coefficients of the covariates.
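To make the dynamic regression setup concrete, the sketch below fits a seasonal ARIMA with external covariates in Python. statsmodels' SARIMAX is used here purely as an illustrative stand-in (the analysis in this paper relied on the R forecast package), and the series and covariate names are hypothetical.

```python
# Minimal sketch: seasonal ARIMA with external regressors (dynamic regression).
# `sales` is a daily series; `X` / `X_future` hold the event covariates.
from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(
    sales,                       # endogenous daily series
    exog=X,                      # external covariates (events, holidays, ...)
    order=(2, 1, 1),             # (p, d, q), cf. Table I for AoI sales
    seasonal_order=(1, 1, 1, 7)  # (P, D, Q, m), weekly seasonality
)
fit = model.fit(disp=False)
forecast = fit.forecast(steps=30, exog=X_future)  # 30-day forecast
```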

B. Gradient Boosting Models

Gradient boosting machines (GBMs) [27] are ensemble-based machine learning methods capable of solving classification and regression problems [9]. A GBM consists of an ensemble of weak learners (i.e. learners that perform only slightly better than random classifiers), commonly decision trees. Each weak learner is sequentially added to the ensemble, thus continuously improving its accuracy [28]. The approach taken by GBMs stems from the idea that boosting (using multiple weak models to create a strong model) can be seen as an optimization algorithm on some suitable loss function [29]. If this loss function is differentiable, the gradient can be calculated and gradient descent can be used for optimization [9]. This makes it possible to recursively add weak learners that follow the gradient, minimizing the loss function and reducing the error at each step. The construction of an ensemble to fit a desired function $f(x)$ is illustrated below (a more in-depth analysis can be found in [30]). For each weak learner in the ensemble, we sequentially do the following. First, we compute the negative gradient

$$z_i = -\left.\frac{\partial}{\partial f(x_i)}\, L\bigl(y_i, f(x_i)\bigr)\right|_{f(x_i) = \hat{f}(x_i)}, \quad (7)$$

where $L$ is the loss function and $i$ indexes the training observations.


Fig. 2. Time series of the total daily sales (above) and total daily playtime (below) from Age of Ishtaria over a period of ∼2.5 years. In this and other figures below, the quantitative sales and playtime values are not shown in the vertical axes for privacy reasons.

Then, we fit a regression model $g(x)$ predicting $z_i$ from $x_i$ and apply gradient descent, with the step size given by

$$\rho = \arg\min_{\rho} \sum_{i=1}^{N} L\bigl(y_i, f(x_i) + \rho\, g(x_i)\bigr). \quad (8)$$

Finally, we update the estimate of $f(x)$ through

$$f(x) \leftarrow f(x) + \rho\, g(x). \quad (9)$$

GBMs can effectively capture complex non-linear dependencies [31], and several strategies to avoid overfitting can be applied, e.g. shrinkage [9] or early stopping [32].
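As an illustration of Equations (7)–(9), the following is a minimal, self-contained sketch of the boosting loop for squared loss, where the negative gradient reduces to the residuals. Note that the exact line search of Equation (8) is replaced here by a fixed shrinkage step, a common simplification, and the tree depth and number of rounds are arbitrary choices for the example.

```python
# Sketch of the generic gradient boosting loop, Eqs. (7)-(9), for
# squared loss L(y, f) = (y - f)^2 / 2, whose negative gradient is y - f.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_rounds=100, shrinkage=0.1):
    init = y.mean()                       # initial constant model
    f = np.full(len(y), init)
    ensemble = []
    for _ in range(n_rounds):
        z = y - f                         # Eq. (7): negative gradient
        g = DecisionTreeRegressor(max_depth=3).fit(X, z)  # fit g(x) to z
        f = f + shrinkage * g.predict(X)  # Eqs. (8)-(9): shrunk update step
        ensemble.append(g)
    return init, ensemble

def gbm_predict(init, ensemble, X, shrinkage=0.1):
    f = np.full(len(X), init)
    for g in ensemble:
        f += shrinkage * g.predict(X)
    return f
```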

C. Generalized Additive and Generalized Additive Mixed Models

The generalized additive models (GAMs) derived by [33] are a combination of generalized linear models (GLMs) and additive models (AMs). In this way, GAMs exhibit the properties of both, namely the flexibility to adapt to any distribution of GLMs and the non-parametric nature of AMs.

The structure of a GAM is

$$g(E(y)) = \beta_0 + s_1(x_1) + \cdots + s_n(x_n), \quad (10)$$

where $n$ is the number of predictors, $E(y)$ is the expected value, $g(\cdot)$ is a link function, and $s_i(\cdot)$ are smooth functions.

Even though the distribution of the response variable is the same as for GLMs, there is a generalization that allows GAMs to accommodate different kinds of responses, for example binary or continuous. GAMs assume that the means of the predictors $x_i$ are related to an additive response $y$ through a nonlinear link function $g$ (such as the identity or logarithm function). The model distribution can be selected from e.g. a Gaussian or Poisson distribution [10].

As additive models, in contrast to parametric regression analysis (which assumes a linear relation between responses and predictors), GAMs serve to explore non-parametric relationships, as they make no assumptions about those relations. Instead of using the sum of the individual effects of the predictors as observations, GAMs employ the sum of their smooth functions, which are called splines [34] and include a parameter that controls the smoothness of the curve to prevent overfitting. With the link and smooth functions mentioned above, the GAM approach has the flexibility to interpret and regularize both linear and nonlinear effects [35].

However, GAMs do not assume correlations between observations (such as the temporal correlation of the time series in this study). As a consequence, when these are present, another approach would be better suited to perform the forecast. Generalized additive mixed models (GAMMs) [36], an additive extension of generalized linear mixed models (GLMMs) [37], constitute such an approach. GAMMs incorporate random effects $u$ and covariate effects $\gamma_j$ (which model the correlations between observations by taking the order of observations into account) into the previous GAM formulation. When estimating the $j$th observation, the structure of a GAMM is

$$g(E(y_j)) = \beta_0 + s_i(x_{ji}) + \gamma_j^\top u. \quad (11)$$

Smoothness selection can be automatized [38], and the estimation of the GAMM is conducted via a penalized likelihood approach.
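For illustration, a GAM of the kind in Equation (10) can be sketched in Python with the pygam package (an assumption made for this example; the paper's models were fitted with R's mgcv), using a cyclic P-spline for weekly seasonality. The column layout of X is hypothetical.

```python
# Sketch: additive model with smooth terms, in the spirit of Eq. (10).
# Column 0: day of week (cyclic), column 1: gacha scale, column 2: temperature.
from pygam import LinearGAM, s

gam = LinearGAM(
    s(0, basis='cp', n_splines=7)  # cyclic P-spline for weekly seasonality
    + s(1, n_splines=4)            # spline over the four-level event scale
    + s(2)                         # smooth effect of temperature
)
gam.fit(X, y)                      # X: covariate matrix, y: daily sales
prediction = gam.predict(X_future)
```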

D. Deep Belief Networks

Deep neural networks (DNNs) [39] have been used with great success in recent years, achieving cutting-edge results in a wide range of fields [40].


This method has outperformed alternative models on almost all tasks, from image classification [41] to speech recognition [42] and natural language modeling [40].

Deep belief networks (DBNs) [43] are an extension of DNNs where the units in the hidden layers are binary and stochastic. They are capable of learning a joint probability distribution between the input and the hidden layers and can be used as a generative model to draw samples from the learned distribution. A DBN uses a stack of restricted Boltzmann machines (RBMs) [44], where each RBM is trained on the result of the previous layer as follows:

$$P\bigl(h_i^{(k)} = 1 \mid h^{(k+1)}\bigr) = \sigma\Bigl(b_i^{(k)} + \sum_j W_{ij}^{(k)} h_j^{(k+1)}\Bigr), \quad (12)$$

with $W^{(k)}$ being the weight matrix between layers $k$ and $k+1$, and $\sigma$ the logistic sigmoid.
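Equation (12) amounts to a logistic sigmoid applied to an affine transform of the layer above; the short numpy sketch below (with illustrative layer sizes) makes this explicit.

```python
# Sketch of Eq. (12): activating hidden layer k conditioned on layer k+1.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(64, 32))  # weights between layers k and k+1
b = np.zeros(64)                           # biases of layer k
h_above = rng.integers(0, 2, size=32)      # binary units of layer k+1

p = sigmoid(b + W @ h_above)               # P(h_i^(k) = 1 | h^(k+1))
h = (rng.random(64) < p).astype(int)       # stochastic binary sample
```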

For supervised learning, a DBN can be used as a way to pre-train a DNN [45]. Each RBM layer is first trained separately, and then the weights in all layers are fine-tuned through standard back-propagation. A more comprehensive overview of DBNs and DNNs can be found in [43] and [39].

There have been previous works on applying DNNs to time series forecasting problems (see e.g. [46] and [11]), and comparisons between ARIMA and artificial neural networks (ANNs) have also been performed [47]. However, when dealing with small datasets, standard DNNs can quickly overfit due to having too many parameters. Careful regularization is required to ensure that the model can generalize to unseen data. Recent techniques like drop-out (which randomly masks some of the hidden units) [48] or stochastic DBNs can significantly help with this problem, as they constrain the number of parameters that can be modified. Additionally, using L2 regularization on the weights [49] and ensuring proper selection of the number of hidden units can make DBNs an effective model even when faced with smaller datasets.

III. FORECAST PERFORMANCE METRICS

In order to validate the accuracy of forecasting models, several performance measures have been proposed in the literature [50]. The use of various metrics to analyze time series is a common practice. The ones selected in this study possess different properties, which is important to correctly assess the forecasting capabilities of the models from different perspectives (e.g. in terms of the magnitude or direction of the error). Each of the metrics summarized here is a function of the actual time series and the forecast results. In all the equations below, $n$ refers to the number of observations, $f_i$ denotes the forecast value and $a_i$ represents the actual value.

A. Root Mean Squared Logarithmic Error (RMSLE)

The RMSLE can be defined as

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl(\log(f_i + 1) - \log(a_i + 1)\bigr)^2}. \quad (13)$$

Fig. 3. Grand Sphere time series of total daily sales (above) and total daily playtime (below) over a period of ∼9 months. Quantitative values are not shown in the vertical axes for privacy reasons.

Because the RMSLE uses a logarithmic scale, it is less sensitive to outliers than the standard root mean square error (RMSE). Additionally, while the RMSE penalizes underestimation and overestimation equally, the RMSLE penalizes underestimated predictions more heavily.

B. Mean Absolute Scaled Error (MASE)

The MASE is a scale-independent metric, i.e. it can be used to compare the relative accuracy of several forecasting models applied to different datasets. To calculate this metric, errors are scaled by the in-sample mean absolute error (MAE) of the naive one-step forecast. The expression for the MASE is

$$\mathrm{MASE} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{|f_i - a_i|}{\frac{1}{n-1} \sum_{j=2}^{n} |a_j - a_{j-1}|} \right). \quad (14)$$

C. Mean Absolute Percentage Error (MAPE)

The MAPE is estimated by

$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{f_i - a_i}{a_i} \right|. \quad (15)$$

Since it is a percentage measure of the average absolute error, the MAPE is also scale-independent. However, if the time series has zero values, the MAPE yields undefined results because of a division by zero [50]. The MAPE is also biased towards underestimated values and does not penalize large errors.
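The three metrics translate directly into a few lines of numpy; this sketch follows Equations (13)–(15), with the MASE denominator computed from an in-sample training series (a_train, an assumption of this example).

```python
# Sketch of the forecast-accuracy metrics, Eqs. (13)-(15).
import numpy as np

def rmsle(f, a):
    return np.sqrt(np.mean((np.log1p(f) - np.log1p(a)) ** 2))

def mase(f, a, a_train):
    scale = np.mean(np.abs(np.diff(a_train)))  # in-sample naive-forecast MAE
    return np.mean(np.abs(f - a) / scale)

def mape(f, a):
    return 100.0 * np.mean(np.abs((f - a) / a))  # undefined if any a == 0
```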

IV. DATASET

A. Data Source

The study presented in this article focuses on the analysis of two different daily time series, those of playtime and sales.


TABLE I
TUNED PARAMETERS FOR EACH MODEL

                           | GBM            | ARIMA               | DBN
Dataset                    | max depth  eta | p  d  q  P  D  Q  m | h  n    plr     tlr   k  b
---------------------------+----------------+---------------------+---------------------------
Sales, Age of Ishtaria     | 100       0.20 | 2  1  1  1  1  1  7 | 2  50   0.0001  0.01  5  50
Sales, Grand Sphere        | 1         0.76 | 2  1  1  1  0  1  7 | 2  300  0.001   0.1   2  10
Playtime, Age of Ishtaria  | 1         0.66 | 2  1  2  1  1  1  7 | 2  50   0.0001  0.01  2  50
Playtime, Grand Sphere     | 1000      0.23 | 1  1  1  1  1  1  7 | 2  300  0.0001  0.1   2  50

The data were collected from two Japanese game titles developed by Silicon Studio: Age of Ishtaria (hereafter, AoI) and Grand Sphere (hereafter, GS). Playtime corresponds to the total amount of time spent in the game by all users, while sales represents the total amount of in-app purchases. For AoI, daily information was extracted from October 2014 until February 2017 (∼2.5 years), and for GS, from July 2015 until March 2016 (∼9 months). Additionally, data from all the game events, marketing and promotion campaigns within the collection period were also gathered, to be used as external variables for the models. Figures 2 and 3 show the two daily time series for AoI and GS, respectively, which present clear differences concerning trends and seasonal behavior.

B. External Features

Proper identification and subsequent removal of outliers caused by unexplained phenomena can significantly improve the modeling of the time series [51]. To that end, anomaly detection using a deep autoencoder [52] was performed. This technique is capable of finding subtler anomalous behaviors than traditional methods, and it shows that the outliers coincide with the external events of the game that take place on the same particular day. These events are derived from in-game information and included in the model as external variables. They can be:

• Game Events: Events such as raid battles, boss battles, etc. Each type of event is input separately.
• Gacha: A monetization technique used in many successful Japanese free-to-play games. It describes an event where players can spend money to randomly pull an item from a large pool (inspired by capsule toy vending machines).
• Promotions: Release of new cards and discounts to engage current users.
• Marketing Campaigns: Acquisition campaigns launched to obtain new users.

In the case of gacha and promotions, the corresponding event scale was also taken into account. This scale is used to quantify the outlier effect. It represents the influence or importance of the event in question and is related to the amount of money invested in it. Specifically, the event is assigned a value of 1, 2, 3 or 4 to denote low, medium, high or super-high influence.

Other non-game-related features included in the model are:

• National Holidays: National holidays in Japan [53].
• Temperature: Average daily temperature in Tokyo [54].

Moreover, the day of the week and the month of the year were added as extra inputs for the GAMM and GBM, to make it possible for these models to learn the seasonality effects. For the ARIMA model these data do not need to be included, as they are already inherently considered by the seasonal parameters.

V. METHOD

A. Data Preparation

To make the original time series stationary, a Box–Cox transformation [55] is applied to the sales data, and a logarithm transformation to the playtime data, for the DR, GBM and DBN models. The GAMM is the only technique that does not require a prior transformation of the data.
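A minimal sketch of these transforms, assuming scipy and illustrative numpy arrays of daily totals:

```python
# Sketch of the stationarizing transforms described above.
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

sales_bc, lam = stats.boxcox(sales)  # Box-Cox; needs strictly positive data
playtime_log = np.log(playtime)      # log transform for the playtime series

# Forecasts are produced on the transformed scale and mapped back:
sales_back = inv_boxcox(sales_bc, lam)
playtime_back = np.exp(playtime_log)
```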

The categorical values of the external features (game events, gacha, promotions, marketing campaigns, national holidays, etc.) are included in the model as step functions (dummy variables). For each day in the training and forecasting data, the covariates are encoded with a vector that is either 0 or 1, depending on the absence or presence of the event on that date. However, for events with an event scale, the vector value matches the corresponding scale value instead.

Since promotions and marketing campaigns can have some delayed effects on the series, input is added to the days after the campaign release by means of a decay function. For marketing campaigns, one-week effects are considered, and their values are assumed to decrease linearly with time; for promotions, on the other hand, the decrease depends on the scale of the campaign.
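As an example, the one-week marketing-campaign effect could be encoded as follows; the helper name and encoding details are hypothetical, chosen only to illustrate the linear decay.

```python
# Hypothetical sketch of the linearly decaying covariate used for
# marketing campaigns: a one-week effect fading from 1 to 0.
import numpy as np

def campaign_covariate(n_days, release_day, effect_days=7):
    """Covariate vector: 1.0 on the release day, decreasing linearly
    to 0 over the following effect_days days."""
    x = np.zeros(n_days)
    for offset in range(effect_days):
        day = release_day + offset
        if day < n_days:
            x[day] = 1.0 - offset / effect_days
    return x

print(campaign_covariate(30, 10))  # campaign released on day 10 of 30
```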

B. Model Specification

1) DR: The ARIMA parameters are calculated through the ACF and PACF functions. The parameter tuning of the dynamic regression model is performed using cross-validation, based on the MAPE metric presented in Section III. Table I shows the parameters used to fit the models.

2) GBM: The implementation used for the GBM model is XGBoost [56], an efficient and scalable tree boosting model. Table I contains the optimal parameters found for XGBoost using cross-validation and grid search. The parameter max depth refers to the maximum depth of a tree, and eta is the shrinkage step size used to prevent overfitting. In the case of GBMs, tuning the model is computationally expensive and time-consuming, as there are many parameters.


Fig. 4. Actual and forecast sales time series for AoI, for the period from Jan 10, 2017 until Feb 8, 2017. The different panels correspond to the forecast with DR (top left), GAMM (top right), gradient boosting (bottom left) and DBN (bottom right).

However, the parameter search can be automated and directly reused for equivalent time series data from other game titles, which makes the approach more flexible.
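An illustrative fit with the tuned parameters of Table I (AoI sales) might look as follows; X_train, y_train and X_next are hypothetical covariate matrices and the transformed target.

```python
# Sketch: XGBoost regressor with the Table I parameters for AoI sales.
import xgboost as xgb

model = xgb.XGBRegressor(
    max_depth=100,       # max depth of a tree (Table I)
    learning_rate=0.20,  # eta, the shrinkage step size
    n_estimators=200,    # illustrative number of boosting rounds
)
model.fit(X_train, y_train)
forecast = model.predict(X_next)  # covariates for the next 30 days
```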

3) GAMM: As we have continuous data, we consider the identity link function with a Gaussian distribution. For the GAMM, weekly and monthly seasonalities are introduced as cyclic P-splines [57]. Gacha is added by applying a P-spline with four knots, corresponding to the four values of the event scale. For the temperature variable, we employ a cyclic cubic regression spline [58] that estimates a periodic smooth regression function for seasonal patterns. For the other variables (holidays, events, day of the week and month), the default spline, corresponding to low-rank isotropic smoothers, is used [59].

4) DBN: The parameters obtained by grid search are shown in Table I (h: number of hidden layers; n: number of nodes per layer; plr: pre-train learning rate; tlr: fine-tuning learning rate; k: number of Gibbs sampling steps in the RBM; b: mini-batch size). Before training the model, the training data are shuffled, and 80% of them are randomly assigned to the training set and the other 20% to the validation set. The model is trained on the training set and evaluated on the validation set after every epoch. To avoid overfitting, early stopping [60] is applied: the fine-tuning iteration is terminated when the loss on the validation set stops decreasing for 20 consecutive epochs.
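The early-stopping rule can be sketched as a simple patience loop; train_one_epoch and validation_loss are hypothetical stand-ins for the Theano training routines.

```python
# Sketch: stop fine-tuning once the validation loss has not improved
# for 20 consecutive epochs.
max_epochs = 500                      # illustrative cap
best_loss, patience, waited = float("inf"), 20, 0
for epoch in range(max_epochs):
    train_one_epoch(model, train_set)
    loss = validation_loss(model, valid_set)
    if loss < best_loss:
        best_loss, waited = loss, 0   # improvement: reset the counter
    else:
        waited += 1
        if waited >= patience:
            break                     # validation loss stopped decreasing
```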

TABLE II
ERROR RESULTS FOR TIME SERIES FORECASTING

Dataset                    Model  RMSLE  MASE   MAPE
Sales, Age of Ishtaria     DR     0.106  0.669   8.9
                           GBM    0.118  0.727  10.2
                           GAMM   0.106  0.672   9.1
                           DBN    0.122  0.770  11.1
Sales, Grand Sphere        DR     0.119  0.741   9.8
                           GBM    0.164  0.814  14.8
                           GAMM   0.142  0.874  11.9
                           DBN    0.128  0.762  11.4
Playtime, Age of Ishtaria  DR     0.243  0.921  18.7
                           GBM    0.237  1.057  19.5
                           GAMM   0.246  0.931  22.7
                           DBN    0.357  1.057  31.5
Playtime, Grand Sphere     DR     0.362  1.034  35.6
                           GBM    0.559  1.000  73.5
                           GAMM   0.495  1.011  56.2
                           DBN    0.265  0.991  24.6

C. Model Validation

To perform the predictions, a period of thirty days is chosen in order to obtain monthly forecasts of sales and playtime. For the test evaluation, we use a rolling forecasting technique [61], taking steps of 7 days for each new forecast and with a minimum of 6 months of training data. For AoI the forecasts were performed weekly from Nov 2, 2015 until Jan 10, 2017. In the case of GS, they were carried out from Oct 5, 2015 until Feb 1, 2016. The average errors are then computed over the rolling prediction results, which serves to compare the RMSLE, MASE and MAPE values.
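The rolling scheme can be sketched as follows; y is the full daily series, fit_and_forecast is a hypothetical wrapper around any of the four models, and mape is the helper sketched in Section III.

```python
# Sketch of the rolling evaluation: 30-day forecasts, re-fit every 7 days,
# with at least ~6 months (180 days) of training data.
import numpy as np

horizon, step, min_train = 30, 7, 180
errors = []
origin = min_train
while origin + horizon <= len(y):
    train = y[:origin]
    actual = y[origin:origin + horizon]
    forecast = fit_and_forecast(train, horizon)
    errors.append(mape(forecast, actual))
    origin += step
print(np.mean(errors))  # average error over all rolling forecasts
```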

VI. RESULTS

Figure 4 shows an example of the predictions within a given period for each of the models. It illustrates that, while the performance of both DR and the GAMM is relatively similar, the GBM model has much more difficulty capturing the valleys of the series, while the DBN overestimates the peaks. The prediction errors for sales and playtime displayed in Table II also reflect this, showing that both of these models perform worse than the GAMM and DR in the case of AoI.

DR does provide better results for playtime forecasting than the GAMM; still, to achieve a proper parameter selection, we need to check the autocorrelation and other measures to have more control over the fitting. This requirement turns DR into a rigid model, which cannot adapt easily to time series data from other games, while the GAMM is more flexible and easier to tune. Once the smooth functions in the GAMM are selected, they automatically adjust to fit the distribution of the external variables. This way, the model can also learn to fit time series data from other games of the same nature.

Table II also shows the forecasting error results for GS. We can see that the pattern is similar to that for AoI. The performance of the GAMM and DR is approximately the same, and both methods outperform GBM on the sales forecasting. In this case, the DBN does perform significantly better than the GBM model, and also slightly better than the GAMM. Overall, however, taking into account all error measures for both games and both dimensions (i.e. playtime and sales), the GAMM yields the most consistent results.

A. Forecasting Horizon

The forecasting horizon was evaluated for all the models, since the performance can significantly decrease as the number of days forecast increases [62]. Figure 5 depicts the resulting RMSLE, MASE and MAPE as a function of time, thus illustrating the predictability of the different models. The GAMM performs better than all the other models, staying much flatter for all error measures, which indicates that the forecast accuracy does not decrease much even when forecasting two or three months into the future. The GBM model also shows a steadier behavior, with a reasonably stable forecasting accuracy; however, this method has much higher initial and overall errors than the GAMM. The DBN has more difficulty keeping the predictions stable, showing divergent behavior as the number of days forecast increases. Finally, the DR results diverge rapidly as the forecast period becomes larger. For a short prediction range of just a few days, this technique performs better than the GAMM, but when the forecast horizon increases, the errors also become significantly larger.

B. Minimal Training Set

We performed an error analysis of the model performance as a function of the training set size, in order to evaluate the minimal training time required to obtain robust predictions.

Figure 5 shows a significant drop in the prediction errors after 12 months of training time for the GAMM, DBN and GBM. For the GAMM, the error consistently decreases with training time, while the DR performance is more unstable. The latter behavior can also be seen for the DBN and could be explained by the variable nature of the time series, which causes instability during training. In general, a 12-month training set should be sufficient to obtain the most accurate forecasts, but even after just 6 months of training data the errors are already relatively low, especially for DR and the GAMM.

VII. EVENT SIMULATION

Simulation optimization (i.e. the analysis of what-if scenarios) is used to find the optimum input value that maximizes the response [6], [7]. Using the time series forecasting models proposed in this work, a simulation was carried out to analyze the effect of future events on the total playtime and sales. The order of the upcoming events can be changed to evaluate how a different event planning impacts the forecasts.
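Conceptually, the simulation reduces to scoring alternative event calendars with a trained model, as in the hypothetical sketch below; model, build_covariates and the example sequences are illustrative stand-ins.

```python
# Sketch: score alternative 30-day event calendars with a trained model.
# Each sequence is a list of (day, event_type, scale) tuples.
candidate_sequences = {
    "original":   [(3, "gacha", 4), (12, "raid", 1), (20, "promotion", 2)],
    "sequence_1": [(1, "promotion", 2), (8, "gacha", 4), (22, "raid", 1)],
    "sequence_2": [(5, "raid", 1), (15, "promotion", 2), (25, "gacha", 4)],
}
for name, seq in candidate_sequences.items():
    X = build_covariates(seq, horizon=30)  # event dummies as in Sec. V-A
    total_sales = model.predict(X).sum()   # predicted 30-day total
    print(name, total_sales)
```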

A. Simulation Results and Analysis

Figure 7 shows an example of a simulation, with different event sequences being input into the models. In the case of DR, the sales for Sequences 1 and 2 were 37% higher and 25% lower, respectively, than for the predefined original event sequence (the sequence that had been planned). Thus, using a different sequence of events, the total sales could have been increased by 37%. For the GAMM, Sequence 1 results in an amount of sales lower than that for DR (by 32%). Although all models are suitable to perform simulations, their forecasts present different levels of accuracy. As shown in the results section, the GAMM provides the most accurate estimate of the predicted sales, and therefore it can also be expected to produce the most precise simulation results.

In DR models, the response has a linear relationship with the features, which has the drawback of not capturing the non-linear behavior of real phenomena. However, this also makes simulations easier, as the interpretation of the parameters is straightforward compared to other strongly non-linear models, like DBNs.

B. Simulation Engine Tool

A common business problem faced by many game studios is how to plan future acquisition and in-game events so as to maximize sales and playtime [63]. While it is possible to manually investigate the success of past events, there are too many potentially correlated variables to be considered. The proposed forecasting models can do this automatically, and learn all the potential impacts of external variables on the future sales and playtime, providing a better estimate of the effect of future event sequences.

In order to provide a solution to the event-planning problem from a business perspective, a web-based system for time series simulation was developed. With this system, users can easily plan in-game events by means of an event planner user interface.


Fig. 5. MASE and MAPE of the forecasting horizon (left panels) and the minimal training period (right panels) for DR, GBM, GAMM and DBN. Results from AoI.

Fig. 6. Structure of the time series simulation system. The arrows show how the web page, server, database, file system and modeling script interact with each other. From the user interface, one can specify future game or marketing events for the desired dates. Then, the server will take the previously trained model, perform daily predictions of the sales and playtime for the input events and return the simulation results to the user.

After inputting the planned events, the daily sales and playtime for the next 30 days will be simulated and shown within a few seconds.

The system structure is shown in Figure 6. A database stores the forecasting models, the model parameters, and the external variables mentioned in Section IV-B (such as temperature and holidays). The parameters are tuned once for each game, and the models are then trained once per month with those parameters. When the server receives a simulation request, it connects the database, the file system and the modeling script, so that the modeling script can obtain the data required for the simulation. After the simulation is finished, the server reads the output file and sends the results to the front-end for display.
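A minimal version of the request path could be sketched as a single endpoint; Flask is an illustrative choice here, and load_model / build_covariates are hypothetical stand-ins rather than the system's actual components.

```python
# Hypothetical sketch of the simulation endpoint: the request carries a
# planned event calendar; the server runs the stored model and returns
# the 30-day forecasts.
from flask import Flask, request, jsonify

app = Flask(__name__)
model = load_model("sales_model.pkl")  # re-trained monthly (see above)

@app.route("/simulate", methods=["POST"])
def simulate():
    events = request.get_json()["events"]     # [(day, type, scale), ...]
    X = build_covariates(events, horizon=30)  # encode as in Sec. V-A
    return jsonify({"sales": model.predict(X).tolist()})
```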

VIII. SUMMARY AND CONCLUSION

Overall, we found that the GAMM and ARIMA models are the most accurate for daily forecasting of sales and playtime. This result held for both of the evaluated games, Age of Ishtaria and Grand Sphere. However, the GAMM has the advantage of requiring less manual tuning, which makes it more practical in a production environment. The gradient boosting model is less suitable for forecasting with these particular time series, as it showed difficulty capturing the peaks and valleys of the data. Similarly, the DBN overestimated the peaks, which could be explained by the fact that the model has many parameters, while the dataset used was relatively small (fewer than 1000 observations). The GAMM and ARIMA models, on the other hand, have fewer parameters, avoiding the overfitting issue altogether.

Alternatively, when dealing with much larger datasets that present long-range dependencies, a long short-term memory (LSTM) [64] model could be used to properly capture and learn these dependencies and provide accurate future predictions. However, for small datasets with very short-range dependencies, as is the case for daily sales and playtime data, the input can still be fit by a standard DNN using a sliding window over the past days. This approach, though, still suffers from the overfitting problem due to the small amount of data.

Nevertheless, the DBN and DNN models still show room for improvement in time series forecasting, even when dealing with small datasets. ARIMA is a common forecasting model applied in a wide range of fields [65] and has the advantage of inherently containing parameters for time series forecasting. Potentially, we could incorporate such parameters into DNNs, and inspiration could be drawn from the ARIMA model to construct a similar model suitable for non-linear forecasts. However, this approach is beyond the scope of this paper, as the aim here was to have an interpretable, well-established model capable of performing accurate predictions and simulations that can be applied to time series forecasting in games.

The GAMM allows for a generalizable model that can correctly capture the time series dynamics. It can be used not only to forecast both future playtime and sales, but also to simulate future game and marketing events.


Fig. 7. Simulation results for the sales time series of AoI, forecasting 30 days from Jan 10, 2017. The forecast resulting from the predefined sequence of events (consisting of the planned events for that period of time) is compared with two other event sequences, named Sequence 1 and Sequence 2. The different panels correspond to the simulation with DR (top left), GAMM (top right), GBM (bottom left) and DBN (bottom right).

Instead of randomly deciding the event planning, we can employ a model-based approach that uses past information to automatically learn the interactions that are relevant for predicting event success (e.g. the weather, the day of the week and the national holidays). We provided a solution that can be used operationally in a business setting to get real-time simulation results. Allowing game studios to accurately simulate future events can help them optimize their planning of acquisition campaigns and in-game events, ultimately leading to an increase in the amount of user playing time and to an overall rise in in-game monetization.

IX. SOFTWARE

All the analyses except the DBN were performed with R version 3.3.2 for Linux, using the following packages from CRAN: forecast 1.0 [66], mgcv 1.0 [67] and xgboost 2.38 [56]. The DBN was implemented in Python 2.7.12, using Theano [68].

ACKNOWLEDGEMENTS

We thank Sovannrith Lay for helping to gather the data and Javier Grande for his careful review of the manuscript.

REFERENCES

[1] M. S. El-Nasr, A. Drachen, and A. Canossa, Game Analytics. New York: Springer, 2013.
[2] G. N. Yannakakis and J. Togelius, Artificial Intelligence and Games. Springer, 2017, http://gameaibook.org.
[3] J. G. De Gooijer and R. J. Hyndman, "25 years of time series forecasting," International Journal of Forecasting, vol. 22, no. 3, pp. 443–473, 2006.
[4] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting. Springer, 2016.
[5] R. Adhikari and R. Agrawal, "An introductory study on time series modeling and forecasting," arXiv preprint arXiv:1302.6613, 2013.
[6] S. Asmussen and P. W. Glynn, Stochastic Simulation: Algorithms and Analysis. Springer Science & Business Media, 2007, vol. 57.
[7] Y. Carson and A. Maria, "Simulation optimization: methods and applications," in Proceedings of the 29th Conference on Winter Simulation. IEEE Computer Society, 1997, pp. 118–126.
[8] G. E. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, revised ed. Holden-Day, 1976.
[9] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, pp. 1189–1232, 2001.
[10] T. J. Hastie and R. J. Tibshirani, Generalized Additive Models. CRC Press, 1990, vol. 43.
[11] E. Busseti, I. Osband, and S. Wong, "Deep learning for time series modeling," technical report, Stanford University, 2012.
[12] C. Bauckhage, K. Kersting, R. Sifa, C. Thurau, A. Drachen, and A. Canossa, "How players lose interest in playing a game: An empirical study based on distributions of total playing times," in Computational Intelligence and Games (CIG), 2012 IEEE Conference on. IEEE, 2012, pp. 139–146.
[13] F. Hadiji, R. Sifa, A. Drachen, C. Thurau, K. Kersting, and C. Bauckhage, "Predicting player churn in the wild," in Computational Intelligence and Games (CIG), 2014 IEEE Conference on. IEEE, 2014, pp. 1–8.
[14] A. Perianez, A. Saas, A. Guitart, and C. Magne, "Churn prediction in mobile social games: Towards a complete assessment using survival ensembles," in Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on. IEEE, 2016, pp. 564–573.
[15] P. Bertens, A. Guitart, and A. Perianez, "Games and big data: A scalable multi-dimensional churn prediction model," submitted to IEEE CIG, 2017.
[16] C. Bauckhage, A. Drachen, and R. Sifa, "Clustering game behavior data," IEEE Transactions on Computational Intelligence and AI in Games, vol. 7, no. 3, pp. 266–278, 2015.


[17] A. Drachen, R. Sifa, C. Bauckhage, and C. Thurau, "Guns, swords and data: Clustering of player behavior in computer games in the wild," in Computational Intelligence and Games (CIG), 2012 IEEE Conference on. IEEE, 2012, pp. 163–170.
[18] A. Drachen, C. Thurau, R. Sifa, and C. Bauckhage, "A comparison of methods for player clustering via behavioral telemetry," arXiv preprint arXiv:1407.3950, 2014.
[19] R. Sifa, C. Bauckhage, and A. Drachen, "The playtime principle: Large-scale cross-games interest modeling," in Computational Intelligence and Games (CIG), 2014 IEEE Conference on. IEEE, 2014, pp. 1–8.
[20] A. Saas, A. Guitart, and A. Perianez, "Discovering playing patterns: Time series clustering of free-to-play game data," in Computational Intelligence and Games (CIG), 2016 IEEE Conference on. IEEE, 2016, pp. 1–8.
[21] G. E. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, revised ed. Holden-Day, 1976.
[22] K. D. Lawrence and M. D. Geurts, Advances in Business and Management Forecasting. Emerald Group Publishing, 2006, vol. 4.
[23] G. E. Box and D. R. Cox, "An analysis of transformations," Journal of the Royal Statistical Society, Series B (Methodological), pp. 211–252, 1964.
[24] H. Akaike, "A new look at the statistical model identification," IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.
[25] G. Schwarz, "Estimating the dimension of a model," The Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
[26] J. G. Cragg, "Estimation and testing in time-series regression models with heteroscedastic disturbances," Journal of Econometrics, vol. 20, no. 1, pp. 135–157, 1982.
[27] T. G. Dietterich, "Ensemble methods in machine learning," in International Workshop on Multiple Classifier Systems. Springer, 2000, pp. 1–15.
[28] L. Mason, J. Baxter, P. L. Bartlett, and M. R. Frean, "Boosting algorithms as gradient descent," in NIPS, 1999, pp. 512–518.
[29] L. Breiman, "Arcing the edge," Technical Report 486, Statistics Department, University of California at Berkeley, 1997.
[30] G. Ridgeway, "Generalized boosted models: A guide to the gbm package," Update, vol. 1, no. 1, 2007.
[31] A. Natekin and A. Knoll, "Gradient boosting machines, a tutorial," Frontiers in Neurorobotics, vol. 7, p. 21, 2013.
[32] T. Zhang and B. Yu, "Boosting with early stopping: Convergence and consistency," The Annals of Statistics, vol. 33, no. 4, pp. 1538–1579, 2005.
[33] T. Hastie and R. Tibshirani, "Generalized additive models: some applications," Journal of the American Statistical Association, vol. 82, no. 398, pp. 371–386, 1987.
[34] J. Maindonald, "Smoothing terms in GAM models," 2010.
[35] K. Larsen, "GAM: The predictive modeling silver bullet," MultiThreaded, Stitch Fix, vol. 30, 2015.
[36] C. Chen, "Generalized additive mixed models," Communications in Statistics - Theory and Methods, vol. 29, no. 5-6, pp. 1257–1271, 2000.
[37] N. E. Breslow and D. G. Clayton, "Approximate inference in generalized linear mixed models," Journal of the American Statistical Association, vol. 88, no. 421, pp. 9–25, 1993.

[38] S. N. Wood, "Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 1, pp. 3–36, 2011.
[39] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[40] L. Deng and D. Yu, "Deep learning: methods and applications," Foundations and Trends in Signal Processing, vol. 7, no. 3–4, pp. 197–387, 2014.
[41] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[42] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645–6649.
[43] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[44] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science, vol. 9, no. 1, pp. 147–169, 1985.
[45] H. Larochelle and Y. Bengio, "Classification using discriminative restricted Boltzmann machines," in Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 536–543.
[46] M. Langkvist, L. Karlsson, and A. Loutfi, "A review of unsupervised feature learning and deep learning for time-series modeling," Pattern Recognition Letters, vol. 42, pp. 11–24, 2014.
[47] G. Zhang, B. E. Patuwo, and M. Y. Hu, "Forecasting with artificial neural networks: The state of the art," International Journal of Forecasting, vol. 14, no. 1, pp. 35–62, 1998.
[48] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[49] A. Y. Ng, "Feature selection, L1 vs. L2 regularization, and rotational invariance," in Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004, p. 78.
[50] R. J. Hyndman and A. B. Koehler, "Another look at measures of forecast accuracy," International Journal of Forecasting, vol. 22, no. 4, 2006.
[51] A. J. Fox, "Outliers in time series," Journal of the Royal Statistical Society, Series B (Methodological), pp. 350–363, 1972.
[52] M. Sakurada and T. Yairi, "Anomaly detection using autoencoders with nonlinear dimensionality reduction," in Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. ACM, 2014, p. 4.
[53] TimeAndDate.com. (2010) Japan national holidays history. [Online]. Available: https://www.timeanddate.com/holidays/japan/
[54] (1995) Tokyo daily temperature history. [Online]. Available: https://www.wunderground.com/
[55] G. E. Box and D. R. Cox, "An analysis of transformations," Journal of the Royal Statistical Society, Series B (Methodological), pp. 211–252, 1964.
[56] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785–794.
[57] P. H. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, pp. 89–102, 1996.
[58] S. N. Wood, Generalized Additive Models: An Introduction with R. CRC Press, 2017.
[59] ——, "Thin plate regression splines," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 65, no. 1, pp. 95–114, 2003.
[60] L. Prechelt, "Early stopping - but when?" in Neural Networks: Tricks of the Trade. Springer, 1998, pp. 55–69.
[61] M. Gilliland, U. Sglavo, and L. Tashman, Business Forecasting: Practical Problems and Solutions. John Wiley & Sons, 2016.
[62] S. Makridakis and M. Hibon, "The M3-Competition: results, conclusions and implications," International Journal of Forecasting, vol. 16, no. 4, pp. 451–476, 2000.
[63] J. Julkunen, "Feature spotlight: In-game events and market trends," 2016. [Online]. Available: http://www.gamerefinery.com/in-game-events-market-trends/
[64] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[65] L. Dwyer, A. Gill, and N. Seetaram, Handbook of Research Methods in Tourism: Quantitative and Qualitative Approaches. Edward Elgar Publishing, 2012.
[66] R. Hyndman and Y. Khandakar, "Automatic time series forecasting: the forecast package for R," 2008.
[67] S. Wood, "mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation," 2012.
[68] Theano Development Team, "Theano: A Python framework for fast computation of mathematical expressions," arXiv e-prints, vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688