Applying Machine Learning for Mobile Games by Neil Patrick Del Gallego

Applying Machine Learning to Mobile Games

Neil Patrick Del Gallego

An example of supervised learning for predictive game analytics

Who Am I

• Software engineer and game developer for 5 years.

• Developed Dragon Cubes along with other team members.

• Developed Bubble Cubes.

• Currently developing Casino Slots.

Who Am I

Also currently taking a Master of Science in Computer Science at DLSU.

What is machine learning?

Machine Learning

Field of study that gives the computer the ability to learn and recognize patterns.

Image  R    G    B    Name
A      255  255  255  Animal
B      255  0    0    Animal
C      132  122  230  Plant
D      89   134  200  Plant

Training Dataset -> Machine Learning Algorithm -> Prediction Model/s -> Animal? Plant?

Supervised Learning

The class to predict is explicitly given in the dataset

TRAINING DATASET

Image  R    G    B    Name
A      255  255  255  Animal
B      255  0    0    Animal
C      132  122  230  Plant
D      89   134  200  Plant

Supervised Learning

Prediction model attempts to predict the missing class value

TRAINING DATASET

Image  R    G    B    Name
A      255  255  255  Animal
B      255  0    0    Animal
C      132  122  230  Plant
D      89   134  200  Plant

NEW/UNSEEN DATASET

Image  R    G    B    Name
E      0    0    255  ??
F      255  0    0    ??
G      0    122  125  ??
H      98   7    2    ??
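
To make the idea concrete, here is a minimal sketch of a supervised learner applied to the two tables above. It uses a 1-nearest-neighbour rule, which is an assumption chosen for brevity and not the algorithm used later in this talk: each unseen image gets the class of the training image whose RGB values are closest.

```python
import math

# Training rows from the slide: (image, R, G, B, class label).
TRAINING = [
    ("A", 255, 255, 255, "Animal"),
    ("B", 255, 0, 0, "Animal"),
    ("C", 132, 122, 230, "Plant"),
    ("D", 89, 134, 200, "Plant"),
]

# Unseen rows from the slide: the class label is missing.
UNSEEN = [
    ("E", 0, 0, 255),
    ("F", 255, 0, 0),
    ("G", 0, 122, 125),
    ("H", 98, 7, 2),
]


def predict(rgb, training=TRAINING):
    """Return the label of the training row closest to rgb (1-nearest neighbour)."""
    nearest = min(training, key=lambda row: math.dist(rgb, row[1:4]))
    return nearest[4]


# Fill in the missing class value for every unseen image.
predictions = {name: predict((r, g, b)) for name, r, g, b in UNSEEN}
for name, label in predictions.items():
    print(name, label)
```

With only four training rows the predictions are a toy illustration; a real model needs far more labelled examples.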

Data Mining

• Extract patterns or provide insights from a complex dataset.

• Borrows techniques from machine learning and statistics.

Data Mining

• Patterns should be:

• Non-trivial (not normally extracted in an SQL statement)

• Unknown

• Unexpected

• Potentially useful

• Actionable

Why Data Mining?

Drowning in data but starved for knowledge

Unstructured data, or knowledge is deeply buried.

Predicting Daily Active Users for Match-3 Mobile Games

Application of supervised learning for Dragon Cubes and Jungle Cubes

It’s been published!

Daily Active Users

A measure of application virality. A very important metric to gauge the success of an application.

Motivation

Attempt to determine the amount of user activity X days ahead to assist in project planning.

X = 7 in our study.

Marketing expenses

Daily active users

Develop features to make users stay

Where data was extracted

MARKETING DATA

Dataset Overview

• Two match-3 games

• JNC (Jungle Cubes) generates revenue

• DNC (Dragon Cubes) does not

• The two games differ only in game mechanics.

General Methodology of Data Mining*

How I performed data mining on our data.

*For supervised learning. Unsupervised learning uses subjective evaluation to determine the reliability of a model.

Dataset -> Feature Selection -> Refined Dataset -> Machine Learning -> Prediction Models (Model A, Model B, Model C, Model D)

Unseen Data -> Prediction Models -> Predicted Result -> Accuracy Measure

Features in Dataset

Users are reached via advertising channels

MKTExpenses: The total amount of marketing expenses, in USD, spent to advertise the game.

A high marketing expense means more advertising channels have been used to target more potential users to install the game.

Advertising amount per user per country*

● Users are reached via advertising channels.

*Market insight taken from Chartboost: http://tinyurl.com/charboost

Features in Dataset

How many users discovered our app on a given date?

Install Date: Calendar date of installation.

Cohort Size: The total number of users who installed the application on the given install date.

Session Count: The total number of play sessions on a given install date.

Features in Dataset

How long do players play the game?

AvgSessionSeconds: The arithmetic mean of the time, in seconds, that users spend in a play session.

MedianSessionSeconds: The session length, in seconds, at which half of the sessions are longer and half are shorter.
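
A short sketch of why both values are worth tracking, using Python's standard `statistics` module. The session lengths below are made up for illustration: a single very long session pulls the mean up, while the median stays close to a typical session.

```python
from statistics import mean, median

# Hypothetical session lengths in seconds; one very long session skews the mean.
session_seconds = [40, 55, 60, 65, 70, 1800]

avg_session_seconds = mean(session_seconds)
median_session_seconds = median(session_seconds)

print("AvgSessionSeconds:", avg_session_seconds)
print("MedianSessionSeconds:", median_session_seconds)
```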

Features in Dataset

Market impression of the game

DailyAverageRating

Features in Dataset

Market impression of the game

CrashesANRDay1

Features in Dataset

How many users are engaged?

ActiveUsers: The total number of unique users who spent considerable time in the game on a given date.

ActiveUsersDay7: Similar to ActiveUsers, but offset 7 days after the install date. This is the variable to be predicted.
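
One way to build such a target column is sketched below. It assumes the data is available as a mapping from calendar date to ActiveUsers; the function name `add_day7_target` and the sample counts are hypothetical and not taken from the study.

```python
from datetime import date, timedelta


def add_day7_target(active_users_by_date, offset_days=7):
    """Pair each date's ActiveUsers with the ActiveUsers value offset_days later.

    Dates whose offset day is not in the data yet are skipped, because
    their target value is still unknown.
    """
    rows = []
    for day in sorted(active_users_by_date):
        target_day = day + timedelta(days=offset_days)
        if target_day in active_users_by_date:
            rows.append({
                "InstallDate": day,
                "ActiveUsers": active_users_by_date[day],
                "ActiveUsersDay7": active_users_by_date[target_day],
            })
    return rows


# Hypothetical daily counts for ten consecutive days.
start = date(2016, 1, 1)
counts = [100, 110, 120, 115, 130, 140, 150, 160, 155, 170]
daily = {start + timedelta(days=i): n for i, n in enumerate(counts)}
rows = add_day7_target(daily)
```

Only the first three days get a target here, since the remaining days do not yet have a value seven days ahead.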

Features in Dataset

Screen that triggers the events

LevelPlayedEvents

LevelSuccessEvents

LevelFailedEvents

Dataset

Applying Methodology


Feature Selection

Filter out unneeded attributes.

Determine which attributes matter.

Correlation Analysis

Measures the relationship between two variables. As X grows, how fast does Y grow or decline? Range: -1.0 to +1.0.

• 0.0 means no linear relationship at all.

• +1.0 means a strong positive relationship.

• -1.0 means a strong negative relationship.

Correlation Analysis

We use 0.7 as our threshold for a strong relationship.

Basis for manual feature selection.
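
The correlation check can be sketched in a few lines of plain Python. This assumes the Pearson correlation coefficient is the measure in use; the feature values are made up for illustration, and `strongly_correlated` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column; treated as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def strongly_correlated(features, target, threshold=0.7):
    """Return {feature name: r} for features with |r| >= threshold."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Hypothetical values, for illustration only.
features = {
    "MKTExpenses": [100, 200, 300, 400, 500],
    "DailyAverageRating": [4.1, 3.9, 4.3, 4.0, 4.2],
}
active_users_day7 = [1100, 1900, 3200, 3900, 5100]
selected = strongly_correlated(features, active_users_day7)
print(selected)
```

The absolute value is compared against the threshold so that strong negative relationships are kept as well.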

Automatic Feature Selection


Wrapper scheme* algorithm is used for automatic feature selection.

CSV file -> Filtered CSV -> Acceptable?

NO: Manual feature selection

YES: Final CSV for training

*Wrappers for feature subset selection, Ron Kohavi and George H. John (1997)




● Despite using the wrapper scheme, some selected features are considered noise.

● Evaluate if the selected features are indeed valuable.
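
The wrapper idea is to score candidate feature subsets by actually training and testing a learner on them. Below is a minimal sketch under stated assumptions: a greedy forward search, a 1-nearest-neighbour regressor as the stand-in learner, and leave-one-out error as the score. The study itself ran the wrapper in its own tooling, so the learner, the search strategy and the data here are illustrative only.

```python
def loo_error(rows, target, subset):
    """Leave-one-out mean absolute error of a 1-nearest-neighbour regressor
    that only looks at the feature indices in subset."""
    total = 0.0
    for i, row in enumerate(rows):
        best_dist, best_j = None, None
        for j, other in enumerate(rows):
            if j == i:
                continue  # the row being predicted is held out
            dist = sum((row[k] - other[k]) ** 2 for k in subset)
            if best_dist is None or dist < best_dist:
                best_dist, best_j = dist, j
        total += abs(target[i] - target[best_j])
    return total / len(rows)


def forward_select(rows, target, n_features):
    """Greedy forward search: repeatedly add the feature that lowers the
    estimated error the most, and stop when no addition improves it."""
    selected = []
    best_error = float("inf")
    while True:
        candidates = [k for k in range(n_features) if k not in selected]
        scored = [(loo_error(rows, target, selected + [k]), k) for k in candidates]
        if not scored:
            break
        error, k = min(scored)
        if error >= best_error:
            break
        selected.append(k)
        best_error = error
    return selected, best_error


# Hypothetical data: feature 0 tracks the target, feature 1 is noise.
rows = [(1, 50), (2, 10), (3, 90), (4, 30), (5, 70), (6, 20)]
target = [10, 20, 30, 40, 50, 60]
selected, error = forward_select(rows, target, n_features=2)
print(selected, error)
```

Even with a wrapper, the selected subset still deserves a manual look, which is the point of the two bullets above.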



Selected Attributes



Applying Machine Learning

Using M5Base, a decision tree with regression

Machine Learning Technique

Using M5Base (decision tree with regression)

WEKA demo (if possible)

M5Base sample

Problem!

We do not have enough unseen data yet!

What do we do to test our model?

K-fold cross-validation

Divide dataset into K partitions (recommended 10)

[Diagram: the dataset split into partitions. In each round a different partition is the test set and the remaining partitions form the training set.]
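
A minimal sketch of K-fold cross-validation in plain Python. The helper names `k_fold_indices` and `cross_validate` are hypothetical, and the learner is passed in as a pair of `fit` and `predict` functions so that any model can be plugged in.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Spread rows as evenly as possible: the first n_rows % k folds get one extra row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


def cross_validate(rows, targets, fit, predict, k=10):
    """Mean absolute error over all rows, each predicted while held out."""
    total_error = 0.0
    for train, test in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train], [targets[i] for i in train])
        for i in test:
            total_error += abs(predict(model, rows[i]) - targets[i])
    return total_error / len(rows)


# Usage with a trivial baseline that always predicts the training mean.
def fit_mean(train_rows, train_targets):
    return sum(train_targets) / len(train_targets)


def predict_mean(model, row):
    return model


rows = [[x] for x in range(20)]
targets = [2 * x for x in range(20)]
baseline_error = cross_validate(rows, targets, fit_mean, predict_mean, k=10)
print(baseline_error)
```

The folds here are contiguous slices. For data that is ordered in some meaningful way, shuffling the rows before partitioning is usually advisable.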



Unseen Data Sample



Accuracy Measure

A value close to 0.0 is not really significant.

Accuracy Measure

Magnitude of error of predicted value vs actual value. Lower is better.

Accuracy Measure

Percentage of error of predicted value vs actual value. Lower is better.
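
The slide text does not name the three measures, so the sketch below is an assumption: it treats them as a correlation coefficient between actual and predicted values, a mean absolute error, and a relative absolute error (the model's total error as a percentage of the error made by always predicting the mean of the actual values). The sample numbers are made up.

```python
import math


def correlation(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mean_absolute_error(actual, predicted):
    """Average magnitude of the prediction error. Lower is better."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual, predicted):
    """Total absolute error as a percentage of a mean-only baseline.

    Assumes the actual values are not all identical. Lower is better.
    """
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    model = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * model / baseline


# Hypothetical actual vs predicted DAU values.
actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 380]
r = correlation(actual, predicted)
mae = mean_absolute_error(actual, predicted)
rae = relative_absolute_error(actual, predicted)
print(r, mae, rae)
```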

Interpretations

Using results from M5Base.

*More details available at: http://www.dlsu.edu.ph/conferences/dlsu-research-congress-proceedings/2016/GRC/GRC-HCT-I-001.pdf


Interpretations

M5Base performed exceptionally well on JNC-Test (unseen data), which makes it recommendable for real-world use.

Interpretations

Jungle Cubes is potentially a predictable, scalable game.

Interpretations

JNC and DNC have almost the same total advertising expense, but DNC did not gain enough daily active users.

Positive correlation summary  JNC DAU-Day7  DNC DAU-Day7
MKTExpenses                   High          Low
SessionCount                  High          Low
SessionLength                 High          Low

Interpretations

Based on our study, we propose that MKTExpenses correlates highly with DAU-Day7 once the game has enough enjoyable content to keep users engaged.

[Diagram: Ensure high session length (SessionLength) and promote replayability (SessionCount); both affect DAU. Satisfies business requirements? Increase advertising campaigns (MKTExpenses), which also affects DAU. DAU influences DAU-Day7.]

Conclusion

Did our study correctly predict the fate of JNC and DNC?

Yes. Jungle Cubes has gained over 1M downloads as of October 2016 and is still profitable.

Dragon Cubes was pulled out of the market in September 2016.

We thank Jakob Lykkegaard Pedersen and Thomas Andreasen for allowing us to use the dataset for Jungle Cubes and Dragon Cubes. We would also like to thank Suhana Chooli, the marketing manager of Playlab Inc., who provided the details about the marketing expenses.

Thank you for listening!
