Applying Machine Learning to Mobile Games
Neil Patrick Del Gallego
An example of supervised learning for predictive game analytics
Who Am I
• Software engineer and game developer for 5 years.
• Developed Dragon Cubes along with other team members.
• Developed Bubble Cubes.
• Currently developing Casino Slots.
• Currently taking a Master of Science in Computer Science at DLSU.
What is machine learning?
Machine Learning
A field of study that gives computers the ability to learn and recognize patterns from data.
Image  R    G    B    Class
A      255  255  255  Animal
B      255  0    0    Animal
C      132  122  230  Plant
D      89   134  200  Plant
Training Dataset -> Machine Learning Algorithm -> Prediction Model(s) -> Animal? Plant?
Supervised Learning
The class to predict (here, Animal or Plant) is explicitly given in the training dataset.
The prediction model then attempts to predict the missing class value of each record in a new, unseen dataset:
Image  R    G    B    Class
E      0    0    255  ??
F      255  0    0    ??
G      0    122  125  ??
H      98   7    2    ??
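To make the idea concrete, here is a minimal sketch of one possible prediction model for this toy dataset: a 1-nearest-neighbour classifier that labels each unseen image with the class of the training image whose RGB values are closest (squared Euclidean distance). The slides do not name the algorithm behind this example, so the choice of nearest neighbour and the helper name `predict_class` are assumptions for illustration only.

```python
# Minimal 1-nearest-neighbour sketch (assumed algorithm, not taken from the slides).
TRAINING = {
    "A": ((255, 255, 255), "Animal"),
    "B": ((255, 0, 0), "Animal"),
    "C": ((132, 122, 230), "Plant"),
    "D": ((89, 134, 200), "Plant"),
}

UNSEEN = {
    "E": (0, 0, 255),
    "F": (255, 0, 0),
    "G": (0, 122, 125),
    "H": (98, 7, 2),
}


def squared_distance(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def predict_class(rgb):
    """Return the class of the closest training image (hypothetical helper)."""
    _, label = min(TRAINING.values(), key=lambda row: squared_distance(row[0], rgb))
    return label


predictions = {name: predict_class(rgb) for name, rgb in UNSEEN.items()}
print(predictions)
```

Under this assumption E and G come out as Plant, and F and H as Animal. With only four training rows and raw RGB values as features, the result reflects colour similarity rather than anything about the images' content.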
Data Mining
• Extract patterns or provide insights from a complex dataset.
• Borrows techniques from machine learning and statistics.
• Patterns should be:
  • Non-trivial (not normally extracted by an SQL statement)
  • Unknown
  • Unexpected
  • Potentially useful
  • Actionable
Why Data Mining?
Drowning in data but starved for knowledge
Data is often unstructured, or the knowledge in it is deeply buried.
Predicting Daily Active Users for Match-3 Mobile Games
Application of supervised learning for Dragon Cubes and Jungle Cubes
It’s been published!
Daily Active Users
A measure of application virality. A very important metric to gauge the success of an application.
Motivation
Attempt to determine the amount of user activity X days ahead to assist in project planning.
X = 7 in our study.
Marketing expenses
Daily active users
Develop features to make users stay
Where the data was extracted from
MARKETING DATA
Dataset Overview
• Two match-3 games: Jungle Cubes (JNC) and Dragon Cubes (DNC)
• JNC generates revenue
• DNC does not
• The two games differ only in their game mechanics.
General Methodology of Data Mining*
*How I performed data mining on our data. This applies to supervised learning; unsupervised learning uses subjective evaluation to determine the reliability of a model.
Dataset -> Feature Selection -> Refined Dataset -> Machine Learning -> Prediction Models (Model A, Model B, Model C, Model D)
Unseen Data -> Prediction Models -> Predicted Result -> Accuracy Measure
Features in Dataset
Users are reached via advertising channels
MKTExpenses This is the total amount of marketing expenses, in USD, spent to advertise the game.
A high marketing expense means more advertising channels have been used to target more potential users to install the game.
[Figure: Advertising amount per user per country*]
*Market insight taken from Chartboost: http://tinyurl.com/charboost
Features in Dataset
How many users discovered our app on a given date?
Install Date Calendar date of installation.
Cohort Size Refers to the total number of users who installed the application on the given install date.
Session Count Refers to the total number of play sessions on a given install date.
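Both counts are simple aggregations over install and session records. The sketch below assumes hypothetical raw records, namely (user_id, install_date) pairs and one session date per play session, and reads Session Count as the number of sessions started on that calendar date. The record layout and the function name `daily_counts` are illustrative assumptions, not the study's actual pipeline.

```python
from collections import Counter
from datetime import date

# Hypothetical raw records (illustrative only).
installs = [("u1", date(2016, 5, 1)), ("u2", date(2016, 5, 1)), ("u3", date(2016, 5, 2))]
session_dates = [date(2016, 5, 1), date(2016, 5, 1), date(2016, 5, 1), date(2016, 5, 2)]


def daily_counts(installs, session_dates):
    """Return {install_date: (cohort_size, session_count)} (hypothetical helper)."""
    # Cohort Size: number of distinct users who installed on each date.
    cohort_users = {}
    for user_id, install_date in installs:
        cohort_users.setdefault(install_date, set()).add(user_id)
    # Session Count: number of play sessions started on each date (assumed reading).
    sessions = Counter(session_dates)
    return {d: (len(users), sessions.get(d, 0)) for d, users in cohort_users.items()}


print(daily_counts(installs, session_dates))
```

Counting distinct user IDs, rather than install events, keeps a user who reinstalls on the same day from inflating the cohort.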
Features in Dataset
How long do players play the game?
AvgSessionSeconds The arithmetic mean of session length, in seconds: the average time users spend in the game per session.
MedianSessionSeconds The median session length, in seconds: half of the sessions are longer and half are shorter.
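The two statistics can differ a lot when a few very long sessions skew the data. A quick illustration with Python's `statistics` module, using made-up session lengths:

```python
from statistics import mean, median

# Made-up session lengths in seconds; one unusually long session.
session_seconds = [30, 45, 60, 90, 600]

avg_session_seconds = mean(session_seconds)       # pulled upward by the 600 s outlier
median_session_seconds = median(session_seconds)  # middle value of the sorted lengths

print(avg_session_seconds, median_session_seconds)
```

Here the mean is 165 seconds while the median is 60 seconds, which is why it can help to keep both features.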
Features in Dataset
Market impression of the game
DailyAverageRating
Features in Dataset
Market impression of the game
CrashesANRDay1
Features in Dataset
How many users are engaged?
ActiveUsers This refers to the total number of unique users who spent considerable time in the game on a given date.
ActiveUsersDay7 This is similar to the ActiveUsers variable, but offset 7 days after the install date. This is the variable to be predicted.
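Building the target column amounts to pairing each date's row with the ActiveUsers value observed 7 days later. A minimal sketch, assuming a hypothetical mapping from calendar date to ActiveUsers count (the helper name `add_day7_target` and the counts are illustrative):

```python
from datetime import date, timedelta


def add_day7_target(active_users_by_date, offset_days=7):
    """Return (date, ActiveUsers, ActiveUsersDay7) rows (hypothetical helper).

    Dates whose target day is not yet in the data are skipped, since their
    ActiveUsersDay7 value is still unknown.
    """
    rows = []
    for day in sorted(active_users_by_date):
        target_day = day + timedelta(days=offset_days)
        if target_day in active_users_by_date:
            rows.append((day, active_users_by_date[day], active_users_by_date[target_day]))
    return rows


# Made-up counts for ten consecutive days.
counts = {date(2016, 1, 1) + timedelta(days=i): 100 + i for i in range(10)}
for row in add_day7_target(counts):
    print(row)
```

The trailing dates that get skipped are exactly the ones a trained model would be asked to forecast.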
Features in Dataset
Screen that triggers the events
LevelPlayedEvents
LevelSuccessEvents
LevelFailedEvents
Dataset
Applying Methodology
Feature Selection
• Filter out unneeded attributes.
• Determine which attributes matter.
Correlation Analysis
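Measures the strength and direction of the relationship between two variables: as X grows, does Y tend to grow or decline, and how consistently? The range is -1.0 to +1.0.
• 0.0 means no (linear) relationship.
• +1.0 means a strong positive relationship.
• -1.0 means a strong negative relationship.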
We used 0.7 as our threshold for a strong relationship.
This served as the basis for manual feature selection.
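A minimal sketch of correlation-based filtering. It assumes Pearson's r as the coefficient (the slides do not name it) and treats |r| ≥ 0.7 as strong so that strong negative relationships are kept too. The feature values and function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation; treated as none here
    return sxy / math.sqrt(sxx * syy)


def strongly_correlated(features, target, threshold=0.7):
    """Names of features whose |r| with the target meets the threshold."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, target)) >= threshold]


# Made-up values for illustration only.
features = {
    "MKTExpenses": [10, 20, 30, 40],
    "Noise": [1, -1, 1, -1],
}
active_users_day7 = [12, 19, 33, 41]
print(strongly_correlated(features, active_users_day7))
```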
Automatic Feature Selection
A wrapper scheme* algorithm is used for automatic feature selection.
CSV file -> Filtered CSV -> Acceptable?
• YES: Final CSV for training
• NO: Manual feature selection
*Wrappers for feature subset selection, by Ron Kohavi and George H. John (1995)
● Despite using the wrapper scheme, some of the selected features were considered noise.
● Evaluate whether the selected features are indeed valuable.
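In a wrapper scheme, the learning algorithm itself scores each candidate feature subset (typically by cross-validated performance), and a search procedure moves through the subsets. The sketch below uses greedy forward selection as one simple search strategy; the slides do not say which search was used. The `score` callback is a hypothetical stand-in for something like the negated cross-validated error of M5Base trained on that subset, and the feature names and gains are made up.

```python
from typing import Callable, List, Sequence


def wrapper_forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[str]:
    """Greedy forward selection guided by a model-based score (higher is better)."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Evaluate every subset formed by adding one more feature.
        trials = [(score(selected + [f]), f) for f in remaining]
        trial_score, trial_feature = max(trials, key=lambda t: t[0])
        if trial_score <= best_score:
            break  # no single addition improves the score; stop searching
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best_score = trial_score
    return selected


# Toy scorer for illustration: each (made-up) feature adds a fixed amount to model quality.
toy_gain = {"feature_a": 3.0, "feature_b": -1.0, "feature_c": 2.0}
chosen = wrapper_forward_selection(list(toy_gain), lambda subset: sum(toy_gain[f] for f in subset))
print(chosen)
```

Because the scorer sees only the model's performance on this data, a wrapper can still keep features that help by coincidence, which fits the observation above that some selected features were still noise.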
Selected Attributes
Applying Machine Learning
Machine learning technique: M5Base (a decision tree with linear regression models at its leaves)
WEKA demo (if possible)
M5Base sample
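M5Base builds a model tree: a decision tree whose leaves hold linear regression models instead of constant values. The full M5 procedure (split selection by standard deviation reduction, pruning, smoothing) is more involved. The sketch below is a deliberately simplified stand-in, not WEKA's implementation: a single split on one numeric feature, chosen to minimise the squared error of the two leaf regressions. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # all x equal: fall back to a constant model
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def squared_error(xs, ys, line: Line) -> float:
    slope, intercept = line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def fit_one_split_model_tree(xs, ys, min_leaf=2):
    """Return (threshold, left_line, right_line) minimising total leaf squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        left, right = pairs[:i], pairs[i:]
        if left[-1][0] == right[0][0]:
            continue  # cannot place a threshold between equal x values
        lx, ly = zip(*left)
        rx, ry = zip(*right)
        left_line, right_line = fit_line(lx, ly), fit_line(rx, ry)
        error = squared_error(lx, ly, left_line) + squared_error(rx, ry, right_line)
        if best is None or error < best[0]:
            threshold = (left[-1][0] + right[0][0]) / 2
            best = (error, threshold, left_line, right_line)
    if best is None:
        raise ValueError("not enough distinct x values to split")
    return best[1:]


def predict(tree, x: float) -> float:
    """Route x to a leaf, then apply that leaf's linear model."""
    threshold, left_line, right_line = tree
    slope, intercept = left_line if x <= threshold else right_line
    return slope * x + intercept


# Toy piecewise-linear data: growth up to x = 4, then a decline.
xs = list(range(10))
ys = [2 * x if x < 5 else 100 - x for x in xs]
tree = fit_one_split_model_tree(xs, ys)
print(tree[0], predict(tree, 2), predict(tree, 8))
```

On this toy data the split lands at 4.5 and each leaf recovers its line exactly, which a single global regression could not do.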
Problem!
We do not have enough unseen data yet! What do we do to test our model?
K-fold cross-validation
Divide the dataset into K partitions (10 is the recommended value). Each partition takes a turn as the test set while the remaining K-1 partitions form the training set, so every record is used for testing exactly once; the K error estimates are then averaged.
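A minimal sketch of the procedure, assuming a simple least-squares line as the model and mean absolute error as the score. Both are placeholders rather than the study's M5Base setup, and a real run would normally shuffle the rows before splitting; this sketch keeps contiguous folds so it stays deterministic.

```python
from statistics import mean


def k_fold_splits(n, k=10):
    """Yield (train_indices, test_indices) for K contiguous, near-equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of records")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def fit_line(xs, ys):
    """Least-squares line, used here only as a placeholder model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def cross_validated_mae(xs, ys, k=10):
    """Train on K-1 folds, test on the held-out fold, and average the per-fold MAEs."""
    fold_errors = []
    for train, test in k_fold_splits(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean(abs(ys[i] - (slope * xs[i] + intercept)) for i in test))
    return mean(fold_errors)


xs = list(range(20))
ys = [3 * x + 1 for x in xs]  # made-up, perfectly linear data
print(cross_validated_mae(xs, ys, k=5))
```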
Unseen Data Sample
Accuracy Measure
• A value close to 0.0 is not really significant.
• Magnitude of error of the predicted value vs. the actual value. Lower is better.
• Percentage of error of the predicted value vs. the actual value. Lower is better.
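The slide titles naming these measures did not survive. WEKA's evaluation output for a numeric target includes the correlation coefficient, mean absolute error, root mean squared error and relative absolute error (a percentage), which fit the descriptions above, but that mapping is an assumption. A sketch of the error measures, with relative absolute error computed against a predictor that always outputs the mean of the actual values (WEKA's baseline is derived from the training data, so its figures may differ):

```python
import math


def mean_absolute_error(actual, predicted):
    """Average magnitude of the prediction errors (same units as the target)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Like MAE, but penalises large errors more heavily."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_absolute_error(actual, predicted):
    """Total absolute error as a percentage of a mean-only baseline's total error."""
    baseline = sum(actual) / len(actual)
    baseline_error = sum(abs(a - baseline) for a in actual)
    return 100.0 * sum(abs(p - a) for a, p in zip(actual, predicted)) / baseline_error


# Made-up ActiveUsersDay7 values and predictions.
actual = [100, 200, 300]
predicted = [110, 190, 330]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
rae = relative_absolute_error(actual, predicted)
print(mae, rmse, rae)
```

For these made-up numbers the MAE is about 16.7 users and the relative absolute error is 25%, meaning the model's total error is a quarter of what always guessing the mean would produce.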
Interpretations
Using results from M5Base.*
*More details available at: http://www.dlsu.edu.ph/conferences/dlsu-research-congress-proceedings/2016/GRC/GRC-HCT-I-001.pdf
• M5Base performed exceptionally well on JNC-Test (unseen data), which makes it recommendable for real-world use.
• Jungle Cubes is potentially a predictable, scalable game.
• JNC and DNC had almost the same total advertising expense, but DNC did not gain enough daily active users.
Positive correlation summary
Feature        JNC DAU-Day7   DNC DAU-Day7
MKTExpenses    High           Low
SessionCount   High           Low
SessionLength  High           Low
Based on our study, we propose that MKTExpenses becomes highly correlated with DAU-Day7 once the game has enough enjoyable content to keep users engaged.
[Figure: Ensuring high session length (SessionLength) and promoting replayability (SessionCount) affect DAU; if the game satisfies business requirements, increasing advertising campaigns (MKTExpenses) also affects DAU; DAU influences DAU-Day7.]
Conclusion
Did our study correctly predict the fate of JNC and DNC?
Yes. Jungle Cubes has gained over 1M downloads as of October 2016 and is still profitable.
Dragon Cubes was pulled from the market in September 2016.
We thank Jakob Lykkegaard Pedersen and Thomas Andreasen for allowing us to use the datasets for Jungle Cubes and Dragon Cubes. We would also like to thank Suhana Chooli, the marketing manager of Playlab Inc., who provided the details about the marketing expenses.
Thank you for listening!