Player Rating Systems for Balancing Human Computation Games: testing the effect of bipartiteness
Seth Cooper, Sebastian Deterding, Theo Tsapakos DiGRA 2016, August 6, 2016
<1> the challenge
»flow«
[Figure: difficulty (y-axis) vs. skill/time (x-axis); regions labelled frustration and boredom]
flow (Mihaly Csikszentmihalyi, 1990)
winning odds correlate w/ retention (Lomas et al., 2013)
human computation games
the problem
1. scientific tasks are predetermined
2. tasks can’t be changed
3. difficulty is unknown in advance
4. solving tasks defeats crowdsourcing
[Figures: difficulty vs. skill/time, with tasks shown as "?" and "!" markers]
… hence tasks are served randomly (Lintott, 2016)
hence retention is very poor (Sauermann & Franzoni, 2015)
most leave after balanced tutorials
[Figure (idealised): % players retained vs. time/levels, across the tutorial and the actual tasks]
the challenge
How to sequence tasks w/o solving?
also applies to user-generated content
also applies to crowdsourcing
<2> the approach
multiplayer matchmaking uses player rating algorithms
Elo (1978), Glicko-2 (2012/3), TrueSkill (2006)
skill = winning odds, updated w/ each game (Moser, 2010)
remember: winning odds correlate w/ retention (Lomas et al., 2013)
widely used, effective prediction (Menke, 2016)
our approach: tasks = players
Player rating = skill
Task rating = difficulty
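The "tasks = players" idea can be sketched with a plain Elo update, where each attempt is treated as a match between a player and a task. This is only an illustrative sketch, not the implementation used in the study: the names `Rated`, `expected_score` and `record_attempt`, the K-factor of 32 and the starting rating of 1500 are all assumptions made for the example.

```python
from dataclasses import dataclass

K_FACTOR = 32.0         # assumed update step size; not taken from the slides
START_RATING = 1500.0   # assumed initial rating for new players and tasks


@dataclass
class Rated:
    """A rated entity: a player (rating = skill) or a task (rating = difficulty)."""
    name: str
    rating: float = START_RATING


def expected_score(player: Rated, task: Rated) -> float:
    """Standard Elo estimate of the probability that the player solves the task."""
    return 1.0 / (1.0 + 10.0 ** ((task.rating - player.rating) / 400.0))


def record_attempt(player: Rated, task: Rated, solved: bool) -> None:
    """Update both ratings after one attempt; a solved task counts as a player win."""
    expected = expected_score(player, task)
    delta = K_FACTOR * ((1.0 if solved else 0.0) - expected)
    player.rating += delta  # player gains when doing better than expected
    task.rating -= delta    # task loses the same amount (zero-sum update)


# Hypothetical player and task, both starting at the assumed default rating.
alice = Rated("alice")
puzzle = Rated("puzzle-17")
record_attempt(alice, puzzle, solved=True)
print(alice.rating, puzzle.rating)  # 1516.0 1484.0
```

After enough attempts, a task's rating serves as a difficulty estimate, so tasks could be served to players whose expected score is near a target value without anyone having to solve the task in advance.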
<3> the question
we produce a bipartite graph (Asratian et al., 1998)
[Figure: bipartite graph with players on one side and tasks on the other]
less density, less information flow (Scott, 2012)
more structural holes (Scott, 2012)
more unbalanced graphs (Scott, 2012)
Research question: does a bipartite (player-player or user-task) graph negatively affect the prediction accuracy of player rating algorithms? Does graph balancedness affect accuracy?
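To make the graph properties in the research question concrete, the sketch below checks whether a match graph is bipartite (via two-colouring) and computes a density and a balance figure for a player-task edge list. The slides do not define these measures, so both definitions here are assumptions: density is taken as unique edges divided by all possible player-task pairs, and balance as the size ratio of the smaller side to the larger side. Function names and the sample data are hypothetical.

```python
from collections import defaultdict, deque


def is_bipartite(edges):
    """Return True if the undirected graph given by (a, b) pairs can be two-coloured."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    colour = {}
    for start in adjacency:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in colour:
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    return False  # an odd cycle: not bipartite
    return True


def bipartite_stats(edges):
    """Return (density, balance) for a list of (player, task) pairs (assumed definitions)."""
    if not edges:
        raise ValueError("need at least one edge")
    players = {player for player, _ in edges}
    tasks = {task for _, task in edges}
    density = len(set(edges)) / (len(players) * len(tasks))
    balance = min(len(players), len(tasks)) / max(len(players), len(tasks))
    return density, balance


# Hypothetical data: players only meet tasks, never each other.
player_task = [("p1", "t1"), ("p1", "t2"), ("p2", "t1"), ("p3", "t1")]
# Hypothetical player-vs-player data containing a triangle.
triangle = [("a", "b"), ("b", "c"), ("c", "a")]

print(is_bipartite(player_task), is_bipartite(triangle))
print(bipartite_stats(player_task))
```

In the player-task example, `t1` is attempted by every player, which is the kind of highly connected vertex an unbalanced graph tends to produce.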
<4> the study
data set 1: predicting chess matches with Elo
bipartite training data has no effect
unbalanced bipartite graphs perform better
unbalanced bipartite graphs have super vertices
data set 2: Elo, Glicko-2, TrueSkill on the human computation game Paradox
all rating systems outperform baseline
<5> discussion & outlook
main contributions
• Identified 4 challenges to difficulty balancing in human computation games, crowdsourcing, UGC
• Introduced content sequencing through adapting player rating algorithms as a novel approach
• Identified bipartiteness of user-task graph as a potential issue
• Found that bipartiteness does not affect prediction accuracy of Elo, Glicko-2, TrueSkill in chess matches or the human computation game Paradox
• Found that unbalanced graphs improve prediction accuracy, presumably due to super vertices/players
• Provided first support that our approach is viable
limitations & future work I
• Approach requires previous/initial data
  • Use super-users to provide initial data
  • Use “calibration” tasks in tutorials
  • Use mixed method data to identify skill & difficulty indicators, data & machine learning to validate & extract additional indicators
• Current algorithms only compute win/loss/draw
  • Graded success measures could improve accuracy and learning speed
• Study trained on large data sets (10,000, 37 edges)
  • Testing learning speed of algorithms w/ current default retention in human computation games
• Study tested only one human computation game
  • Replication with multiple games
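One way to picture the "graded success measures" item above is an Elo-style update that accepts any score between 0 and 1 instead of only win/loss/draw. This is a hedged sketch of that idea, not something the study implemented: the function name `graded_elo_update`, the K-factor of 32 and the interpretation of the score (for example, the fraction of a task completed) are assumptions.

```python
def graded_elo_update(player_rating, task_rating, score, k=32.0):
    """Elo-style update where score is any value in [0, 1], not only 0, 0.5 or 1.

    Returns the new (player_rating, task_rating). The K-factor is an assumed default.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    expected = 1.0 / (1.0 + 10.0 ** ((task_rating - player_rating) / 400.0))
    delta = k * (score - expected)
    return player_rating + delta, task_rating - delta


# A partial success (hypothetical score of 0.75) moves ratings less than a full win.
print(graded_elo_update(1500.0, 1500.0, 0.75))  # (1508.0, 1492.0)
print(graded_elo_update(1500.0, 1500.0, 1.0))   # (1516.0, 1484.0)
```

Because each attempt carries more than one bit of outcome information, ratings might converge in fewer attempts, which matters when most players leave early.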
limitations & future work II
• Study didn’t test direct effect on retention
  • Follow-up user study
• Task pool might not contain tasks of best-fitting difficulty (similar to empty bar in multiplayer games)
  • Procedural content generation to generate training/filler tasks
• Many human computation tasks don’t vary much in difficulty
  • Expand matching approach to other factors like curiosity/variety