Player Rating Algorithms for Balancing Human Computation Games: Testing the Effect of Bipartiteness


Player Rating Systems for Balancing Human Computation Games: Testing the Effect of Bipartiteness

Seth Cooper, Sebastian Deterding, Theo Tsapakos

DiGRA 2016, August 6, 2016


<1> the challenge

»flow«

[Figure: the flow channel (Mihaly Csikszentmihalyi, 1990). Axes: difficulty vs. skill/time. Regions: frustration above the channel, boredom below it, flow in between.]

winning odds correlate w/ retention (Lomas et al., 2013)

the problem: human computation games

[Figure: difficulty vs. skill/time]

1. Scientific tasks are predetermined.

2. Tasks can’t be changed.

3. Difficulty is unknown in advance.

4. Solving tasks to learn their difficulty defeats the purpose of crowdsourcing.

… hence tasks are served randomly (Lintott, 2016)

hence retention is very poor (Sauermann & Franzoni, 2015)

most players leave after the balanced tutorials

[Figure (idealised): % of players retained vs. time/levels, for the tutorial and for the actual tasks; difficulty vs. skill/time]

the challenge: how to sequence tasks by difficulty without solving them?

also applies to: crowdsourcing, user-generated content

<2> the approach

multiplayer matchmaking uses player rating algorithms: Elo (1978), Glicko-2 (2012/3), TrueSkill (2006)

skill = winning odds, updated w/ each game (Moser, 2010)

remember: winning odds correlate w/ retention (Lomas et al., 2013)

widely used, with effective prediction (Menke, 2016)
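As a minimal sketch of what "skill = winning odds, updated w/ each game" means for Elo, the standard logistic form is shown below: a player's expected score against an opponent is 1 / (1 + 10^((R_B - R_A) / 400)), and after each game both ratings move by K times the gap between the actual and the expected score. The K-factor of 32 is an assumed default, not a value from the study.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of A against B: the predicted chance that A wins
    (a draw counts as half a win)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k (the step size) is an assumed value; real systems tune it.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated players: each is expected to score 0.5.
    print(expected_score(1500, 1500))       # 0.5
    print(elo_update(1500, 1500, 1.0))      # (1516.0, 1484.0)
```

An upset (a low-rated player beating a high-rated one) produces a larger rating change than an expected result, which is how the ratings converge towards predictive winning odds.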

our approach: tasks = players

Player rating = skill

Task rating = difficulty
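Treating tasks as players can be sketched with the same Elo update: a player "wins" an attempt by solving the task and "loses" otherwise, so the player's rating tracks skill and the task's rating tracks difficulty. The `TaskMatcher` class, its starting rating, its K-factor, and the `target_p` success probability used to pick the next task are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


@dataclass
class TaskMatcher:
    """Hypothetical helper: players and tasks share one Elo scale."""
    k: float = 32.0          # assumed step size
    initial: float = 1500.0  # assumed starting rating for new players and tasks
    ratings: dict[str, float] = field(default_factory=dict)

    def rating(self, name: str) -> float:
        return self.ratings.setdefault(name, self.initial)

    def record_attempt(self, player: str, task: str, solved: bool) -> None:
        """Solving the task counts as the player winning the 'match'."""
        p, t = self.rating(player), self.rating(task)
        delta = self.k * ((1.0 if solved else 0.0) - expected_score(p, t))
        self.ratings[player] = p + delta  # player rating = skill
        self.ratings[task] = t - delta    # task rating = difficulty

    def next_task(self, player: str, candidates: list[str],
                  target_p: float = 0.5) -> str:
        """Pick the task whose predicted solve probability is closest to
        target_p (an assumed target; the slides do not give one)."""
        p = self.rating(player)
        return min(candidates,
                   key=lambda task: abs(expected_score(p, self.rating(task)) - target_p))


if __name__ == "__main__":
    m = TaskMatcher()
    m.ratings.update({"easy": 1200.0, "medium": 1500.0, "hard": 1800.0})
    m.record_attempt("alice", "medium", solved=True)
    print(m.ratings["alice"], m.ratings["medium"])        # 1516.0 1484.0
    print(m.next_task("alice", ["easy", "medium", "hard"]))  # medium
```

No task has to be solved in advance: its difficulty estimate emerges from the outcomes of the players who attempt it.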

<3> the question

we produce a bipartite graph (Asratian et al., 1998): every edge connects a player to a task, never a player to a player or a task to a task

[Figure: bipartite graph with players on one side and tasks on the other]

less density, less information flow (Scott, 2012)

more structural holes (Scott, 2012)

more unbalanced graphs (Scott, 2012)
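The graph properties above can be checked mechanically. The sketch below tests bipartiteness by two-colouring the graph with breadth-first search, and computes density as the edges present divided by all n(n-1)/2 possible vertex pairs. That this is the exact density measure used in the study is an assumption; the helper names `build`, `is_bipartite`, and `density` are illustrative.

```python
from collections import deque


def build(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: dict[str, set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def is_bipartite(adj: dict[str, set[str]]) -> bool:
    """Two-colour each component with BFS; the graph is bipartite
    iff no edge joins two vertices of the same colour."""
    colour: dict[str, int] = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in colour:
                    colour[w] = 1 - colour[v]
                    queue.append(w)
                elif colour[w] == colour[v]:
                    return False
    return True


def density(adj: dict[str, set[str]]) -> float:
    """Edges present divided by all possible vertex pairs n(n-1)/2."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return edges / (n * (n - 1) / 2) if n > 1 else 0.0


if __name__ == "__main__":
    # Every player has attempted every task: the densest possible user-task graph.
    user_task = build([("p1", "t1"), ("p1", "t2"), ("p2", "t1"), ("p2", "t2")])
    # Four players who have all played each other, as in chess.
    player_player = build([("p1", "p2"), ("p1", "p3"), ("p1", "p4"),
                           ("p2", "p3"), ("p2", "p4"), ("p3", "p4")])
    print(is_bipartite(user_task), round(density(user_task), 3))          # True 0.667
    print(is_bipartite(player_player), round(density(player_player), 3))  # False 1.0
```

Even when every player has attempted every task, the user-task graph stays below full density, because player-player and task-task edges can never occur.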

Research question: Does a bipartite graph (whether player-player or user-task) negatively affect the prediction accuracy of player rating algorithms? Does graph balancedness affect accuracy?

<4> the study

predicting chess matches with Elo (data set 1)

bipartite training data has no effect on prediction accuracy

unbalanced bipartite graphs perform better

unbalanced bipartite graphs have super vertices
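Why unbalanced bipartite graphs end up with super vertices follows from a counting argument: every edge has exactly one endpoint on each side, so each side's degrees sum to the number of edges, and the smaller side must have the higher average degree. The sketch below assumes "unbalanced" means the two sides differ in size; the `factor` threshold for calling a vertex "super" is a hypothetical choice, not one taken from the study.

```python
from collections import Counter


def side_stats(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Edges are (player, task) pairs of a bipartite graph.

    Each side's degrees sum to len(edges), so the smaller side
    necessarily has the higher mean degree.
    """
    players = {p for p, _ in edges}
    tasks = {t for _, t in edges}
    m = len(edges)
    return {"players": len(players), "tasks": len(tasks),
            "mean_player_degree": m / len(players),
            "mean_task_degree": m / len(tasks)}


def super_vertices(edges: list[tuple[str, str]], factor: float = 1.5) -> list[str]:
    """Hypothetical rule: a vertex is 'super' if its degree is at least
    factor times the mean degree of the whole graph.
    Assumes player and task names never collide."""
    degree: Counter = Counter()
    for p, t in edges:
        degree[p] += 1
        degree[t] += 1
    mean = 2 * len(edges) / len(degree)
    return sorted(v for v, d in degree.items() if d >= factor * mean)


if __name__ == "__main__":
    # Unbalanced: six players, two tasks, every player attempts both tasks.
    unbalanced = [(f"p{i}", t) for i in range(6) for t in ("t1", "t2")]
    print(side_stats(unbalanced))
    print(super_vertices(unbalanced))  # ['t1', 't2']
```

In this toy example the two tasks each take part in six attempts while each player takes part in two, so the task side concentrates information in a few well-connected vertices.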

Elo, Glicko-2, TrueSkill on the human computation game Paradox (data set 2)

all rating systems outperform the baseline
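A prediction-accuracy evaluation of this kind can be sketched as: learn Elo ratings from earlier games, freeze them, and predict the winner of later games. The chronological split, skipping draws, and the "first-listed player always wins" baseline are assumptions for illustration; the slides do not say which baseline the study used.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def evaluate(train: list[tuple[str, str, float]],
             test: list[tuple[str, str, float]],
             k: float = 32.0, initial: float = 1500.0) -> tuple[float, float]:
    """train/test hold (a, b, score_a) tuples in chronological order,
    score_a in {1.0, 0.5, 0.0}. Returns (elo_accuracy, baseline_accuracy)."""
    ratings: dict[str, float] = {}

    def r(name: str) -> float:
        # Unseen players start at the assumed initial rating.
        return ratings.setdefault(name, initial)

    # 1. Learn ratings from the training games with the standard Elo update.
    for a, b, score_a in train:
        delta = k * (score_a - expected_score(r(a), r(b)))
        ratings[a] += delta
        ratings[b] -= delta

    # 2. Freeze ratings and predict the decisive test games (draws skipped).
    decisive = [(a, b, s) for a, b, s in test if s != 0.5]
    if not decisive:
        raise ValueError("no decisive games in the test set")
    # A is predicted to win when its expected score exceeds 0.5.
    elo_hits = sum((expected_score(r(a), r(b)) > 0.5) == (s == 1.0)
                   for a, b, s in decisive)
    # Hypothetical baseline: always predict that the first-listed player wins.
    baseline_hits = sum(s == 1.0 for _, _, s in decisive)
    return elo_hits / len(decisive), baseline_hits / len(decisive)


if __name__ == "__main__":
    train = [("alice", "bob", 1.0)] * 3
    test = [("alice", "bob", 1.0), ("bob", "alice", 0.0), ("alice", "bob", 0.5)]
    print(evaluate(train, test))  # (1.0, 0.5)
```

The same harness works for a bipartite training set: only the pairs in `train` change, which is what makes it a simple way to compare graph structures.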

<5> discussion & outlook

main contributions

• Identified 4 challenges to difficulty balancing in human computation games, crowdsourcing, and user-generated content (UGC)

• Introduced a novel approach: sequencing content by adapting player rating algorithms

• Identified the bipartiteness of the user-task graph as a potential issue

• Found that bipartiteness does not affect the prediction accuracy of Elo, Glicko-2, or TrueSkill on chess matches or the human computation game Paradox

• Found that unbalanced graphs improve prediction accuracy, presumably due to super vertices/players

• Provided initial support that our approach is viable

limitations & future work I

• Approach requires previous/initial data
  • Use super-users to provide initial data
  • Use “calibration” tasks in tutorials
  • Use mixed-method data to identify skill and difficulty indicators, and use data and machine learning to validate them and extract additional indicators

• Current algorithms only compute win/loss/draw
  • Graded success measures could improve accuracy and learning speed

• Study trained on large data sets (10,000, 37 edges)
  • Test the learning speed of the algorithms under the current default retention in human computation games

• Study tested only one human computation game
  • Replication with multiple games
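The Elo update already accepts fractional scores (a draw counts as 0.5), so a graded success measure could in principle be passed in directly. This minimal sketch assumes success is normalised to [0, 1], for example the fraction of a task's objective a player achieved; the function name `graded_update` is hypothetical, and whether this improves accuracy or learning speed is the open question in the limitation above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def graded_update(player: float, task: float, success: float,
                  k: float = 32.0) -> tuple[float, float]:
    """Elo-style update with graded success instead of win/loss/draw.

    success: 1.0 = fully solved, 0.0 = failed, values in between = partial
    (the normalisation to [0, 1] is an assumption). k is an assumed step size.
    """
    if not 0.0 <= success <= 1.0:
        raise ValueError("success must lie in [0, 1]")
    delta = k * (success - expected_score(player, task))
    return player + delta, task - delta  # (skill, difficulty)


if __name__ == "__main__":
    # A partial solution moves the ratings less than a full solve would.
    print(graded_update(1500.0, 1500.0, 0.75))  # (1508.0, 1492.0)
    print(graded_update(1500.0, 1500.0, 1.0))   # (1516.0, 1484.0)
```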

limitations & future work II

• Study didn’t test the direct effect on retention
  • Follow-up user study

• Task pool might not contain tasks of best-fitting difficulty (similar to an empty bar in multiplayer games)
  • Procedural content generation to generate training/filler tasks

• Many human computation tasks don’t vary much in difficulty
  • Expand the matching approach to other factors like curiosity/variety
