GVGAI Single-Player Learning Competition at IEEE CIG17

General Video Game AI: Single-Player Learning Competition

Jialin Liu¹, Diego Perez-Liebana², Simon M. Lucas¹

¹ Queen Mary University of London, UK

² University of Essex, UK

August 24, 2017

LIU, Jialin (QMUL) GVGAI Single-Player Learning Competition August 24, 2017 1 / 19

Why General Video Game AI competitions?

Two research questions:

Can an AI program solve a particular difficult game?

Can an AI program solve a large set of different games?


General Video Game AI framework

Implemented by University of Essex (UK)¹ + New York University

Java + Video Game Definition Language (+ Python)

Figure 1: GVGAI: http://www.gvgai.net/

¹ We have moved to Queen Mary University of London :)


Used for research, education and competitions

• GVGAI in research
    • General (Video) Game Playing
    • Automatic game design
    • Tutorial on 22nd August, slides will be available online soon
• GVGAI in education
    • University of Essex, UK
    • New York University, USA
    • Universidad de Malaga, Spain
    • Nanjing University, China
    • ...



Track                    Year   Conference
Single-Player Planning   2014   CIG
                         2015   CEEC, CIG, GECCO
                         2016   CIG, GECCO
                         2017   GECCO
Two-Player Planning      2016   WCCI, CIG
                         2017   CEC
Single-Player Learning   2017   CIG
Level generation         2016   CIG
                         2017   CIG
Rule generation          2017   CIG

Table 1: Past competitions.


GVGAI: Learning Track

Single-player learning track
• Given a set of unknown games
• 40 ms to decide an action; play until 2000 game ticks or a win/loss
• Only the observation is accessible, no forward model is provided → no game simulation :(
• For each game
    • Levels 1, 2 and 3 for training (5 mins in total)
    • Levels 4 and 5 for validation




How different from the Planning Tracks?

Similarities
• Play unseen games, no game rules available :)
• Access to game score, tick, if terminated
• Access to legal actions
• Access to observation of current game state

Differences

                      Planning tracks                Single-Player Learning
                      (Single-Player, Two-Player)
Forward model?        Yes                            No
History events?       Yes                            No
State Observation?    Yes                            Serialised SO
Language              Java                           Java & Python

Serialised StateObservation in 2 formats:
• JSON
• Screenshot of the game screen (PNG) (slightly unfair)
• ... or both above.



What the agent needs to inherit?

    public class Agent extends utils.AbstractPlayer {

        // lastSsoType can be reset at any time
        public Types.LEARNING_SSO_TYPE lastSsoType
                = Types.LEARNING_SSO_TYPE.JSON;

        /**
         * Constructor.
         * To be called at the start of the communication.
         * No game has been initialized yet.
         * Perform one-time setup here.
         */
        public Agent() {
            ......
        }
    }

Constructor: Called only once, before starting learning (≤1s).

Types.LEARNING_SSO_TYPE lastSsoType sets the format of the serialised StateObservation to receive.


What the agent needs to inherit?

    public class Agent extends utils.AbstractPlayer {
        ......

        /**
         * Public method to be called at the start of every level of a game.
         * Perform any level-entry initialization here.
         * @param sso Phase Observation of the current game.
         * @param elapsedTimer Timer (1s)
         */
        @Override
        public void init(SerializableStateObservation sso,
                         ElapsedCpuTimer elapsedTimer) {
            ......
        }

        ......
    }

init(...): Called at the start of every level of a game (≤1s).


What the agent needs to inherit?

    public class Agent extends utils.AbstractPlayer {
        ......

        /**
         * Method used to determine the next move to be performed by the agent.
         *
         * @param sso Observation of the current state of the game
         * @param elapsedTimer Timer (40ms)
         * @return The action to be performed by the agent.
         */
        @Override
        public Types.ACTIONS act(SerializableStateObservation sso,
                                 ElapsedCpuTimer elapsedTimer) {
            ......
        }

        ......
    }

act(...): Select an action at every game tick (≤40ms).

The agent can ABORT the current game at any time.
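One way to stay inside the 40 ms budget of act(...) is to stop evaluating candidates shortly before the deadline and return the best action found so far. The sketch below is self-contained and does not use the GVGAI classes: `Action`, `BudgetedChooser` and the safety margin are hypothetical names and values chosen for illustration, and the random scoring is only a placeholder for whatever the agent has learned. In the framework, the ElapsedCpuTimer passed to act(...) would play the role of the deadline.

```java
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the framework's action type.
enum Action { NIL, LEFT, RIGHT, UP, DOWN, USE }

class BudgetedChooser {
    private final Random rng;

    BudgetedChooser(long seed) {
        this.rng = new Random(seed);
    }

    // Placeholder evaluation: a real agent would use learned values here.
    private double score(Action a) {
        return rng.nextDouble();
    }

    /**
     * Returns one of the legal actions. Evaluation stops once
     * (budgetMillis - safetyMillis) milliseconds have elapsed.
     */
    Action choose(List<Action> legal, long budgetMillis, long safetyMillis) {
        if (legal.isEmpty()) {
            return Action.NIL;
        }
        long deadline = System.nanoTime()
                + (budgetMillis - safetyMillis) * 1_000_000L;
        Action best = legal.get(0); // always have a legal fallback
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Action a : legal) {
            if (System.nanoTime() >= deadline) {
                break; // out of time: return the best action seen so far
            }
            double s = score(a);
            if (s > bestScore) {
                bestScore = s;
                best = a;
            }
        }
        return best;
    }
}
```

For example, `new BudgetedChooser(42L).choose(legal, 40, 5)` keeps a 5 ms margin; the right margin depends on the machine and is an assumption here.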

What the agent needs to inherit?

    public class Agent extends utils.AbstractPlayer {
        ......

        /**
         * Method used to perform actions in case of a game end.
         * This is the last thing called when a level is played.
         *
         * @param sso The current state observation of the game.
         * @param elapsedTimer Timer (5min + 1s)
         * @return The next level of the current game to be played.
         */
        @Override
        public int result(SerializableStateObservation sso,
                          ElapsedCpuTimer elapsedTimer) {
            ......
        }

        ......
    }

result(...): Called at the end of every level (≤1s).
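Since result(...) returns the next level to play, an agent needs some policy for cycling through the training levels. A minimal sketch, assuming a round-robin schedule and zero-based level indices (both are assumptions; `LevelScheduler` is a hypothetical helper, not part of the framework):

```java
// Hypothetical helper: picks the next training level in round-robin order.
class LevelScheduler {
    private final int numTrainingLevels;
    private int lastPlayed = -1;

    LevelScheduler(int numTrainingLevels) {
        if (numTrainingLevels <= 0) {
            throw new IllegalArgumentException("need at least one level");
        }
        this.numTrainingLevels = numTrainingLevels;
    }

    /** Returns the index of the next level to play (zero-based, an assumption). */
    int nextLevel() {
        lastPlayed = (lastPlayed + 1) % numTrainingLevels;
        return lastPlayed;
    }
}
```

With three training levels, successive calls to `nextLevel()` yield 0, 1, 2, 0, and so on. An agent could instead favour the level it has won least often; round-robin is just the simplest choice.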

Evaluation of agents

Very similar to Single-Player Planning track :)

Evaluate in 10 games, 2 levels per game (4 and 5)

Results considered in evaluation
• Number of victories
• Average score
• Time spent

Entries ranked and awarded points: 25, 18, 15, 12, 10, 8, 6, 4, 2, 1

Final rankings by adding all points across all games
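The per-game ranking step can be sketched as below. Several details are assumptions rather than statements from the slides: the criteria are applied in the listed order as tie-breakers, less time spent is treated as better, and entries that tie on all three criteria share the points of their joint rank (which matches the ranking tables in this deck, where several agents receive 25 points on the same game). The points array includes 12 for fourth place because the ranking tables award 12 points. `GameResult` and `PointsRanking` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of one agent's results on one game.
class GameResult {
    final String agent;
    final int victories;
    final double averageScore;
    final double timeSpent;

    GameResult(String agent, int victories, double averageScore, double timeSpent) {
        this.agent = agent;
        this.victories = victories;
        this.averageScore = averageScore;
        this.timeSpent = timeSpent;
    }
}

class PointsRanking {
    static final int[] POINTS = {25, 18, 15, 12, 10, 8, 6, 4, 2, 1};

    // More victories first, then higher average score, then less time (assumption).
    static int compare(GameResult a, GameResult b) {
        if (a.victories != b.victories) {
            return Integer.compare(b.victories, a.victories);
        }
        if (a.averageScore != b.averageScore) {
            return Double.compare(b.averageScore, a.averageScore);
        }
        return Double.compare(a.timeSpent, b.timeSpent);
    }

    /** Maps each agent to the points it earns on one game. */
    static Map<String, Integer> award(List<GameResult> results) {
        List<GameResult> sorted = new ArrayList<>(results);
        sorted.sort(PointsRanking::compare);
        Map<String, Integer> points = new LinkedHashMap<>();
        int rank = 0;
        for (int i = 0; i < sorted.size(); i++) {
            // Tied entries keep the rank of the first entry in their group.
            if (i > 0 && compare(sorted.get(i - 1), sorted.get(i)) != 0) {
                rank = i;
            }
            int p = rank < POINTS.length ? POINTS[rank] : 0;
            points.put(sorted.get(i).agent, p);
        }
        return points;
    }
}
```

With this tie rule, two agents tied for second place both receive 18 points and the next agent receives 12, not 15.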


Summary of submissions

External:
• 1 Java agent: YOLOBOT
    • Tobias Joppen, Nils Schroeder and Miriam Moneke, Technische Universität Darmstadt, Germany
    • Similar to the Planning agent described in http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7970136
    • Replaces MCTS with a greedy policy that picks the action that most reduces the distance to the chosen object
• 1 Python agent: ercumentilhan
    • Ercument Ilhan, Istanbul Technical University, Turkey
    • Double Expected True Online SARSA Lambda with linear function approximation and softmax policy

Internal:
• 2 Java agents by Kamolwan Kunanusont, University of Essex, UK
• DontUnderestimateUchiha: eGreedy
• kkunan: QLearning using avatar information as features

Sample agents:
• sampleRandom
• sampleLearner in Java: SARSA Lambda with mesh feature representation
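For readers unfamiliar with the techniques named above, here is a minimal sketch of epsilon-greedy action selection combined with a one-step Q-learning update. It is not the code of any submitted agent: it is tabular, with integer state and action indices, whereas the entries above use feature-based representations. `TabularQLearner` and all parameter names are hypothetical.

```java
import java.util.Random;

class TabularQLearner {
    private final double[][] q;   // q[state][action], initialised to 0
    private final double alpha;   // learning rate
    private final double gamma;   // discount factor
    private final double epsilon; // exploration probability
    private final Random rng;

    TabularQLearner(int states, int actions, double alpha, double gamma,
                    double epsilon, long seed) {
        this.q = new double[states][actions];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
        this.rng = new Random(seed);
    }

    /** Epsilon-greedy: a random action with probability epsilon, else the greedy one. */
    int selectAction(int state) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(q[state].length);
        }
        return greedyAction(state);
    }

    /** Action with the highest current value (lowest index wins ties). */
    int greedyAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) {
                best = a;
            }
        }
        return best;
    }

    /** Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    void update(int state, int action, double reward, int nextState,
                boolean terminal) {
        double target = reward;
        if (!terminal) {
            target += gamma * q[nextState][greedyAction(nextState)];
        }
        q[state][action] += alpha * (target - q[state][action]);
    }

    double value(int state, int action) {
        return q[state][action];
    }
}
```

SARSA differs only in the target: it uses the value of the action actually taken in the next state instead of the maximum, and SARSA Lambda additionally spreads each update over recently visited state-action pairs via eligibility traces.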



Ranking of agents

Training set 1:

Test set (final ranking):

Agent                     G2  G3  G4  G5  G6  G7  G8  G9  G10  Total
kkunan                    18  25  25  25   8  25  25  18   15    184
sampleRandom              25  25  15  25  12  18  15  25   18    178
DontUnderestimateUchiha   12  25  18  25  25  10  18  15   10    158
sampleLearner             15  25  12  25  15  15  12   8   25    152
ercumentilhan             12  25  10  25  18  12  10  10   12    134
YOLOBOT                    8  25   8  25  10   8   8  12    8    112

Remark: G1 was removed from the final ranking due to bugs in the game itself.
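The Total column is the sum of each agent's per-game points, and the final ranking sorts agents by that sum. The sketch below reproduces this from the test-set table above; `FinalRanking` is a hypothetical helper, and the point values are copied from the table (G2 to G10).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FinalRanking {
    /** Sums each agent's per-game points. */
    static Map<String, Integer> totals(Map<String, int[]> pointsPerGame) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : pointsPerGame.entrySet()) {
            int sum = 0;
            for (int p : e.getValue()) {
                sum += p;
            }
            sums.put(e.getKey(), sum);
        }
        return sums;
    }

    /** Agent names ordered by total points, highest first. */
    static List<String> ranking(Map<String, int[]> pointsPerGame) {
        Map<String, Integer> sums = totals(pointsPerGame);
        List<String> agents = new ArrayList<>(sums.keySet());
        agents.sort((a, b) -> Integer.compare(sums.get(b), sums.get(a)));
        return agents;
    }

    /** Test-set points for G2..G10, copied from the table above. */
    static Map<String, int[]> testSetPoints() {
        Map<String, int[]> m = new LinkedHashMap<>();
        m.put("DontUnderestimateUchiha", new int[] {12, 25, 18, 25, 25, 10, 18, 15, 10});
        m.put("ercumentilhan", new int[] {12, 25, 10, 25, 18, 12, 10, 10, 12});
        m.put("kkunan", new int[] {18, 25, 25, 25, 8, 25, 25, 18, 15});
        m.put("sampleLearner", new int[] {15, 25, 12, 25, 15, 15, 12, 8, 25});
        m.put("sampleRandom", new int[] {25, 25, 15, 25, 12, 18, 15, 25, 18});
        m.put("YOLOBOT", new int[] {8, 25, 8, 25, 10, 8, 8, 12, 8});
        return m;
    }
}
```

`FinalRanking.totals(FinalRanking.testSetPoints())` gives 184 for kkunan and 112 for YOLOBOT, matching the Total column, and `FinalRanking.ranking(...)` returns the agents in the order shown in the table.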




Ranking of agents

If considering both training and test sets:

Rank  Agent                     Points  R. Training Set  R. Test Set
1     sampleRandom              332     2                2
2     ercumentilhan             313     1                5
3     kkunan                    309     6                1
4     DontUnderestimateUchiha   307     3                3
5     sampleLearner             301     4                4
6     YOLOBOT                   244     5                6

sampleRandom is the big winner...


Best scores by planning and learning agents on test set

        Single-Player Planning   Single-Player Learning
Game    Best score               Best score            Agent
G2      109.00 ± 38.19           31.5 ± 14.65          sampleRandom
G3      1.00 ± 0.00              0 ± 0                 *
G4      1.00 ± 0.00              0.2 ± 0.09            kkunan
G5      216.00 ± 24.00           1 ± 0                 *
G6      5.60 ± 0.78              3.45 ± 0.44           DontUnderestimateUchiha
G7      31696.10 ± 6975.78       29371.95 ± 2296.91    kkunan
G8      1116.90 ± 660.84         35.15 ± 8.48          kkunan
G9      1.00 ± 0.00              0.05 ± 0.05           sampleRandom
G10     56.70 ± 25.23            2.75 ± 2.04           sampleLearner


Next Competition

CIG2018
• Need to submit a proposal...

To be improved:
• Record gameplay (during validation)
• Better troubleshooting and error reporting to participants
• Better timer
• Provide more and stronger sample agents
• Advertise the track earlier
• ......


Acknowledgement

Daniel-Valentin Ionita
Second-year student
University of Essex, UK

Thanks to all the participants: Kamolwan Kunanusont, Ercument Ilhan, Tobias Joppen, Nils Schroeder and Miriam Moneke.

Special thanks to Damien Anderson and Qi Zhang for testing the framework on Windows machines.

Useful links

GVGAI website: http://www.gvgai.net/

Framework: https://github.com/EssexUniversityMCTS/gvgai.git



Thank you!!! Questions?

Prizes sponsored by IEEE-CIS

3 prizes (500USD/300USD/200USD)

Winners of game competitions organised in IEEE CIS conferences

Students or young professionals

Detailed awarding policy: http://cis.ieee.org/student-games-based-competition/awarding-policy.html

