
Machine Learning and Games

Simon M. Lucas, Centre for Computational Intelligence

University of Essex, UK

Overview

• Games: dynamic, uncertain, open-ended
  – Ready-made test environments
  – 21 billion dollar industry: space for more machine learning…
• Agent architectures
  – Where the Computational Intelligence fits
  – Interfacing the Neural Nets etc.
  – Choice of learning machine (WPC, neural network, N-Tuple systems)
• Training algorithms
  – Evolution / co-evolution
  – TDL
  – Hybrids
• Methodology: strong belief in open competitions

My Angle

• Machine learning
  – How well can systems learn
  – Given a complex, semi-structured environment
  – With indirect reward schemes

Sample Games

• Car Racing
• Othello
• Ms Pac-Man
  – Demo

Agent Basics

• Two main approaches
  – Action selector
  – State evaluator
• Each of these has strengths and weaknesses
• For any given problem, no hard and fast rules
  – Experiment!

• Success or failure can hinge on small details!

Co-evolution
Evolutionary algorithm: rank the candidate players using a league

(Co) Evolution v. TDL

• Temporal Difference Learning
  – Often learns much faster
  – But less robust
  – Learns during game-play
  – Uses information readily available (i.e. the current observable game-state)
• Evolution / Co-evolution (vanilla form)
  – Information from game result(s)
  – Easier to apply
  – But wasteful

• Both can learn game strategy from scratch

In Pictures…

Simple Example: Mountain Car

• Often used to test TD learning methods
• Accelerate a car to reach the goal at the top of an incline
• Engine force is weaker than gravity (DEMO)
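
The mountain-car dynamics are compact enough to sketch; the constants below are the common textbook (Sutton and Barto) values rather than anything stated on the slides:

    // Minimal mountain-car simulator (textbook constants, not taken from the slides).
    public class MountainCar {
        public double position = -0.5;   // start in the valley
        public double velocity = 0.0;

        /** action is -1 (reverse), 0 (coast) or +1 (forward thrust). */
        public void step(int action) {
            velocity += 0.001 * action - 0.0025 * Math.cos(3 * position); // engine weaker than gravity
            velocity = Math.max(-0.07, Math.min(0.07, velocity));
            position += velocity;
            if (position < -1.2) { position = -1.2; velocity = 0.0; }     // left wall
        }

        public boolean atGoal() { return position >= 0.5; }               // flag at top of the incline
    }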

State Value Function

• Actions are applied to current state to generate set of future states

• State value function is used to rate these

• Choose action that leads to highest state value

• Discrete set of actions
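
A minimal sketch of the state-evaluator loop just described: apply each discrete action to the current state, rate the successor with the value function, and pick the argmax (the StateValue and Model interfaces are illustrative, not the talk's API):

    // Illustrative 1-ply state-evaluator agent; the interfaces are assumptions.
    interface StateValue { double value(double[] state); }
    interface Model { double[] next(double[] state, int action); }

    class StateEvaluatorAgent {
        private final StateValue v;
        private final Model model;
        StateEvaluatorAgent(StateValue v, Model model) { this.v = v; this.model = model; }

        /** Try every discrete action, rate the resulting state, return the best. */
        int selectAction(double[] state, int nActions) {
            int best = 0;
            double bestValue = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < nActions; a++) {
                double val = v.value(model.next(state, a));
                if (val > bestValue) { bestValue = val; best = a; }
            }
            return best;
        }
    }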

Action Selector

• A decision function selects an output directly based on current state of system

• The action may be a discrete choice or a set of continuous outputs
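
By contrast, an action selector maps the observed state straight to an action with no forward model; a hypothetical interface:

    // Hypothetical action-selector interface: state in, action out, no look-ahead needed.
    interface ActionSelector {
        int selectDiscrete(double[] state);        // discrete choice
        double[] selectContinuous(double[] state); // or continuous outputs
    }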

TDL – State Value Learned

Evolution: Learns Policy, not Value

Example Network Found by NEAT+Q (Whiteson and Stone, JMLR 2006)

• EvoTDL Hybrid
• They used a different input coding
• So results are not directly comparable

~Optimal State Value / Policy Function: f = abs(v)

Action Controller

• Directly connect velocity to output

• Simple network!
• One neuron!
• One connection!
• Easy to interpret!
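
One reading of that single-neuron controller is "thrust in the direction you are already moving"; a sketch under that assumption (the weight value is illustrative):

    // One neuron, one connection: thrust follows the sign of the velocity.
    class OneNeuronController {
        private final double weight = 1.0;               // illustrative positive weight
        int act(double velocity) {
            return weight * velocity >= 0 ? +1 : -1;     // bang-bang output of the single neuron
        }
    }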

Othello
With Thomas Runarsson, University of Iceland

Volatile Piece Difference


Setup

• Use a weighted piece counter (see the sketch below)
  – Fast to compute (can play billions of games)
  – Easy to visualise
  – See if we can beat the ‘standard’ weights
• Limit search depth to 1-ply
  – Enables billions of games to be played
  – For a thorough comparison
• Focus on machine learning rather than game-tree search
• Force random moves (with prob. 0.1)
  – Get a more robust evaluation of playing ability
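
A weighted piece counter is just a dot product between a 64-entry weight vector and the board, encoded as +1 for own pieces, -1 for opponent pieces and 0 for empty squares; a minimal sketch:

    // Weighted piece counter for Othello: board encoded as +1 (mine), -1 (opponent), 0 (empty).
    class WeightedPieceCounter {
        private final double[] weights;        // 64 weights, one per square
        WeightedPieceCounter(double[] weights) { this.weights = weights; }

        double evaluate(int[] board) {         // board.length == 64
            double sum = 0;
            for (int i = 0; i < 64; i++) sum += weights[i] * board[i];
            return sum;
        }
    }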

Standard “Heuristic” Weights (lighter = more advantageous)

CEL Algorithm

• Evolution Strategy (ES)
  – (1, 10) (non-elitist worked best)
• Gaussian mutation
  – Fixed sigma (not adaptive)
  – Fixed works just as well here
• Fitness defined by full round-robin league performance (e.g. 1, 0, -1 for w/d/l)
• Parent-child averaging
  – Defeats the noise inherent in fitness evaluation
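
A minimal sketch of the loop described above, assuming a (1,10)-ES over a real-valued weight vector: mutate with fixed-sigma Gaussian noise, rank the offspring with a round-robin league, then average the parent towards the best child (the 0.5 mixing factor and the LeagueFitness interface are illustrative):

    import java.util.Random;

    // Sketch of a (1,10)-ES with fixed-sigma Gaussian mutation and parent-child averaging.
    // The league fitness is left abstract; w/d/l scoring (1 / 0 / -1) happens inside it.
    class CEL {
        interface LeagueFitness { double[] rank(double[][] population); } // one score per individual

        static double[] evolve(double[] parent, LeagueFitness league,
                               double sigma, int generations, long seed) {
            Random rng = new Random(seed);
            int lambda = 10;
            for (int g = 0; g < generations; g++) {
                double[][] offspring = new double[lambda][];
                for (int i = 0; i < lambda; i++) {
                    offspring[i] = parent.clone();
                    for (int j = 0; j < parent.length; j++)
                        offspring[i][j] += sigma * rng.nextGaussian();   // fixed sigma, not adaptive
                }
                double[] scores = league.rank(offspring);                // round-robin league
                int best = 0;
                for (int i = 1; i < lambda; i++) if (scores[i] > scores[best]) best = i;
                for (int j = 0; j < parent.length; j++)                  // parent-child averaging
                    parent[j] = 0.5 * (parent[j] + offspring[best][j]);  // 0.5 mix is illustrative
            }
            return parent;
        }
    }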

TDL Algorithm

• Nearly as simple to apply as CEL

    public interface TDLPlayer extends Player {
        void inGameUpdate(double[] prev, double[] next);
        void terminalUpdate(double[] prev, double tg);
    }

• Reward signal only given at game end
• Initial alpha and alpha cooling rate tuned empirically

TDL in Java
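
The code from that slide is not reproduced in this extract; below is a minimal sketch of one way the interface above could be realised for a weighted piece counter, using a standard TD(0)-style update (the tanh output, the alpha value and the field names are assumptions, not the talk's code):

    // Sketch only: TD(0)-style updates for a weighted piece counter.
    // The tanh squashing, alpha value and names are assumptions, not the slide's code.
    class WpcTDLPlayer {
        private final double[] w = new double[64];
        private double alpha = 0.01;                       // tuned empirically in the talk

        private double value(double[] board) {             // tanh-squashed WPC output
            double s = 0;
            for (int i = 0; i < 64; i++) s += w[i] * board[i];
            return Math.tanh(s);
        }

        /** Move the prediction for the previous state towards the value of the next state. */
        public void inGameUpdate(double[] prev, double[] next) {
            double error = value(next) - value(prev);
            double grad = 1 - value(prev) * value(prev);   // derivative of tanh
            for (int i = 0; i < 64; i++) w[i] += alpha * error * grad * prev[i];
        }

        /** At game end, move the prediction towards the true outcome tg (e.g. +1 / 0 / -1). */
        public void terminalUpdate(double[] prev, double tg) {
            double error = tg - value(prev);
            double grad = 1 - value(prev) * value(prev);
            for (int i = 0; i < 64; i++) w[i] += alpha * error * grad * prev[i];
        }
    }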

CEL (1,10) v. Heuristic

TDL v. Random and Heuristic

TDL + CEL v. Heuristic (1 run)

Can we do better?

• Enforce symmetry
  – This speeds up learning

• Use trusty old friend: N-Tuple System

N-Tuple Systems

• W. Bledsoe and I. Browning. Pattern recognition and reading by machine. In Proceedings of the EJCC, pages 225–232, December 1959.

• Sample n-tuples of the input space
• Map sampled values to memory indexes
  – Training: adjust the values there
  – Recognition / play: sum over the values
• Superfast
• Related to:
  – Kernel trick of the SVM (non-linear map to a high-dimensional space; then a linear model)
  – Kanerva’s sparse memory model
  – Also similar to Buro’s look-up table
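
A sketch of the indexing idea above: each n-tuple reads a few board squares, forms a base-3 address from their values (empty / own / opponent), and looks up a trainable entry in its own table; the evaluation is the sum over all tuples (class and method names are illustrative, not from the talk):

    // Illustrative n-tuple evaluator: each tuple indexes a small look-up table by the
    // base-3 code of the squares it samples; board values are 0, 1 or 2.
    class NTupleEvaluator {
        private final int[][] tuples;      // tuples[t] = indexes of the sampled squares
        private final double[][] luts;     // luts[t] has 3^n entries for an n-tuple

        NTupleEvaluator(int[][] tuples) {
            this.tuples = tuples;
            this.luts = new double[tuples.length][];
            for (int t = 0; t < tuples.length; t++)
                luts[t] = new double[(int) Math.pow(3, tuples[t].length)];
        }

        int address(int t, int[] board) {              // base-3 index of the sampled squares
            int index = 0;
            for (int square : tuples[t]) index = index * 3 + board[square];
            return index;
        }

        double evaluate(int[] board) {                 // recognition / play: sum over table entries
            double sum = 0;
            for (int t = 0; t < tuples.length; t++) sum += luts[t][address(t, board)];
            return sum;
        }
    }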

Symmetric N-Tuple Sampling

3-tuple Example

N-Tuple System

• Results used 30 random n-tuples
• Snakes created by a random 6-step walk
  – Duplicate squares deleted
• System typically has around 15,000 weights
• Simple training rule: see the sketch below
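
The training rule itself is not reproduced in this extract; a plausible delta-rule reading, in which only the table entries addressed by the current board are nudged towards a target signal (the target and alpha are left abstract):

    // Sketch of a delta-rule update for n-tuple tables: only the entries the board
    // actually indexes are adjusted. alpha and the target signal are assumptions.
    class NTupleTrainer {
        static void update(int[][] tuples, double[][] luts, int[] board,
                           double target, double alpha) {
            double prediction = 0;
            int[] addr = new int[tuples.length];
            for (int t = 0; t < tuples.length; t++) {
                int index = 0;
                for (int square : tuples[t]) index = index * 3 + board[square];
                addr[t] = index;
                prediction += luts[t][index];
            }
            double error = target - prediction;                 // e.g. TD or game-outcome target
            for (int t = 0; t < tuples.length; t++)
                luts[t][addr[t]] += alpha * error;              // nudge only the addressed entries
        }
    }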

N-Tuple System (TDL): total games = 1250

Learned strategy…

Web-based League (snapshot before CEC 2006 Competition)

Results versus CEC 2006 Champion (a manual EVO / TDL hybrid)

N-Tuple Summary

• Stunning results compared to other game-learning architectures such as MLP

• How might this hold for other problems?
• How easy are N-Tuples to apply to other domains?

Screen Capture Mode: Ms Pac-Man Challenge

Robotic Car Racing

Conclusions

• Games are great for CI research
  – Intellectually challenging
  – Fun to work with
• Agent learning for games is still a black art
• Small details can make big differences!
  – Which inputs to use
• Big details also! (N-Tuple versus MLP)
• Grand challenge: how can we design more efficient game learners?
• EvoTDL hybrids are the way forward.

CIG 2008: Perth, WA; http://cigames.org
