Preparing AI for Parallelism: Lessons from NASCAR The Game 2011. Neil Henning – Technology Lead. Paris Game AI Conference 2011.

Paris Game/AI Conference 2011

Slides from the Paris Game/AI Conference 2011 talk by Neil Henning - covering the

Page 1: Paris Game/AI Conference 2011

Preparing AI for Parallelism

Lessons from NASCAR The Game 2011

Neil Henning – Technology Lead

Paris Game AI Conference 2011

Page 2: Paris Game/AI Conference 2011

Introduction

Page 3: Paris Game/AI Conference 2011

Introduction

I am sure some of you are wondering why a guy from [Codeplay] is doing a talk about [NASCAR The Game 2011], which was developed by [Eutechnyx]

Page 4: Paris Game/AI Conference 2011

Introduction

● A team from Codeplay worked on the game for 15 months

Page 5: Paris Game/AI Conference 2011

Introduction

● NASCAR isn’t just about driving straight, then turning left

● 43 cars on screen at the same time

● Cars race in tight packs on the circuit

● Overtaking is all about navigating through these packs

● Cannot simply make the AI use LODs, nearly always in view

Page 6: Paris Game/AI Conference 2011

Agenda

Page 7: Paris Game/AI Conference 2011

Agenda

● How to prepare AI for parallelism

● …by investigating NASCAR The Game 2011's AI

Page 8: Paris Game/AI Conference 2011

Agenda

● During the investigation I will answer the questions:

● Why prepare your AI for parallelism?

● What changes should be made?

● How did these changes help when optimizing NASCAR?

● How did we make use of the PS3's unique hardware?

● What common issues are there?

● What performance improvement was achieved?

Page 9: Paris Game/AI Conference 2011

Why prepare your AI for parallelism?

Page 10: Paris Game/AI Conference 2011

Why prepare your AI for parallelism?

● Without parallelism, tighter limits on number of bots

● Say we have four bots

● In serial – can easily fit in a frame

[Diagram: bot updates against the frame length]

Page 11: Paris Game/AI Conference 2011

Why prepare your AI for parallelism?

● Without parallelism, tighter limits on number of bots

● Want to increase bots by 3x?

● Have to either optimize or parallelize (or both)

[Diagram: bot updates against the frame length]

Page 12: Paris Game/AI Conference 2011

Why prepare your AI for parallelism?

● Without parallelism, tighter limits on number of bots

● Split work between threads

● Only possible with parallelism

[Diagram: bot updates against the frame length]

Page 13: Paris Game/AI Conference 2011

Why prepare your AI for parallelism?

● Multicore is the future (has been for some time)

● This generation of consoles is multicore

● Sony's new PS Vita is quad core

● Even iPad uses dual core processors now!

● Being able to split work amongst cores is key

● Might not be required yet, but could be essential later

Page 14: Paris Game/AI Conference 2011

Why prepare your AI for parallelism?

● Helps during crunch time

● Optimization being sought throughout engine

● Either optimize engine or cut features

● Have AI prepared to become parallel

● Optimization folks will love you!

Page 15: Paris Game/AI Conference 2011

What changes should be made?

Page 16: Paris Game/AI Conference 2011

What changes should be made?

● Split work into manageable chunks

● In NASCAR, had 18 components for each car

[Diagram: component boxes, including Stay Behind, Stay Beside, Obstacle Detection and Driving Controllers]

Page 17: Paris Game/AI Conference 2011

What changes should be made?

● Components are in groups

● All components in a group can be run in parallel

Page 18: Paris Game/AI Conference 2011

What changes should be made?

● 43 cars = 43 AIs

● Each car’s groups can be run in parallel too

[Diagram: cars 0, 1, 2 … 42 side by side]
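
The grouping idea from the last three slides can be sketched in standard C++. This is an illustration only, not the game's code: `Car`, `Component`, `Group` and `updateCar` are hypothetical names, and `std::async` stands in for whatever job system the engine used. Groups run one after another; the components inside a group run concurrently, which is only safe because each one writes a different field.

```cpp
#include <functional>
#include <future>
#include <vector>

// Hypothetical per-car state; each component below writes a different field.
struct Car {
    float brake = 0.0f;
    float steer = 0.0f;
    float throttle = 0.0f;
};

// A component updates one aspect of one car.
using Component = std::function<void(Car&)>;

// Components in the same group must not depend on each other's output.
using Group = std::vector<Component>;

// Runs the groups in order. Inside a group every component is launched
// concurrently, and the whole group is awaited before the next one starts.
void updateCar(Car& car, const std::vector<Group>& groups) {
    for (const Group& group : groups) {
        std::vector<std::future<void>> pending;
        for (const Component& component : group)
            pending.push_back(
                std::async(std::launch::async, component, std::ref(car)));
        for (std::future<void>& result : pending)
            result.get();  // group boundary: wait for every component
    }
}
```

Because `updateCar` touches only the car it is given, calls for different cars are independent of each other, which is what allows the 43 AIs to be spread across threads as well.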

Page 19: Paris Game/AI Conference 2011

What changes should be made?

● Read/Write phases

● Two phases for your AI

● Read phase can read world/other car state

● Write phase can modify own car state

Page 20: Paris Game/AI Conference 2011

What changes should be made?

● Use temporary data to store read values from environment

● In read phase, store needed reads into temporary data

● In write phase, read from the temporary data

● AI is one frame behind world events

● Effect on AI is minimal

Page 21: Paris Game/AI Conference 2011

What changes should be made?

● In NASCAR a read/write phase was used

● Write phase uses data from the previous frame's read phase

● Minimal set of components in read/write phase group

● Only components that required world/other car state

[Diagram: Write Phase and Read Phase within a frame]
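
A minimal sketch of the two phases, assuming a write-then-read ordering inside each frame so that the write phase consumes what the previous frame's read phase stored. `CarState`, `ReadResults`, the gap calculation and the speed values are all invented for illustration; only the phase rules come from the slides.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical car state; the real game tracked far more than this.
struct CarState {
    float position = 0.0f;
    float speed = 0.0f;
};

// Temporary data: filled by the read phase, consumed by the write phase.
struct ReadResults {
    float gapToCarAhead = 0.0f;
};

// Read phase: may look at any car, but writes only this car's temporary data.
void readPhase(const std::vector<CarState>& cars, std::size_t self,
               ReadResults& out) {
    const std::size_t ahead = (self + 1) % cars.size();
    out.gapToCarAhead = cars[ahead].position - cars[self].position;
}

// Write phase: modifies only its own car, using only the stored reads.
void writePhase(CarState& car, const ReadResults& in) {
    const float safeGap = 5.0f;  // arbitrary illustrative threshold
    car.speed = (in.gapToCarAhead > safeGap) ? 2.0f : 1.0f;
}

// One AI frame. The writes use the previous frame's reads, so the AI lags
// the world by one frame; the very first frame sees default values.
void updateFrame(std::vector<CarState>& cars,
                 std::vector<ReadResults>& scratch) {
    scratch.resize(cars.size());
    for (std::size_t i = 0; i < cars.size(); ++i)
        writePhase(cars[i], scratch[i]);
    for (std::size_t i = 0; i < cars.size(); ++i)
        readPhase(cars, i, scratch[i]);
}
```

Within a phase no car writes anything another car reads, so either loop could be split across threads without locks.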

Page 22: Paris Game/AI Conference 2011

What changes should be made?

● Remove large stack locals

● Having two or more threads means lots of duplicate locals

    void func()
    {
        char localBuffer[1024];
        // … do something with localBuffer
    }

● If func is called from many threads, the data use is multiplied many times!
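
One possible way to avoid the large local, sketched under assumptions: the caller owns a scratch block that is sized once and lent to the function. `Scratch` and this version of `func` are hypothetical; the talk does not say how the game's code was changed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Caller-owned scratch memory, sized once and reused across calls.
struct Scratch {
    std::vector<char> buffer;
    explicit Scratch(std::size_t bytes) : buffer(bytes) {}
};

// Before: `char localBuffer[1024];` sat on the stack of every calling thread.
// After: the function borrows scratch space, so its stack frame stays small.
// Returns the number of characters copied, or 0 if the text does not fit.
std::size_t func(Scratch& scratch, const char* text) {
    const std::size_t length = std::strlen(text);
    if (length + 1 > scratch.buffer.size())
        return 0;  // too large; a real system would pick its own policy
    std::memcpy(scratch.buffer.data(), text, length + 1);
    // ... do something with scratch.buffer ...
    return length;
}
```

Each worker still needs its own `Scratch`, but its size and location become a deliberate choice rather than a worst-case array hidden in every function on the call chain.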

Page 23: Paris Game/AI Conference 2011

What changes should be made?

● Document code – describe relationship between data

    struct Foo
    {
        Bar * bar;
    };

● Is Foo to Bar one : one? one : many? many : one?

Page 24: Paris Game/AI Conference 2011

What changes should be made?

● Document code – describe relationship between data

    struct Foo
    {
        Bar * bar;
    };

● Knowing how data is shared critical for threading

● Documenting the relationship saves time and effort later
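
As an illustration of the kind of comment meant here, the sketch below documents one possible answer for the `Foo`/`Bar` pair. The many-to-one relationship and the threading rule are invented for the example; the slides do not say what the real relationship was.

```cpp
struct Bar {
    int value = 0;
};

struct Foo {
    // Relationship: many Foo : one Bar. The Bar is shared, NOT owned by Foo.
    // Threading: read-only during the AI update; written only between frames.
    const Bar* bar = nullptr;
};

// Safe to call from several threads at once, given the contract above.
int readShared(const Foo& foo) {
    return foo.bar ? foo.bar->value : 0;
}
```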

Page 25: Paris Game/AI Conference 2011

What common issues are there?

Page 26: Paris Game/AI Conference 2011

What common issues are there?

● Virtual functions – can have a high runtime cost

● ~500-1200 cycles on PowerPC if virtual lookup misses cache

● Can equate to a large amount of time doing no work

Page 27: Paris Game/AI Conference 2011

What common issues are there?

● In NASCAR, components had virtual update method

● Based on previous game (Supercar Challenge)

● 16 cars in previous, now 43 cars

● 5 component types in previous, now 18 component types

● Now read/write phase too

● 80 virtual calls to update became 1333 virtual calls!
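
The two totals are cars multiplied by virtual update calls per car. The figure of 31 calls per car is not stated on the slide; it is simply what 1333 divided by 43 gives, and reading it as 18 component updates plus 13 extra read/write phase updates is only an assumption.

```latex
\begin{align*}
\text{Supercar Challenge:}\quad & 16 \text{ cars} \times 5 \text{ calls per car} = 80 \\
\text{NASCAR:}\quad & 43 \text{ cars} \times 31 \text{ calls per car} = 1333
\end{align*}
```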

Page 28: Paris Game/AI Conference 2011

What common issues are there?

● In NASCAR, components had virtual update method

● In real terms, 3ms of virtual function lookup per frame

● First optimization was to have typed buckets of components

● 1333 virtual calls went to 31 virtual calls

● Platform agnostic (PS3, 360 and Wii all sped up)
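
A sketch of what typed buckets could look like, assuming one bucket per component type that holds that component for every car. All names here (`BucketBase`, `Bucket`, `updateBuckets`, the two toy components) are hypothetical. The point it shows is that the virtual call moves from each component instance to each bucket.

```cpp
#include <cstddef>
#include <vector>

struct Car {
    float speed = 0.0f;
    float steering = 0.0f;
};

// Concrete component types with plain, non-virtual update methods.
struct StayBehind {
    void update(Car& car) const { car.speed -= 1.0f; }
};
struct StayBeside {
    void update(Car& car) const { car.steering += 0.5f; }
};

// One virtual call per bucket rather than one per component instance.
struct BucketBase {
    virtual ~BucketBase() = default;
    virtual void updateAll(std::vector<Car>& cars) = 0;
};

// A bucket stores every component of one type contiguously, one per car.
// It expects `cars` to hold at least as many entries as the bucket.
template <typename ComponentT>
struct Bucket final : BucketBase {
    std::vector<ComponentT> components;
    explicit Bucket(std::size_t count) : components(count) {}
    void updateAll(std::vector<Car>& cars) override {
        // The concrete type is known here, so no per-component lookup.
        for (std::size_t i = 0; i < components.size(); ++i)
            components[i].update(cars[i]);
    }
};

// Virtual calls per frame now equal the number of buckets.
void updateBuckets(std::vector<BucketBase*>& buckets, std::vector<Car>& cars) {
    for (BucketBase* bucket : buckets)
        bucket->updateAll(cars);
}
```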

Page 29: Paris Game/AI Conference 2011

What common issues are there?

● Virtual functions not just a code abstraction

● Virtual functions hide data too

● Not knowing the size of data kills SPU/Compute development

    struct Foo { virtual void func(); };
    struct Bar : public Foo { virtual void func(); };

    Foo * foo;
    foo->func();
    // don’t know the size of what foo points to! Could be sizeof(Foo) or sizeof(Bar)

Page 30: Paris Game/AI Conference 2011

What common issues are there?

● Naïve multithreading – locks galore

● Locks can be a solution, be very careful of use though

    void func()
    {
        lock->lock();
        // … do something
        lock->unlock();
    }

● Read/write phases allow removal of most (if not all) locks

● Avoid/reduce/remove locks if possible

Page 31: Paris Game/AI Conference 2011

What common issues are there?

● Physics subsystem caused issues with NASCAR

● AI required knowledge of obstacles

● Physics system used raycasts to find problematic obstacles

● Each call to raycast used a mutex, every thread would halt!

● Had to refactor code to remove need for locking

Page 32: Paris Game/AI Conference 2011

What common issues are there?

● Know your data – how is it accessed? Where is it shared?

    struct RaceCar { Brain * brain; };

    struct Brain { RaceCar * raceCar; Obstacle ** obstacles; };

    struct Obstacle { BrainInterface * interface; };

    struct BrainInterface { RaceCar * raceCar; Brain * brain; };

● Very easy for systems grown over time to have convoluted struct layouts

Page 33: Paris Game/AI Conference 2011

How did these changes help when optimizing NASCAR?

Page 34: Paris Game/AI Conference 2011

How did these changes help when optimizing NASCAR?

● Read/Write phase was key to performance on Xbox 360

● Allowed work to be split across all 6 threads

● Each thread was given 1/6th of the cars to process

● Takes 2ms of all CPU resources on 360 in a frame

[Diagram: threads running in parallel, separated by barriers]
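
The split described above might look roughly like this in portable C++. `Slice`, `sliceFor` and `runPhase` are hypothetical names, `std::thread` stands in for the console's threading API, and joining every thread is used as a simple stand-in for the barriers in the diagram.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Half-open range of car indices [begin, end) owned by one thread.
struct Slice {
    std::size_t begin;
    std::size_t end;
};

// Balanced split: slice sizes differ by at most one car.
Slice sliceFor(std::size_t thread, std::size_t numThreads,
               std::size_t numCars) {
    return { thread * numCars / numThreads,
             (thread + 1) * numCars / numThreads };
}

// Runs phase(carIndex) for every car, spread across numThreads threads.
// Joining all threads before returning acts as the barrier between phases.
template <typename Phase>
void runPhase(std::size_t numCars, std::size_t numThreads, Phase phase) {
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t) {
        const Slice slice = sliceFor(t, numThreads, numCars);
        workers.emplace_back([slice, &phase] {
            for (std::size_t i = slice.begin; i < slice.end; ++i)
                phase(i);
        });
    }
    for (std::thread& worker : workers)
        worker.join();
}
```

Calling `runPhase` once for the read phase and once for the write phase keeps the two apart: no write starts until every read has finished. With 43 cars and 6 threads, five slices hold 7 cars and one holds 8.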

Page 35: Paris Game/AI Conference 2011

How did these changes help when optimizing NASCAR?

● Tried the same approach on PS3

● Only 2 threads on PS3, but have 6 sub processors (the SPUs)

● Both threads on PS3 were completely full

● Any multithreading speedup has to be on the SPUs

● Each SPU has 256 KB local storage (for code & data)

● Code was ~2 MB and data was ~8 MB – far too large!

● Unfeasible to mimic 360 approach

Page 36: Paris Game/AI Conference 2011

How did these changes help when optimizing NASCAR?

● On PS3 most costly components were targeted

Page 37: Paris Game/AI Conference 2011

How did these changes help when optimizing NASCAR?

● PS3 version relied on components being run in parallel

● And all components in a group being able to be run in parallel

● Costly groups were made to use the SPUs

● Knowing relationship between data was key

● Well documented code made life so much easier!

Page 38: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

Page 39: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Codeplay was asked by Eutechnyx to optimize the AI

● Very tight deadlines, 1 month to reduce time taken in AI

● No main thread time left – have to use the SPUs

● Our Offload compiler technology crucial

Page 40: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● For those unfamiliar with coding for the SPU…

● They are amazingly fast, if you code correctly for them

● Normally requires total rewrite of existing codebase

● Painful to access global variables

● Virtual functions are a complete write off

Page 41: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● SPU development typically takes many months

● Common to have 4-5 SPU programmers for ~10 months

● Not feasible for late-in-cycle development

● Offload aims to mitigate the issues with getting code onto SPU

● Can offload code to SPU much quicker (typically a few man days)

● Much easier to move existing code bases to SPU

Page 42: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Small language extension moves work from PPU to SPU

● Any work within an offload block is performed on the SPU

    __blockingoffload()
    {
        // do some work on SPU, PPU waits for completion!
    };

    offloadThread_t handle = __offload()
    {
        // do some work on SPU!
    };

    // can do some work on PPU before waiting for SPU
    offloadThreadJoin(handle);

● All PPU code is duplicated for the SPU

Page 43: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Offload allows access to global variables

● Just use them as normal!

    int aGlobalVariable;

    __blockingoffload()
    {
        int aLocalVariable = aGlobalVariable;
    };

Page 44: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Offload allows virtual function calls too

● Just have to specify which virtual functions may be called

    struct Foo { virtual void bar() {} };

    __blockingoffload[Foo::bar this]()
    {
        Foo foo;
        foo.bar();
    };

Page 45: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● First, profiled the AI during a typical race

● Four components taking most of the frame time

[Chart: Driving Controllers, Obstacle Detection, Stay Behind Other Car, Stay Beside Other Car]

Page 46: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Used four slightly different strategies when multithreading

[Chart: Driving Controllers, Obstacle Detection, Stay Behind Other Car, Stay Beside Other Car]

Page 47: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Obstacle Detection only component in its group

● Very inefficient code for the SPU, but moved 1/3 onto 4 SPUs

[Chart: Obstacle Detection]

Page 48: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Looked at Stay Behind/Beside Other Car together

● In the same group, can be run in parallel

[Chart: Stay Behind Other Car, Stay Beside Other Car]

Page 49: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Moved Stay Behind component to SPU

● Stay Beside component would continue to be run on PPU

[Chart: Stay Behind Other Car, Stay Beside Other Car]

Page 50: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● As long as the SPU work took less time than the PPU work, no cost!

● Effectively ‘hid’ the cost of calculating Stay Behind component

[Chart: Stay Behind Other Car, Stay Beside Other Car]
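
The overlap can be imitated on an ordinary CPU. In this sketch `std::async` plays the role of the SPU offload and the calling thread plays the PPU; `Car`, `stayBehind`, `stayBeside` and `updatePair` are hypothetical stand-ins, not the game's code or the Offload API.

```cpp
#include <functional>
#include <future>

struct Car {
    float brake = 0.0f;
    float steer = 0.0f;
};

// Stand-ins for the two components; each writes a different field.
void stayBehind(Car& car) { car.brake = 0.25f; }
void stayBeside(Car& car) { car.steer = 0.5f; }

void updatePair(Car& car) {
    // Start Stay Behind elsewhere (the SPU in the talk, a thread here).
    std::future<void> offloaded =
        std::async(std::launch::async, stayBehind, std::ref(car));
    // Meanwhile the calling thread (the PPU in the talk) runs Stay Beside.
    stayBeside(car);
    // If the offloaded work has already finished, this returns at once.
    offloaded.get();
}
```

The elapsed time is roughly the longer of the two components rather than their sum, which is why the shorter one appears to cost nothing.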

Page 51: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● Lastly, driving controllers took 1/3 of AI cost alone

● Split the cars across 4 SPUs, and ran in parallel

[Chart: Driving Controllers]

Page 52: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● In total ~170 lines of source code changed

● Changes were purely optimization

    AIObstacle ** obstacles;
    unsigned int numObstacles;

    offloadThread_t handle = __offload(obstacles, numObstacles)
    {
        for(unsigned int i = 0; i < numObstacles; i++)
        {
            AIObstacle * obstacle = obstacles[i];
            // use obstacle for calculations
        }
    };

Page 53: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● In total ~170 lines of source code changed

● Changes were purely optimization

    // array of AIObstacle * ’s on main memory
    AIObstacle ** obstacles;
    unsigned int numObstacles;

    offloadThread_t handle = __offload(obstacles, numObstacles)
    {
        for(unsigned int i = 0; i < numObstacles; i++)
        {
            // AIObstacle * points to main memory
            AIObstacle * obstacle = obstacles[i];
            // use obstacle for calculations
        }
    };

Page 54: Paris Game/AI Conference 2011

How did we make use of the PS3's unique hardware?

● In total ~170 lines of source code changed

● Changes were purely optimization

    // array of AIObstacle * ’s on main memory
    AIObstacle ** obstacles;
    unsigned int numObstacles;

    offloadThread_t handle = __offload(obstacles, numObstacles)
    {
        CachedPointer<AIObstacle *> innerObstacles(obstacles, numObstacles);

        for(unsigned int i = 0; i < numObstacles; i++)
        {
            // AIObstacle * points to main memory
            CachedPointer<AIObstacle> obstacle(innerObstacles[i]);
            // use obstacle for calculations
        }
    };

Page 55: Paris Game/AI Conference 2011

What performance improvement was achieved?

Page 56: Paris Game/AI Conference 2011

What performance improvement was achieved?

Obstacle Detection

● Obstacle detection went from 2ms -> 1.1ms

● ~100 lines of source code changed

● 2½ weeks development time

Page 58: Paris Game/AI Conference 2011

What performance improvement was achieved?

Stay Behind Other Car / Stay Beside Other Car

● Stay Behind went from 1.1ms -> 0ms (hidden behind other)

● ~50 lines of source code changed

● 1 week development time

Page 60: Paris Game/AI Conference 2011

What performance improvement was achieved?

Driving Controllers

● Driving Controllers went from 4ms -> 0.6ms

● ~20 lines of source code changed

● 8 hours development time

Page 62: Paris Game/AI Conference 2011

What performance improvement was achieved?

● Performance speaks for itself!

● 50% speed improvement on PS3
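
Adding up the per-component figures from the preceding slides gives a rough cross-check of the headline number. The total AI cost is not stated in the talk; the last line assumes it was about 12 ms, inferred from the remark that the 4 ms driving controllers were a third of the AI cost on their own.

```latex
\begin{align*}
\text{before} &= 2 + 1.1 + 4 = 7.1\ \text{ms} \\
\text{after}  &= 1.1 + 0 + 0.6 = 1.7\ \text{ms} \\
\text{saved}  &= 7.1 - 1.7 = 5.4\ \text{ms} \approx 45\%\ \text{of an assumed } 12\ \text{ms}
\end{align*}
```

Under that assumption the saving is in the same region as the quoted 50%.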

Page 63: Paris Game/AI Conference 2011

Takeaway

Page 64: Paris Game/AI Conference 2011

Takeaway

● It is possible to parallelise late in development

● But need code ready to be parallelised

● Small changes in coding style lead to hugely better results

● Better to plan systems from beginning with multicore in mind

Page 65: Paris Game/AI Conference 2011

Questions?

Can also catch me on twitter @sheredom