57
1

Real-time neuroevolution in the NERO video game

  • Upload
    creda

  • View
    45

  • Download
    0

Embed Size (px)

DESCRIPTION

Real-time neuroevolution in the NERO video game. overview. Background on video games Neural networks NE NEAT rtNEAT NERO. background. - PowerPoint PPT Presentation

Citation preview

Page 1: Real-time neuroevolution in the NERO video game

1

Page 2: Real-time neuroevolution in the NERO video game

2

overviewoverview

Background on video gamesBackground on video games Neural networksNeural networks NENE NEATNEAT rtNEATrtNEAT NERONERO

Page 3: Real-time neuroevolution in the NERO video game

3

backgroundbackground

The global video game market, according to analysts, will increase from $25.4 billion in revenue in 2004 to almost $55 billion in 2009 , larger than even that of Hollywood.

Techniques in artificial intelligence can potentially both increase the longevity of video games and decrease their production costs.

Page 4: Real-time neuroevolution in the NERO video game

4

backgroundbackground

In most games character behavior is In most games character behavior is scripted. scripted. no matter how many times the player exploits a weakness, that weakness is never repaired.

So what to do?

Page 5: Real-time neuroevolution in the NERO video game

5

The solutionThe solution

Machine learning can keep video games interesting by allowing agents to change and adapt.

The agents can be trained to perform complex behaviors.

if they would be trained offline (out-game learning) and then “freeze” and put in the game, they will not adapt and change in response to players in the game.

Page 6: Real-time neuroevolution in the NERO video game

6

The solutionThe solution

That’s why agents should adapt and change in real-time, and a powerful and reliable machine learning method is needed.

Prior examples in the MLG genre include the Tamagotchi virtual pet

and the video “God game” Black & White.

Page 7: Real-time neuroevolution in the NERO video game

7

The solutionThe solution

Here I introduce the machine learning : rtNEAT – real time Neuroevolution Of Augmenting Topologies.

And the game that implement the rtNEAT: NERO – neuroEvolving robotic operatives.

Page 8: Real-time neuroevolution in the NERO video game

8

Neural NetworksNeural Networks A try to somehow mimic the neural network of a

human being in a simplistic way. way. The network is a collection of simple units that The network is a collection of simple units that

mimic a neuron:mimic a neuron:

2

0.2

1.5

weightsinput Weighted weights Computin

g unit

Weighted weights

threshold

output

Σ

Page 9: Real-time neuroevolution in the NERO video game

9

Neural NetworksNeural Networks

The network adapts as follows: change the weight by an amount proportional The network adapts as follows: change the weight by an amount proportional to the difference between the desired output and the actual output:to the difference between the desired output and the actual output:

where η is the learning rate, D is the desired output, and Y is the actual output where η is the learning rate, D is the desired output, and Y is the actual output

ii IYDW *)(*

Page 10: Real-time neuroevolution in the NERO video game

10

NE – NeuroevolutionNE – Neuroevolution

The artificial evolution of neural networks using an evolutionary algorithm.

The network The network can compute arbitrarily complex functions, can both learn and perform in the presence of noisy inputs, and generalize its behavior to previously unseen inputs.

Page 11: Real-time neuroevolution in the NERO video game

11

Demands of video games Demands of video games AIAI

1) Large state/action space: high dimensional space and having to check the value of every possible action on every game tick for every agent in the game.

- In NE agents are evolved to output only a single requested action per game tick

Page 12: Real-time neuroevolution in the NERO video game

12

Demands of video games Demands of video games AIAI

2) Diverse behaviors: Agents should not all converge to the same Behavior - A homogeneous population would make the game boring.

- In NE Diverse populations can be explicitly maintained through speciation

Page 13: Real-time neuroevolution in the NERO video game

13

Demands of video games Demands of video games AIAI

3) Consistent individual behaviors: Characters should take a random action in order to explore new behaviors, but not periodically making odd moves.

- In NE The behavior of an agent does not change because it always chooses actions from the same network.

Page 14: Real-time neuroevolution in the NERO video game

14

Demands of video games Demands of video games AIAI

4) Fast adaptation and sophisticated behaviors: not waiting hours for agents to adapt require simple presentation but that results in simple behaviors.

- In NE A representation of the solution can be evolved, allowing simple behaviors in the beginning and complexified later.

Page 15: Real-time neuroevolution in the NERO video game

15

Demands of video games Demands of video games AIAI

5) Memory of past states: in order to react more convincingly to the present situation requires keeping track of more than the current state.

- In NE Recurrent neural networks can be evolved, what gives us a memory of past situations.

Page 16: Real-time neuroevolution in the NERO video game

16

NEATNEAT

A technique for evolving neural networks using an evolutionary algorithm.

The correct topology not needed to be known prior to evolution.

Unique in that it begins with minimal networks and adds connections and nodes to them over generations, allowing complex problems to be solved based upon simple ones.

Page 17: Real-time neuroevolution in the NERO video game

17

Genetic encoding - Genetic encoding - GenomeGenome

Page 18: Real-time neuroevolution in the NERO video game

18

Genetic encoding - Genetic encoding - MutationMutation

Page 19: Real-time neuroevolution in the NERO video game

19

Genetic encoding - Genetic encoding - MutationMutation

Page 20: Real-time neuroevolution in the NERO video game

20

Genetic encoding - Genetic encoding - resultsresults

The splitting inserting into the system a nonlinearity.

Old behaviors that were in the pre existing networks are not vanished and their quality stay almost the same, with the opportunity to refine those behaviors.

Page 21: Real-time neuroevolution in the NERO video game

21

Cross over – historical Cross over – historical markingsmarkings

When a mutation occurs a global innovation number is incremented and assigned.

When cross over happens the offspring will inherit the innovation numbers.

Thus the historical origin of every gene is known throughout the evolution.

Page 22: Real-time neuroevolution in the NERO video game

22

Crossover – historical Crossover – historical markingsmarkings

With innovation numbers we don’t need to do a topological analysis.

Two genes with the same historical origin represent the same structure since they were both derived from the same ancestral gene at some point in the past

Page 23: Real-time neuroevolution in the NERO video game

23

Crossover – historical Crossover – historical markingsmarkings

Page 24: Real-time neuroevolution in the NERO video game

24

SpeciationSpeciation

NEAT divides the population into species.

The individuals compete primarily with their own niches.

this way topological innovations are protected and have time to optimize their structure before competing with other niches, like new ideas that reach a potential before elimination.

Page 25: Real-time neuroevolution in the NERO video game

25

SpeciationSpeciation

Another advantage that Species with smaller genomes survive as long as their fitness is competitive, ensuring that small networks are not replaced by larger ones unnecessarily.

Page 26: Real-time neuroevolution in the NERO video game

26

SpeciationSpeciation

Page 27: Real-time neuroevolution in the NERO video game

27

SpeciationSpeciation To examine the similarity of networks a

distance is formulated:

E = number of excess, D = number of disjoint, E = number of excess, D = number of disjoint, W = average weight differences of matching W = average weight differences of matching genes, N = num of genes in the larger Genome genes, N = num of genes in the larger Genome (a normalizing factor)(a normalizing factor)

c1,c2 and c3 defines the importance of the c1,c2 and c3 defines the importance of the three factors.three factors.

WcN

Dc

N

Ec3

21

Page 28: Real-time neuroevolution in the NERO video game

28

SpeciationSpeciation

if a genome’s distance to a randomly chosen member of the species is less than δt, a compatibility threshold, the genome is placed into this species.

δt can be dynamic, be raised if there are too many species and lowered if too few.

If a genome is not compatible with any existing species, a new species is created.

Page 29: Real-time neuroevolution in the NERO video game

29

Speciation – adjusted Speciation – adjusted fitnessfitness

The fitness of an organism is also defined by its niche:

WhereWhere

And you sum over all the distances from the And you sum over all the distances from the organisms in the population resulting in the organisms in the population resulting in the number of organisms in the same species as number of organisms in the same species as organism i.organism i.

n

jjish

ifif

1)),((

'

else 1

if 0 sh

t

Page 30: Real-time neuroevolution in the NERO video game

30

Speciation - offspringSpeciation - offspring

The number of offspring distributed to to each species k is:each species k is:

Where FWhere Fk k is the average fitness of species is the average fitness of species k,k,FFtot tot = = ΣΣk k FFkk is the total of all species is the total of all species fitness average, and |P| is the size of the fitness average, and |P| is the size of the populationpopulation

PF

Fn

tot

kk

Page 31: Real-time neuroevolution in the NERO video game

31

Speciation - offspringSpeciation - offspring

First we eliminate the lowest First we eliminate the lowest performing members of the performing members of the population. population.

The whole generation is then The whole generation is then replaced by the new offspring.replaced by the new offspring.

Page 32: Real-time neuroevolution in the NERO video game

32

NEAT – minimization and NEAT – minimization and complexification complexification

NEAT begins with uniform simple NEAT begins with uniform simple networks with no hidden nodes, differing networks with no hidden nodes, differing only by their initial random weights.only by their initial random weights.

It adds connections and nodes It adds connections and nodes incrementally.incrementally.

Only the most fitted structures remains.Only the most fitted structures remains. NEAT searches for the optimal topology NEAT searches for the optimal topology

by complexifying existing network. by complexifying existing network.

Page 33: Real-time neuroevolution in the NERO video game

33

NEAT - performanceNEAT - performance In experiments it has been shown that all three In experiments it has been shown that all three

main components of NEAT are essential for it main components of NEAT are essential for it to work – the historical markings, speciation, to work – the historical markings, speciation, and starting from minimal structure.and starting from minimal structure.

It was also shown that NEAT outperform other It was also shown that NEAT outperform other NE methods, especially those that are fixed NE methods, especially those that are fixed topology evolution because it find faster topology evolution because it find faster complex solutions by starting with simple complex solutions by starting with simple networks and expanding only when beneficial.networks and expanding only when beneficial.

Page 34: Real-time neuroevolution in the NERO video game

34

rtNEATrtNEAT

In order for players to interact with In order for players to interact with evolving agents in evolving agents in real timereal time NEAT NEAT was improved to rtNEAT.was improved to rtNEAT.

Now fitness statistics are collected Now fitness statistics are collected constantly as the game is played, constantly as the game is played, and the agents evolved continuously and the agents evolved continuously as well.as well.

Page 35: Real-time neuroevolution in the NERO video game

35

rtNEAT – replacement rtNEAT – replacement cyclecycle

Every few game ticks the worst Every few game ticks the worst individual is replaced by an offspring of individual is replaced by an offspring of parents chosen from among the best.parents chosen from among the best.

Page 36: Real-time neuroevolution in the NERO video game

36

rtNEAT - algorithmrtNEAT - algorithm

Because rtNEAT performs in real Because rtNEAT performs in real time it cannot produce the whole time it cannot produce the whole generation at once like NEAT, and generation at once like NEAT, and so the algorithm loop must change. so the algorithm loop must change.

Page 37: Real-time neuroevolution in the NERO video game

37

rtNEAT - algorithmrtNEAT - algorithm

Page 38: Real-time neuroevolution in the NERO video game

38

rtNEAT - algorithmrtNEAT - algorithm

1) Calculating adjusted fitness:

Where |S| is the number of Where |S| is the number of individuals in the species.individuals in the species.

S

ff ii '

Page 39: Real-time neuroevolution in the NERO video game

39

rtNEAT - algorithmrtNEAT - algorithm

2) Removing the worst Agent:a) if we remove the worst unadjusted fitness agent, innovation preservation will be damaged, because new small species will be eliminated as soon as they appear.That’s why the worst adjusted fitness agent will be removed.

Page 40: Real-time neuroevolution in the NERO video game

40

rtNEAT - algorithmrtNEAT - algorithm

2) Removing the worst Agent:a) if we remove the worst unadjusted fitness agent, innovation preservation will be damaged, because new small species will be eliminated as soon as they appear.That’s why the worst adjusted fitness agent will be removed.

Page 41: Real-time neuroevolution in the NERO video game

41

rtNEAT - algorithmrtNEAT - algorithm

2) Removing the worst Agent:b) agents must also have time to evaluated sufficiently before removed, because unlike NEAT that each individual live the same amount of time, in rtNEAT different agents have been around for different amount of time.That’s why rtNEAT only removes agents that have played for more than the minimum amount of time m.

Page 42: Real-time neuroevolution in the NERO video game

42

rtNEAT - algorithmrtNEAT - algorithm

2) Removing the worst Agent:c) we must re-estimate the average fitness F, because a specie now have one less member, and it’s average most likely been changed.

Page 43: Real-time neuroevolution in the NERO video game

43

rtNEAT - algorithmrtNEAT - algorithm

3) Creating offspring:the probability to choose a parent specie is proportional to it’s average fitness compared with the total of all species’ average fitnesses:

And a single new offspring is created by recombining two individuals from the parent species

tot

k

krF

FSP )(

Page 44: Real-time neuroevolution in the NERO video game

44

rtNEAT - algorithmrtNEAT - algorithm

4) Reassigning agents to species:because minimizing the number of species is important in real time environment, when CPU time is not allocated all to evolution, the threshold δt must be dynamically adjusted.But also the entire population must be assigned again to species.

Page 45: Real-time neuroevolution in the NERO video game

45

rtNEAT - algorithmrtNEAT - algorithm

4) Replacing the old agent with the new one:replacing depends on the game.You may just replace the neural network in the body of the removed agent. Or you may kill it and replace a new agent instead.

Page 46: Real-time neuroevolution in the NERO video game

46

The loop interval should be every n ticks. For n to be chosen a “law of eligibility” is formulated:

Where m is the minimum time alive, n is the number of ticks between replacement (the loop interval), |p| is the population size, and I is the fraction of the population that is too young to be evaluated.

rtNEAT – loop intervalrtNEAT – loop interval

nP

mI

Page 47: Real-time neuroevolution in the NERO video game

47

rtNEAT – loop intervalrtNEAT – loop interval

nP

mI

IP

mn

And I left for the user to determine, because its most critical for performance.

Page 48: Real-time neuroevolution in the NERO video game

48

the idea of the game is to train a team of agents by designing a training program.

In order for the agents to learn, the learning algorithm is of course rtNEAT.

Page 49: Real-time neuroevolution in the NERO video game

49

NERO - fitnessNERO - fitness

In the training there are sliders that In the training there are sliders that lets you configure what attributes lets you configure what attributes you want your agents to have.you want your agents to have.

Page 50: Real-time neuroevolution in the NERO video game

50

NERO - fitnessNERO - fitness

fitness is the sum of all the components fitness is the sum of all the components ccii multiply by the sliders value multiply by the sliders value vvii::

To put in also the rate of forgetting To put in also the rate of forgetting rr into into the fitness this formula is given: the fitness this formula is given:

Where Where fftt is the current fitness.is the current fitness.

components

iit vcs *

r

fsff tttt

1

Page 51: Real-time neuroevolution in the NERO video game

51

NERO - sensorsNERO - sensors Agents have several types of Agents have several types of

sensors.sensors. Their output is: the direction of Their output is: the direction of

movement, and whether or not to movement, and whether or not to fire.fire.

Page 52: Real-time neuroevolution in the NERO video game

52

NERO - sensorsNERO - sensors Enemy radar:Enemy radar:

divides the 360divides the 360º º area around the area around the agent into slices. Each slice agent into slices. Each slice activates a sensor in proportion to activates a sensor in proportion to how close the enemy in that slice. how close the enemy in that slice.

Page 53: Real-time neuroevolution in the NERO video game

53

NERO - sensorsNERO - sensors Range finder:Range finder:

projects ray at several angles, and projects ray at several angles, and the distance the ray travels before it the distance the ray travels before it hits something is returned as the hits something is returned as the sensor value. sensor value.

Page 54: Real-time neuroevolution in the NERO video game

54

NERO - sensorsNERO - sensors On-target sensor:On-target sensor:

returns full activation only if a ray returns full activation only if a ray projected along the front heading of projected along the front heading of the agents hits an enemy. This the agents hits an enemy. This sensor tells the agent whether it sensor tells the agent whether it should attempt to shootshould attempt to shoot

Page 55: Real-time neuroevolution in the NERO video game

55

NERO - sensorsNERO - sensors Line-of-fire sensor:Line-of-fire sensor:

detect where a bullet from the detect where a bullet from the closest enemy is heading. Thus, can closest enemy is heading. Thus, can be used to avoid fire. They work by be used to avoid fire. They work by computing where the line of fire computing where the line of fire intersects rays projecting from the intersects rays projecting from the agent.agent.

Page 56: Real-time neuroevolution in the NERO video game

56

NERO – evolved networksNERO – evolved networks Seekers VS Wall-fightersSeekers VS Wall-fighters

Page 57: Real-time neuroevolution in the NERO video game

57