Arthur Juliani Machine Learning Engineer Unity Technologies @awjuliani




Unity ML-Agents: A flexible platform for Deep RL research


About Unity

“Creation Engine”

• Games

• AR/VR

• Cinematics

• Simulations

• 40+ Platforms

• Free for personal use


Research Environments

• Visual Complexity

• Cognitive Complexity

• Physical Complexity


The Unity Ecosystem


Unity ML Agents Workflow

• Set Up Environment

• Train Agents

• Embed Agents



Create Environment (Unity)

• Observe & Act

• Decide

• Coordinate
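
In ML-Agents releases from around the time of this talk, these three roles were carried by components named Agent (observe and act), Brain (decide) and Academy (coordinate). The slide text itself does not name them, so treat that mapping as an assumption. A minimal Python sketch of how the roles could fit together, using hypothetical class and method names rather than the actual Unity C# API:

```python
class Agent:
    """Observes the world and applies actions (hypothetical sketch)."""

    def __init__(self, position=0.0):
        self.position = position

    def observe(self):
        return self.position

    def act(self, action):
        self.position += action


class Brain:
    """Decides on an action for each observation it receives."""

    def decide(self, observation, target=5.0):
        # Toy rule: step one unit toward the target, then stay put.
        if observation < target:
            return 1.0
        if observation > target:
            return -1.0
        return 0.0


class Academy:
    """Coordinates the simulation by stepping every agent with the brain."""

    def __init__(self, brain, agents):
        self.brain = brain
        self.agents = agents

    def step(self):
        for agent in self.agents:
            action = self.brain.decide(agent.observe())
            agent.act(action)


academy = Academy(Brain(), [Agent(0.0), Agent(8.0)])
for _ in range(10):
    academy.step()
final_positions = [agent.position for agent in academy.agents]
```

Both toy agents end at the target position because one shared decision rule drives them, which is the point of separating "decide" from "observe and act".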


Unity ML Agents Workflow

• Set Up Game for Training

• Train Agents

• Embed Agents


Agent Training Process


Training Methods

1. Reinforcement Learning

2. Imitation Learning


Reinforcement Learning Process

Observe

Act

Reward

Exploit / Explore
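
The observe, act, reward cycle and the explore/exploit trade-off can be sketched on a toy multi-armed bandit with an epsilon-greedy rule. This is an illustrative stand-in chosen for brevity, not the algorithm ML-Agents trains with. The function name, arm means and `epsilon` value are all assumptions.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning on a bandit with Gaussian rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # running reward estimate per action
    counts = [0] * len(true_means)       # how often each action was taken
    for _ in range(steps):
        # Observe: a bandit has a single state, so there is nothing to read.
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # Reward: noisy feedback from the environment.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental mean update of the estimate for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.0, 0.5, 2.0])
```

With enough steps the highest-paying action should dominate `counts`, while the occasional random action keeps the other estimates from going stale.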


Example: Chicken Crossing the Road

• Observe: Pixels in frame

• Actions:

• Reward Signal

  • Negative for being hit

  • Positive for gift pickup
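
A reward signal like this one could be written as a small function. The slide gives only the signs of the rewards, so the magnitudes and the function name below are assumptions.

```python
def chicken_reward(hit_by_car, picked_up_gift, hit_penalty=-1.0, gift_bonus=1.0):
    """Per-step reward: negative for being hit, positive for a gift pickup."""
    reward = 0.0
    if hit_by_car:
        reward += hit_penalty
    if picked_up_gift:
        reward += gift_bonus
    return reward


# One short hypothetical episode as (hit_by_car, picked_up_gift) per step.
episode_events = [(False, False), (False, True), (False, True), (True, False)]
episode_return = sum(chicken_reward(hit, gift) for hit, gift in episode_events)
```

The agent never sees these rules directly. It only receives the resulting number each step and has to work out from the pixels which behavior produced it.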


Imitation Learning Process

• Demonstrate to the machine how it’s done

• Policy is created by imitating the human
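
As a rough illustration of building a policy from demonstrations, the sketch below copies the action the human took in the most similar recorded situation (a nearest-neighbor lookup). This is a deliberately simple stand-in and not necessarily how ML-Agents implements imitation learning. The observations, action names and function name are hypothetical.

```python
def clone_policy(demonstrations):
    """Build a policy from (observation, action) pairs recorded from a human."""

    def policy(observation):
        # Find the demonstrated observation closest to the current one
        # and repeat the action the human chose there.
        nearest = min(demonstrations, key=lambda pair: abs(pair[0] - observation))
        return nearest[1]

    return policy


# Hypothetical demonstrations: the observation is the ball's offset from
# center, the action is the direction the human tilted the platform.
human_demos = [(-2.0, "right"), (-0.5, "right"), (0.5, "left"), (2.0, "left")]
policy = clone_policy(human_demos)
```

No reward signal appears anywhere in this sketch. The only supervision is what the demonstrator did.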



Embedding agents into a game (experimental)

● Import the .bytes file into the Unity project (this is the model file that is created from training agents)

● Set the corresponding brain component to “Internal”

● The agent will run in the game or scene based on the model created

● Inference is a very challenging, industry-wide problem


We are collaborating with different industry leaders

Microsoft WinML – Windows devices

Google TensorFlow Lite – Android devices

Apple CoreML – Apple iOS and OSX devices

Other platforms planned for the future – Nintendo, Sony


Task Possibilities


3D Balance Ball

Our first environment to use ML-Agents

• Goal: Balance ball as long as possible

• Observations: Platform rotation, ball position and rotation

• Actions: Platform rotation (in x and z)

• Rewards: Bonus for keeping ball up


Curriculum Learning

Easy to Difficult

● Agents learn from simpler exercises and combine what they learn to tackle much more difficult tasks
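
One common way to schedule such a curriculum is to move to the next lesson once recent performance clears a threshold. The sketch below assumes that rule. The class name, lesson labels, threshold and window size are hypothetical and are not taken from the ML-Agents configuration format.

```python
class Curriculum:
    """Advance through lessons when the recent mean reward is high enough."""

    def __init__(self, lessons, threshold, window=3):
        self.lessons = lessons
        self.threshold = threshold
        self.window = window
        self.index = 0
        self.recent = []

    @property
    def current(self):
        return self.lessons[self.index]

    def report(self, episode_reward):
        """Record one episode's reward and return the lesson to use next."""
        self.recent.append(episode_reward)
        self.recent = self.recent[-self.window:]
        ready = (len(self.recent) == self.window
                 and sum(self.recent) / self.window >= self.threshold)
        if ready and self.index < len(self.lessons) - 1:
            self.index += 1
            self.recent = []  # start measuring afresh on the harder lesson
        return self.current


curriculum = Curriculum(lessons=["easy", "medium", "difficult"], threshold=0.8)
rewards = [0.2, 0.9, 0.9, 0.9, 0.5, 0.9, 0.9, 0.9]
history = [curriculum.report(r) for r in rewards]
```

Clearing the reward window after each promotion prevents scores earned on the easier lesson from immediately triggering another promotion.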


Final Outcome

Easy to Difficult


Enabling Long Short-Term Memory


Multi-Agent Soccer Training


Multi-Agent Banana Collectors

[Figure: grid of results. Trained: Scarce, Plentiful. Tested: Scarce, Plentiful.]


MuJoCo Continuous Control Tasks in Unity


Unity Continuous Control Tasks in Unity


Curiosity-Driven Exploration

• In some environments the rewards are sparsely distributed

  • “+1 for accomplishing goal”

• Intrinsic reward can encourage exploration

• Reward agent based on experienced surprise in outcome of actions

Implementation of: “Curiosity-driven Exploration by Self-supervised Prediction” (Pathak et al., 2017)
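
The idea of rewarding surprise can be sketched with a forward model that predicts the outcome of an action and pays out its own prediction error. This is a heavily simplified, tabular illustration. Pathak et al. compute the error in a learned feature space and also train an inverse model, neither of which is shown here. All names and the `curiosity_strength` weight are assumptions.

```python
class ForwardModel:
    """Tabular predictor of the next state, learned online."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.predictions = {}  # (state, action) -> predicted next state

    def surprise(self, state, action, next_state):
        """Return the squared prediction error, then learn from it."""
        predicted = self.predictions.get((state, action), 0.0)
        error = next_state - predicted
        # Move the prediction toward what actually happened.
        self.predictions[(state, action)] = predicted + self.learning_rate * error
        return error ** 2


def total_reward(extrinsic, intrinsic, curiosity_strength=0.1):
    """Combine the environment's reward with the curiosity bonus."""
    return extrinsic + curiosity_strength * intrinsic


model = ForwardModel()
# Repeating the same transition makes it less and less surprising.
surprises = [model.surprise(state=0, action=1, next_state=4.0) for _ in range(4)]
```

Because the bonus shrinks for transitions the model already predicts well, the agent is nudged toward parts of the environment it has not yet figured out, even when the extrinsic reward is zero.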


Pyramids Environment

• Nine rooms

• One switch

• Six stone pyramids

• Once the switch is pressed, a brick pyramid is spawned

• The gold brick on top of the brick pyramid provides +2 reward


Intrinsic Curiosity Module


Extrinsic Reward Only


Intrinsic Reward Only


Both Rewards


Results


In Progress - Hierarchical Control

Cognitive Agent

• Observes “Visual” information

• Acts on target direction

• Reward: goal contact

Motor Agent

• Observes proprioceptive information; target direction

• Acts on joint torques

• Reward: target direction alignment
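
The two reward signals could look roughly like the sketch below. The slide does not say how alignment is measured, so cosine similarity between the body's movement and the target direction is an assumption, as are the function names and the 2D vectors.

```python
import math


def cognitive_reward(touching_goal):
    """Cognitive agent: rewarded only on contact with the goal."""
    return 1.0 if touching_goal else 0.0


def motor_reward(movement, target_direction):
    """Motor agent: cosine similarity between movement and target direction."""
    dot = sum(m * t for m, t in zip(movement, target_direction))
    norm = math.hypot(*movement) * math.hypot(*target_direction)
    # A stationary body or a zero target has no defined direction.
    return dot / norm if norm > 0 else 0.0


# The cognitive agent's action (a target direction) becomes part of the
# motor agent's observation and defines what the motor agent is paid for.
target = (2.0, 0.0)
aligned = motor_reward((1.0, 0.0), target)
sideways = motor_reward((0.0, 1.0), target)
opposed = motor_reward((-1.0, 0.0), target)
```

Splitting the rewards this way lets the motor agent receive dense feedback every step while the cognitive agent only has to solve the sparse "reach the goal" problem.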


Get ML-Agents at GitHub Now

github.com/Unity-Technologies/ml-agents

Contact Us

[email protected]

Please share your feedback!

Arthur Juliani, Machine Learning Engineer

[email protected]

@awjuliani