© The AnyLogic Company | www.anylogic.com Practical Applications of Deep Reinforcement Learning Using AnyLogic The AnyLogic Conference 2019, Austin, TX Arash Mahdavi , Program Lead, The AnyLogic Company Ty Wang , Vice President of Business Development, Skymind


© The AnyLogic Company | www.anylogic.com

Practical Applications of Deep Reinforcement Learning Using AnyLogic

The AnyLogic Conference 2019, Austin, TX

Arash Mahdavi, Program Lead, The AnyLogic Company

Ty Wang, Vice President of Business Development, Skymind


Learning and decision making from a simulation model

FINAL MODEL

LEARN

Simulation model is an extension of someone’s mental model


Simulation as the reinforcement learning environment

SIMULATED WORLD (Simulation Model)
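As a rough illustration of what "simulation as the environment" means in practice, the sketch below shows the usual interaction loop: the agent observes the simulated world, picks an action, and the simulation advances and returns a reward. `ToySimulation`, its reward rule, and `run_episode` are hypothetical stand-ins invented for this example; they are not AnyLogic or RL4J classes.

```python
class ToySimulation:
    """Hypothetical stand-in for a simulation model used as an RL environment."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        """Restart the simulated world and return the first observation."""
        self.t = 0
        return self.t  # observation: here simply the current step index

    def step(self, action):
        """Advance the simulation by one decision step."""
        self.t += 1
        # Made-up reward rule: the action should match the parity of the new step.
        reward = 1.0 if action == self.t % 2 else 0.0
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(env, policy):
    """Run one episode: observe, act, collect reward, repeat until done."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total += reward
    return total
```

A training algorithm would call `run_episode` (or an equivalent loop) many times and adjust the policy between episodes; only the loop itself is shown here.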


Traffic Light Example

Eduardo Gonzalez, VP Engineering, Skymind

Samuel Audet, Deep Learning Engineer, Skymind

Tyler Wolfe-Adam, Technical Support Specialist, The AnyLogic Company


Traffic Light Example

[Figure: arrival rates (per hour) over time (seconds)]

Cars enter the intersection from 4 directions and move towards the opposing side.

The objective of the training experiment is to learn a policy that optimally controls the traffic light based on the current state of the traffic.

[Figure: four-way intersection with approaches labeled N, S, W, and E]


Implementation Architecture

AnyLogic Model

Imported RL4J library

Custom Experiment


What is inside the Custom experiment?

Hyperparameters

Network configuration

Training


What is inside the Custom experiment?

Network configuration

Input: 10 nodes

Hidden 1: 300 nodes

Hidden 2: 300 nodes

Output: 2 nodes
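To give a feel for the size of this network, the sketch below counts trainable parameters for the 10-300-300-2 layout. It assumes plain fully connected layers with one bias per node, which the slide does not state. The `greedy_action` helper additionally assumes the two outputs are per-action scores (as in a DQN-style agent), which is also an assumption rather than something the slides confirm.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for fully connected layers (assumed layer type)."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def greedy_action(output_values):
    """Index of the largest output; ties resolve to the lowest index."""
    return max(range(len(output_values)), key=lambda i: output_values[i])


layer_sizes = [10, 300, 300, 2]  # input, hidden 1, hidden 2, output
print(dense_parameter_count(layer_sizes))  # 3300 + 90300 + 602 = 94202
```

Under these assumptions almost all of the parameters sit in the 300-by-300 connection between the two hidden layers.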


What is inside the Custom experiment?

Array with 10 elements

[Figure: diagram with numbered labels 1 through 9]


What is inside the Custom experiment?

Action == 0: do nothing

Action == 1: change the traffic light phase if not yellow
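The two-action rule above can be sketched as a small function. The phase names and their cyclic order are hypothetical; the slides only say that action 1 changes the phase unless the light is currently yellow. How a yellow phase ends (for example, on a timer inside the model) is outside this sketch.

```python
# Hypothetical phase cycle; the real model's phases may differ.
PHASES = ["NS_GREEN", "NS_YELLOW", "EW_GREEN", "EW_YELLOW"]


def apply_action(phase_index, action):
    """Return the next phase index after the agent's decision.

    action 0: do nothing
    action 1: advance to the next phase, but only if the light is not yellow
    """
    if action not in (0, 1):
        raise ValueError("action must be 0 or 1")
    is_yellow = PHASES[phase_index].endswith("YELLOW")
    if action == 1 and not is_yellow:
        return (phase_index + 1) % len(PHASES)
    return phase_index
```

For example, `apply_action(0, 1)` moves from `NS_GREEN` to `NS_YELLOW`, while `apply_action(1, 1)` leaves the yellow phase unchanged.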


Comparison of results (Optimized vs. Policy)


Comparison of results (Base vs. Optimized vs. Policy)

Real systems: Dynamic + Stochastic (exogenous inputs / system internals)

Optimization: Optimal fixed input parameters

Policy: Optimal (or near-optimal) decisions over time
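The difference between an optimized fixed parameter and a decision policy can be illustrated with a toy two-queue intersection. Everything here is invented for illustration: the arrival probabilities, the one-car-per-step service, and the hand-written "serve the longer queue" rule, which merely stands in for a learned policy. It is not the model or the results from the presentation.

```python
import random


def arrivals(step, horizon, rng):
    """Demand shifts halfway through: first NS is busy, then EW."""
    p_ns, p_ew = (0.7, 0.2) if step < horizon // 2 else (0.2, 0.7)
    return int(rng.random() < p_ns), int(rng.random() < p_ew)


def simulate(controller, horizon=1000, seed=0):
    """Return the total number of waiting cars summed over all steps."""
    rng = random.Random(seed)  # same seed gives the same arrival stream
    queues = [0, 0]            # [north-south, east-west]
    total_waiting = 0
    for step in range(horizon):
        a_ns, a_ew = arrivals(step, horizon, rng)
        queues[0] += a_ns
        queues[1] += a_ew
        served = controller(step, queues)  # which direction gets green
        if queues[served] > 0:
            queues[served] -= 1            # one car passes per step
        total_waiting += sum(queues)
    return total_waiting


def fixed_cycle(green_steps):
    """Fixed input parameter: alternate directions every green_steps steps."""
    return lambda step, queues: (step // green_steps) % 2


def longest_queue(step, queues):
    """State-dependent rule standing in for a learned policy."""
    return 0 if queues[0] >= queues[1] else 1


# "Optimization": pick the best fixed cycle length from a few candidates.
best_fixed = min(simulate(fixed_cycle(g)) for g in (1, 5, 10, 20))
adaptive = simulate(longest_queue)
print(best_fixed, adaptive)
```

In this toy setup no fixed cycle can keep up, because each direction gets green only half the time while the busy direction needs more; a rule that reacts to the current queues can follow the shift in demand.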


Reinforcement learning decision points

Hyperparameters

Observation Space

Action Space

Reward
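One way to keep these four decisions explicit is to collect them in a single specification object. The sketch below is an assumption-laden illustration: `RLProblemSpec` is a hypothetical name, the observation size (10) and action count (2) come from the traffic light example, and the reward function and hyperparameter values are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class RLProblemSpec:
    """Hypothetical container for the four design decisions."""
    observation_size: int
    num_actions: int
    reward: Callable[[Sequence[float]], float]
    hyperparameters: Dict[str, float] = field(default_factory=dict)

    def check(self, observation: Sequence[float], action: int) -> None:
        """Raise if an observation or action does not fit the declared spaces."""
        if len(observation) != self.observation_size:
            raise ValueError("unexpected observation length")
        if not 0 <= action < self.num_actions:
            raise ValueError("unknown action")


def negative_total(observation):
    """Made-up reward: penalize the sum of the observed values."""
    return -sum(observation)


traffic_spec = RLProblemSpec(
    observation_size=10,   # from the slides: array with 10 elements
    num_actions=2,         # from the slides: do nothing / change phase
    reward=negative_total,  # assumption, not from the slides
    hyperparameters={"gamma": 0.99, "learning_rate": 0.001},  # illustrative values
)
```

Writing the spaces down this way lets the simulation modeler and the DRL expert agree on the interface before either side writes training code.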


How are learned policies used?

Trained policies can be deployed on all types of devices and equipment to complete tasks adaptively and autonomously.

Edge devices could be used as controllers to deploy the learned policies.


Export model and text file

Test Export File Format

Export AnyLogic Model to Train


Add model into Skymind intelligence layer

.jar File Transfer

Create Experiment

Ready-to-Use Machine Learning Notebooks, Libraries, and Workflows


Train Model

Notebook Integration

Web or Command Line Interface

Compute and Storage Resource Management

Analytics


Deploy Model

Ready-to-Use Deployment Workflow

Multiple Model Language Support: Java, Python, Endpoints, RPA


Manage history and versions

Version History with Rollback


Machine Learning powered by Skymind

http://www.skymind.ai/anylogic


What should simulation modelers know about this new application?

• The great news for simulation modelers is that their skills now have a new and exciting application!

• To implement reinforcement learning (or DRL), a team of DRL expert(s) and simulation modeler(s) can collaborate. In theory, neither group needs in-depth knowledge of the other group's tasks.

• When a simulation model is going to be used as a training environment, the stakes are higher because the human buffer is no longer there.


Can simulation modelers' jobs be replaced with AI too?

At least in the near future, there is NO way to automate the process of abstracting reality into a simulation model, because it has two aspects that [current] machines are not good at:

• The process of abstracting reality is an art

• Simulation models are fundamentally based on uncovering causality and how something works


AnyLogic-AI integration roadmap

April 2019: DL4J and RL4J libraries; SKIL & AnyLogic

June 2019: RL capabilities for the current AnyLogic Cloud Java API; DRL examples with instructions

Summer 2019: AnyLogic Cloud Python API (RL ready)

Fall 2019: Preset learning algorithm/architectures in SKIL

End of 2019: AL-AI book (first draft)

We are here now

• Integration with other AI platforms

• DRL in the Cloud (DRL experiment)

• AL Python API (AnyLogic 9)

• Providing DRL-compatible example models


Thank you!