112
Computational aspects of reinforcement learning – [email protected] iuri frosio, Jul 6-17 th , 2018 [email protected] COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNING

COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

Com

puta

tional asp

ects

of

rein

forc

em

ent

learn

ing –

ifro

sio@

nvid

ia.c

om

iuri frosio, Jul 6-17th, 2018

[email protected]

COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNING

Page 2: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

2

Computational aspects of reinforcement learning – [email protected]

2003-2006

2003-2014

2014-…

Page 3: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

3

INTRODUCTION TO REINFORCEMENT LEARNING:CAN A BIOMEDICAL ENGINEER LAND A ROCKET?

Video from https://www.youtube.com/watch?v=u0-pfzKbh2k

Page 4: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

4

Computational aspects of reinforcement learning – [email protected]

LEARNING TO LAND A ROCKET?From bottle experiments to simulation

My attempt Problem Solution

Rocket model Bottle Not so realistic Simulation

Training

procedureTrial and error Time consuming Parallelize (GPU)

How to improve

Experience

(learn motion

pattern)

Slow, no hints

about the

correct pattern

Gradient on the

proper cost

function

Page 5: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

5

AGENDA

1. GPU-based A3C for Deep Reinforcement Learning(The basics of Reinforcement Learning on a CPU/GPU)

2. Cule*: GPU accelerated RL

(Moving Reinforcement Learning on a GPU)

3. Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand

Observations and Continuous Control

(Imitation Learning and Sample Efficiency)

4. Conclusion

Page 6: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

Com

puta

tional asp

ects

of

rein

forc

em

ent

learn

ing –

ifro

sio@

nvid

ia.c

om

M. Babaeizadeh†,‡, I.Frosio‡, S.Tyree‡, J. Clemons‡, J.Kautz‡

†University of Illinois at Urbana-Champaign, USA ‡ NVIDIA, USA

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

An ICLR 2017 paper

A github project

Page 7: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

7

GPU-BASED A3C FORDEEP REINFORCEMENT

LEARNING

Page 8: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

8

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

Learning to accomplish a task

Image from www.33rdsquare.com

Page 9: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

9

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

Definitions

St

at = p(St)

~Rt

StRL agent

✓ Environment

✓ Agent

✓ Observable status St

✓ Reward Rt

✓ Action at

✓ Policy at = p(St)

, Rt

Page 10: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

10

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

Definitions

St, Rt

at = p(St)

~Rt

StDeep RL agent

Page 11: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

11

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

Definitions

St, Rt

at = p(St)

Dp(∙)

R0 R1 R2 R3 R4

~Rt

St

Page 12: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

12

GPU-BASED A3C FORDEEP REINFORCEMENT

LEARNING

Page 13: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

13

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

Asynchronous Advantage Actor-Critic (Mnih et al., arXiv:1602.01783v2, 2015)

Dp(∙)

p’(∙)Master

model

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

Agent

1A

gent

2A

gent

16

Page 14: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

14

GPU-BASED A3C FORDEEP REINFORCEMENT

LEARNING

Page 15: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

15

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

The GPU

Page 16: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

16

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

LOW OCCUPANCY

LOW OCCUPANCY (33%)

Page 17: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

17

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

HIGH OCCUPANCY (100%)

Batc

h s

ize

Page 18: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

18

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

HIGH OCCUPANCY (100%), LOW UTILIZATION (40%)

time time

Page 19: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

19

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

HIGH OCCUPANCY (100%), HIGH UTILIZATION (100%)

timetime

Page 20: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

20

Computational aspects of reinforcement learning – [email protected]

GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING

BANDWIDTH LIMITED

timetime

Page 21: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

21

Computational aspects of reinforcement learning – [email protected]

+

Dp(∙)

p’(∙)Master

model

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

Agent

1A

gent

2A

gent

16

Page 22: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

22

Computational aspects of reinforcement learning – [email protected]

REGRESSION, CLASSIFICATION, …

MAPPING DEEP PROBLEMS TO A GPU

REINFORCEMENT LEARNING

Pear, pear, pear, pear, …

Empty, empty, …

Fig, fig, fig, fig, fig, fig,

Strawberry, Strawberry,

Strawberry, …

data

labels

100% utilization / occupancy

status,

reward

action

?

Page 23: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

23

Computational aspects of reinforcement learning – [email protected]

A3C

Master

model

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

Agent

1A

gent

2A

gent

16

Page 24: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

24

Computational aspects of reinforcement learning – [email protected]

A3C

Master

model

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

Agent

1A

gent

2A

gent

16

Page 25: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

25

Computational aspects of reinforcement learning – [email protected]

A3C

Master

model

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

Agent

1A

gent

2A

gent

16

Page 26: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

26

Computational aspects of reinforcement learning – [email protected]

A3C

Master

model

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

Agent

1A

gent

2A

gent

16

Computational aspects of reinforcement learning – [email protected]

Page 27: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

27

Computational aspects of reinforcement learning – [email protected]

A3C

Master

model

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

Agent

1A

gent

2A

gent

16

Dp(∙)

Page 28: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

28

Computational aspects of reinforcement learning – [email protected]

A3C

Dp(∙)

p’(∙)Master

model

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

St, Rt

R0 R1 R2 R3 R4at = p(St)

Agent

1A

gent

2A

gent

16

Page 29: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

29

Computational aspects of reinforcement learning – [email protected]

GPU-based A3C

GA3C

El Capitan big wall, Yosemite Valley

Page 30: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

30

Computational aspects of reinforcement learning – [email protected]

GA3C (INFERENCE)

Master

model

… …

St

predictors

{St}

prediction queue

Agent

1A

gent

2

Agent

N

{at}

at

Page 31: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

31

Computational aspects of reinforcement learning – [email protected]

GA3C (TRAINING)

… …

R0 R1 R2

R4

trainers

{St,Rt}

training queue

Dp(∙)

Agent

1A

gent

2

Agent

N

St,Rt

Master

model

Page 32: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

32

Computational aspects of reinforcement learning – [email protected]

GA3C

Master

model

… …

St

predictors

{St}

prediction queue

Agent

1A

gent

2

Agent

N

{at}

at

… …

R0 R1 R2

R4

trainers

{St,Rt}

training queue

Dp(∙)

Page 33: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

33

Computational aspects of reinforcement learning – [email protected]

GPU-based A3C

GA3C

El Capitan big wall, Yosemite Valley

Page 34: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

34

Computational aspects of reinforcement learning – [email protected]

GPU-based A3C

GA3C

El Capitan big wall, Yosemite Valley

Learn how to balance

Page 35: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

35

Computational aspects of reinforcement learning – [email protected]

GA3C: PREDICTIONS PER SECOND (PPS)

Master

model

… …

St

predictors

{St}

prediction queue

Agent

1A

gent

2

Agent

N

{at}

at

… …

R0 R1 R2

R4

trainers

{St,Rt}

training queue

Dp(∙)

Page 36: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

36

Computational aspects of reinforcement learning – [email protected]

GA3C: TRAININGS PER SECOND (TPS)

Master

model

… …

St

predictors

{St}

prediction queue

Agent

1A

gent

2

Agent

N

{at}

at

… …

R0 R1 R2

R4

trainers

{St,Rt}

training queue

Dp(∙)

Page 37: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

37

Computational aspects of reinforcement learning – [email protected]

AUTOMATIC SCHEDULINGBalancing the system at run time

ATARI Boxing ATARI Pong

NP = # predictors, NT = # trainers, NA = # agents, TPS = training per seconds

Page 38: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

38

Computational aspects of reinforcement learning – [email protected]

THE ADVANTAGE OF SPEEDMore frames = faster convergence

Page 39: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

39

Computational aspects of reinforcement learning – [email protected]

LARGER DNNS

A3C (ATARI) Others (robotics)

Timothy P. Lillicrap et al., Continuouscontrol with deep reinforcement learning,International Conference on LearningRepresentations, 2016.

S. Levine et al., End-to-end training of deepvisuomotor policies, Journal of MachineLearning Research, 17:1-40, 2016.

For real world applications (e.g. robotics, automotive)

Conv 16

8x8

filters,

stride 4

Conv 32

4x4

filters,

stride 2

FC 256

Conv 32

8x8

filters,

stride 1,

2, 3, 4

Conv 32

4x4

filters,

stride 2

FC 256

Conv 64

4x4

filters,

stride 2 ?

Page 40: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

40

Computational aspects of reinforcement learning – [email protected]

GA3C VS. A3C*: PREDICTIONS PER SECONDS* Our Tensor Flow implementation on a CPU

1

10

100

1000

10000

Small DNN Large DNN,stride 4

Large DNN,stride 3

Large DNN,stride 2

Large DNN,stride 1

4x 11x 12x20x

45x

PP

S

A3C

GA3C

Page 41: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

41

Computational aspects of reinforcement learning – [email protected]

CPU & GPU UTILIZATION IN GA3CFor larger DNNs

0

10

20

30

40

50

60

70

80

90

100

Small DNN Large DNN,stride 4

Large DNN,stride 3

Large DNN,stride 2

Large DNN,stride 1

Uti

lizat

ion

(%

)

CPU %

GPU %

Page 42: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

42

Computational aspects of reinforcement learning – [email protected]

Master

model

… …

St

predictors

{St}

prediction queue

Agent

1A

gent

2

Agent

N

{at}

… …

R0 R1 R2

R4

trainers

{St,Rt}

training queue

Dp(∙)

GA3C POLICY LAGAsynchronous playing and training (off-policy updates)

DNN A

DNN B

Page 43: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

43

Computational aspects of reinforcement learning – [email protected]

STABILITY AND CONVERGENCE SPEEDReducing policy lag through min training batch size

Page 44: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

44

Computational aspects of reinforcement learning – [email protected]

GPU-based A3C

GA3C (45x faster)

El Capitan big wall, Yosemite Valley

Balancing computational

resources, speed and

stability.

Page 45: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

45

Computational aspects of reinforcement learning – [email protected]

THEORY

RESOURCES

M. Babaeizadeh, I. Frosio, S. Tyree, J.Clemons, J. Kautz, ReinforcementLearning through AsynchronousAdvantage Actor-Critic on a GPU,ICLR 2017 (available athttps://openreview.net/forum?id=r1VGvBcxl&noteId=r1VGvBcxl).

CODING

GA3C, a GPU implementation of A3C (open source at https://github.com/NVlabs/GA3C).

A general architecture to generate and consume training data.

Page 46: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

46

AGENDA

1. GPU-based A3C for Deep Reinforcement Learning(The basics of Reinforcement Learning on a CPU/GPU)

2. Cule*: GPU accelerated RL

(Moving Reinforcement Learning on a GPU)

3. Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand

Observations and Continuous Control

(Imitation Learning and Sample Efficiency)

4. Conclusion

Page 47: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

Com

puta

tional asp

ects

of

rein

forc

em

ent

learn

ing –

ifro

sio@

nvid

ia.c

om

* CUDA Learning Environment

Steven Dalton, Iuri Frosio, Jared Hoberock, Jason Clemons

CULE*: GPU ACCELERATED RL

Reinforcement Learning

time

Page 48: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

48

Computational aspects of reinforcement learning – [email protected]

REINFORCEMENT LEARNINGALE (Atari Learning Environment)

• Diverse set of tasks

• Established benchmark – MNIST of RL?

Page 49: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

49

Computational aspects of reinforcement learning – [email protected]

CULECUDA Learning Environment

LEARNING

ALGO

CULE

Frames production / consumption rate > 10K / sDemocratize RL: more frames for less money

Page 50: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

50

Computational aspects of reinforcement learning – [email protected]

AGENDA

RL training: CPU, GPU

Limitations

CuLE

Performance

Analysis and new scenarios

Page 51: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

51

Computational aspects of reinforcement learning – [email protected]

RL TRAININGThe OpenAI ATARI interface

https://github.com/openai/atari-py (OpenAI gym)

Page 52: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

52

Computational aspects of reinforcement learning – [email protected]

RL TRAININGCPU only based training

DQN A3C…

Mnih V. et al., Human-level control through deep reinforcement Learning, Nature, 2015Minh V. et al., Asynchronous Methods for Deep Reinforcement Learning, ICML 2016

Page 53: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

53

Computational aspects of reinforcement learning – [email protected]

RL TRAINING

1

10

100

1000

10000

Small DNN Large DNN,stride 4

Large DNN,stride 3

Large DNN,stride 2

Large DNN,stride 1

4x 11x 12x 20x45x

PP

S A3C

GA3C

Hybrid CPU GPU training

DQN A3C…

Babaeizadeh M. et al., Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU, ICLR, 2017

GA3C…

Page 54: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

54

Computational aspects of reinforcement learning – [email protected]

RL TRAININGClusters

DQN A3C…

Espeholt L. et al., IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018

GA3C…

Cluster

ES (GA)A3CA2CIMPALA…

Page 55: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

55

Computational aspects of reinforcement learning – [email protected]

RL TRAININGDGX-1

DQN A3C…

Stoole A., Abbeel P., Accelerated Methods for Deep Reinforcement Learning, 2018

GA3C…

Cluster

ES (GA)A3CA2CIMPALA…

DGX-1

Policy gradientQ-value…

Page 56: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

56

Computational aspects of reinforcement learning – [email protected]

RL TRAINING

TIME

Limitations

DQN A3C…

GA3C…

Page 57: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

57

Computational aspects of reinforcement learning – [email protected]

RL TRAINING

$$$

Limitations

Cluster

ES (GA)A3CA2CIMPALA…

DGX-1

Policy gradientQ-value…

Page 58: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

58

Computational aspects of reinforcement learning – [email protected]

CULECUDA Learning Environment

LEARNING

ALGO

CULE

Frames production / consumption rate > 10K / sDemocratize RL: more frames for less money

Page 59: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

59

Computational aspects of reinforcement learning – [email protected]

AGENDA

RL training: CPU, GPU

Limitations

CuLE

Performance

Analysis and new scenarios

Page 60: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

60

Computational aspects of reinforcement learning – [email protected]

RL TRAINING (CPU SIMULATION)Standard training scenario

States, rewards

Actions (, weights)

Update

s

Page 61: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

61

Computational aspects of reinforcement learning – [email protected]

States, rewards

Actions (, weights)

Update

s

RL TRAINING (CPU SIMULATION)Standard training scenario

Page 62: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

62

Computational aspects of reinforcement learning – [email protected]

RL TRAINING (CPU SIMULATION)Standard training scenario

States, rewards

Actions (, weights)

Update

s

Limited bandwidth

Page 63: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

63

Computational aspects of reinforcement learning – [email protected]

RL TRAINING (CPU SIMULATION)Standard training scenario

States, rewards

Actions (, weights)

Update

s

Limited bandwidth

Limited number of CPUs, low frames / second

Page 64: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

64

Computational aspects of reinforcement learning – [email protected]

RL TRAINING (CULE)Porting ATARI to the GPU

States, rewards

Actions (, weights)

Update

s

Page 65: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

65

Computational aspects of reinforcement learning – [email protected]

RL TRAINING (GPU)1-to-1 mapping of ALEs to threads

ALE ATARI simulator

Page 66: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

66

Computational aspects of reinforcement learning – [email protected]

AGENDA

RL training: CPU, GPU

Limitations

CuLE

Performance

Analysis and new scenarios

Page 67: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

67

Computational aspects of reinforcement learning – [email protected]

GYM COMPATIBLE (MOSTLY)

for agent in (0, agents):action.cpu() # transfer to CPUobservation, reward, done, info = env.step(action.numpy()) # executeobservation.cuda() # transfer back GPU

reward.cuda()

# parallel call to all agentsobservations, rewards, dones, infos = env.step(actions) # execute

AtariPy

CuLE

Page 68: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

68

Computational aspects of reinforcement learning – [email protected]

FRAMES PER SECONDBreakout, inference only (no training)

1 environment 1024 environments 4096 environments 32768 environments

GPU occupancy

Page 69: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

69

Computational aspects of reinforcement learning – [email protected]

GYM COMPATIBLE (MOSTLY)

for agent in (0, agents):action.cpu() # transfer to CPUobservation, reward, done, info = env.step( action.numpy()) # executeobservation.cuda() # transfer back GPU

reward.cuda()

train()

# parallel call to all agentsobservations, rewards, dones, infos = env.step(actions) # execute

train()

AtariPy

CuLE

Page 70: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

70

Computational aspects of reinforcement learning – [email protected]

REINFORCEMENT LEARNINGBreakout – A2C (preliminary result)

Page 71: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

71

Computational aspects of reinforcement learning – [email protected]

AGENDA

RL training: CPU, GPU

Limitations

CuLE

Performance

Analysis and new scenarios

Page 72: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

72

Computational aspects of reinforcement learning – [email protected]

TRADE-OFFSame amount of time: CuLE vs. non CuLE

Fra

mes

Agents

10 ~ 100 agents

update 1

update 2

update 3

update 4

update 5

update 6

CULE

update 1

Traditional approach

Bandwidth vs. Latency

1,000 ~ 100,000 agents

Page 73: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

73

Computational aspects of reinforcement learning – [email protected]

TRADE-OFFSame amount of time: CuLE vs. non CuLE

Fra

mes

Agents

10 ~ 100 agents

update 1

update 2

update 3

update 4

update 5

update 6

CULE

update 1

Traditional approach

Bandwidth vs. Latency

update 2update 3

1,000 ~ 100,000 agents

Page 74: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

74

Computational aspects of reinforcement learning – [email protected]

TRADE-OFFSame amount of time: CuLE vs. non CuLE

Fra

mes

Agents

10 ~ 100 agents

update 1

update 2

update 3

update 4

update 5

update 6

CULE

update 1

Traditional approach

Bandwidth vs. Latency

update 2update 3

1,000 ~ 100,000 agents

Page 75: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

75

Computational aspects of reinforcement learning – [email protected]

GYM COMPATIBLE (MOSTLY)

for time in (0, np.inf):action.cpu() # transfer to CPUobservation, reward, done, info = env.step( action.numpy()) # executecpu_state = cule.get_state() # get state

train()

# seed, cule::set_state(gpuState, cpuState)env.seed(cpu_state, first_agent = 0, last_agent = 100)

# parallel call to all agentsobservations, rewards, dones, infos = env.step(actions) # execute

# …

AtariPy / CuLE

CuLE

Page 76: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

76

Computational aspects of reinforcement learning – [email protected]

SEEDINGSame amount of time: CuLE vs. non CuLE

Fra

mes

Agents

10 ~ 100 agents

seed 1

seed 2

seed 3

seed 4

seed 5

seed 6

CULE

update 1

Traditional approach

Bandwidth vs. Latency

1,000 ~ 100,000 agents

Page 77: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

77

Computational aspects of reinforcement learning – [email protected]

CONCLUSIONCuLE

More frames for less money (democratizing RL)

New scenarios

How to use large batches?

Seeding from the CPU, ES, …

Soon released on https://github.com/NVlabs/

Page 78: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

78

Computational aspects of reinforcement learning – [email protected]

THEORY

RESOURCES

S. Dalton, I. Frosio, J. Hoberok, J.Clemons, CULE, a companion libraryfor accelerated RL training, GTC 2018,(http://on-demand.gputechconf.com/gtc/2018/presentation/s8440-cule-a-companion-library-for-accelerated-rl-training.pdf).

CODING

Releasing soon:

https://github.com/NVlabs

Page 79: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

79

AGENDA

1. GPU-based A3C for Deep Reinforcement Learning(The basics of Reinforcement Learning on a CPU/GPU)

2. Cule*: GPU accelerated RL

(Moving Reinforcement Learning on a GPU)

3. Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand

Observations and Continuous Control

(Imitation Learning and Sample Efficiency)

4. Conclusion

Page 80: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

Com

puta

tional asp

ects

of

rein

forc

em

ent

learn

ing –

ifro

sio@

nvid

ia.c

om

Mengyuan Yan, Iuri Frosio, Stephen Tyree, Jan Kautz

SIM-TO-REAL TRANSFER OF ACCURATE GRASPING WITH EYE-IN-HAND OBSERVATIONS AND CONTINUOUS CONTROL

Page 81: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

81

Computational aspects of reinforcement learning – [email protected]

ROBOTS

Page 82: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

82

Computational aspects of reinforcement learning – [email protected]

ROBOTS GOING INTO THE WILD

Page 83: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

83

Computational aspects of reinforcement learning – [email protected]

CHALLENGES & SOLUTION

Challenge Solution

Safety Simulation

Data collection Simulation

Sample efficiency Imitation Learning

Simulation ☺ Domain transfer

Page 84: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

84

Computational aspects of reinforcement learning – [email protected]

THE PROBLEM OF GRASPING“Elementary” thus significant

Page 85: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

85

Computational aspects of reinforcement learning – [email protected]

LEARNING AND TOOLS

1) Visual input from wrist camera

2) Closed-loop continuous position control

3) Use imitation learning (on a GPU) to learn in simulation (Gazebo)

4) Visual domain transfer to real environments

1

4

DNN2

3

Gazebo

GPU

Close loop controller

Real environment

Domain transfer

Page 86: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

86

Computational aspects of reinforcement learning – [email protected]

LEARNING AND TOOLS

1) Visual input from wrist camera

2) Closed-loop continuous position control

3) Use imitation learning (on a GPU) to learn in simulation (Gazebo)

4) Visual domain transfer to real environments

1

4

DNN2

3

Gazebo

GPU

Close loop controller

Real environment

Domain transfer

Page 87: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

87

Computational aspects of reinforcement learning – [email protected]

CONTROLLER AND VISUAL DOMAIN TRANSFER

87

Page 88: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

88

Computational aspects of reinforcement learning – [email protected]

LEARNING AND TOOLS

1) Visual input from wrist camera

2) Closed-loop continuous position control

3) Use imitation learning (on a GPU) to learn in simulation (Gazebo)

4) Visual domain transfer to real environments

1

4

DNN2

3

Gazebo

GPU

Close loop controller

Real environment

Domain transfer

Page 89: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

89

Computational aspects of reinforcement learning – [email protected]

IMITATION LEARNING

89

The expert is notnecessarily solving ourproblem.

It can solve the problemby having a completeaccess to the status ofthe system, which is notavailable to our agent!

Page 90: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

90

Computational aspects of reinforcement learning – [email protected]

COVARIANCE SHIFT

90

Page 91: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

91

Computational aspects of reinforcement learning – [email protected]

A Reduction of Imitation Learning and Structured Prediction to

No-Regret Online Learning.

Stéphane Ross Geoffrey J. Gordon J. Andrew Bagnell91

Page 92: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

92

Computational aspects of reinforcement learning – [email protected]

92

Page 93: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

93

Computational aspects of reinforcement learning – [email protected]

93

Page 94: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

94

Computational aspects of reinforcement learning – [email protected]

IMITATION LEARNING VS REINFORCEMENT LEARNING

Imitation learning pros and cons

+ Sample efficient (data reuse)

+ Less time spent for exploration

+ Faster convergence

- Need for an expert

- Lower generalization capabilities

- Optimize metric?

Pros and cons

94

Page 95: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

95

Computational aspects of reinforcement learning – [email protected]

FSM EXPERT

Reset Approach Grasp

Page 96: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

96

Computational aspects of reinforcement learning – [email protected]

DAGGER: LEARNING CURVE

96

Page 97: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

97

Computational aspects of reinforcement learning – [email protected]

LEARNING AND TOOLS

1) Visual input from wrist camera

2) Closed-loop continuous position control

3) Use imitation learning (on a GPU) to learn in simulation (Gazebo)

4) Visual domain transfer to real environments

1

4

DNN2

3

Gazebo

GPU

Close loop controller

Real environment

Domain transfer

Page 98: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

98

Computational aspects of reinforcement learning – [email protected]

TRANSFERRING TO THE REAL ROBOTDomain randomization

Real image Synthetic image

Tobin, Josh et al. “Domain randomization for transferring

deep neural networks from simulation to the real world.” 2017

IEEE/RSJ International Conference on Intelligent Robots and

Systems (IROS) (2017): 23-30.

Page 99: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

99

Computational aspects of reinforcement learning – [email protected]

CONTROLLER AND VISUAL DOMAIN TRANSFER

99

Page 100: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

100

Computational aspects of reinforcement learning – [email protected]

SIM-REAL COMPOSITION

100

Random real scene (no sphere)

Randomize simulated scene GT binary mask

Training image

Page 101: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

101

Computational aspects of reinforcement learning – [email protected]

VISION MODULE RESULTS

101

TP / (TP + FP)% of sphere pixels that are real sphere

pixels

TP / (TP + FN)% of sphere pixels that are captured

Page 102: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

102

Computational aspects of reinforcement learning – [email protected]

VISION MODULE RESULTS

102

Page 103: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

103

Computational aspects of reinforcement learning – [email protected]

CONCLUSION

90% success in grasping a tiny sphere

Modular network for increased interpretability and flexible training

Imitation learning (DAGGER) fordata-efficient controller learning

Sim-real composition as a form of randomization for visual transfer

103

Page 104: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

104

Computational aspects of reinforcement learning – [email protected]

THEORY

RESOURCES

Mengyuan Yan, Iuri Frosio, StephenTyree, Jan Kautz, Sim-to-Real Transferof Accurate Grasping with Eye-In-HandObservations and Continuous Control,Neural Information Processing Systems(NIPS) 2017 Workshop on Acting andInteracting in the Real World:Challenges in Robot Learning,https://arxiv.org/abs/1712.03303.

CODING

DIY ☺

Gazebo: http://gazebosim.org/

ISAAC:https://www.nvidia.com/en-us/deep-learning-ai/industries/robotics/

Page 105: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

105

AGENDA

1. GPU-based A3C for Deep Reinforcement Learning(The basics of Reinforcement Learning on a CPU/GPU)

2. Cule*: GPU accelerated RL

(Moving Reinforcement Learning on a GPU)

3. Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand

Observations and Continuous Control

(Imitation Learning and Sample Efficiency)

4. Conclusion

Page 106: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

106

Computational aspects of reinforcement learning – [email protected]

FOUR LESSONS

✓ Balancing computational resources at system level

✓ Move simulation to GPU, create new algorithms

✓ Imitation learning & sample efficiency

✓ Domain transfer

106

Page 107: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

107

Computational aspects of reinforcement learning – [email protected]

ACKNOWLEDGEMENTS

Stephen Tyree

NVIDIAJan Kautz

NVIDIA

Jeannette Bohg

Stanford

Mengyuan Yan

NVIDIA /

Stanford

Mohammad

Babaizadeh

NVIDIA / UIUC

Jason

Clemons

NVIDIA

Steven Dalton

NVIDIA

Jarod

Hoberock

NVIDIA

Page 108: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

108

Computational aspects of reinforcement learning – [email protected]

LINKS: REFERENCES

NVIDIA Research homepage: https://www.nvidia.com/en-us/research/

Mohammad Babaeizadeh, Iuri Frosio, Stephen Tyree, Jason Clemons, Jan Kautz, ReinforcementLearning through Asynchronous Advantage Actor-Critic on a GPU, ICLR 2017,https://openreview.net/forum?id=r1VGvBcxl.

GA3C on github: https://github.com/NVlabs/GA3C

S. Dalton, I. Frosio, J. Hoberock, J. Clemons, CULE: GPU ACCELERATED RL, GTC 2018, http://on-demand.gputechconf.com/gtc/2018/presentation/s8440-cule-a-companion-library-for-accelerated-rl-training.pdf.

CuLE on github (released soon): https://github.com/NVlabs/.

Mengyuan Yan, Iuri Frosio, Stephen Tyree, Jan Kautz, Sim-to-Real Transfer of Accurate Grasping withEye-In-Hand Observations and Continuous Control, Neural Information Processing Systems (NIPS) 2017Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning,https://arxiv.org/abs/1712.03303. 108

Page 109: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

109

Computational aspects of reinforcement learning – [email protected]

LINKS: ISAAC

https://nvidianews.nvidia.com/news/nvidia-isaac-launches-new-era-of-autonomous-machines

Jetson Xavier:… At the heart of NVIDIA Isaac is Jetson™ Xavier™, the world’s first computer designed specifically for robotics…

Isaac Robotics Software:

Isaac SDK – a collection of APIs and tools to develop robotics algorithm software and runtime framework with fully accelerated libraries.

Isaac IMX – Isaac Intelligent Machine Acceleration applications, a collection of NVIDIA-developed robotics algorithm software.

Isaac Sim – a highly realistic virtual simulation environment for developers to train autonomous machines and perform hardware-in-the-loop testing with Jetson Xavier.

109

Page 110: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

110

Computational aspects of reinforcement learning – [email protected]

LINKS: INTERNSHIP

https://www.nvidia.com/en-us/research/internships/

Apply here: http://www.nvidia.com/object/universityrecruiting-internships.html

Or more easily, send me your CV to [email protected]!

Typical length: 3/4 months in Silicon Valley

Typical outcome: 1 publication, 1 patent

Typical team: 1 intern, 1 mentor (senior research scientist), several co-mentors

110

Page 111: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

111

Computational aspects of reinforcement learning – [email protected]

LINKS: GRADUATE FELLOWSHIP

http://research.nvidia.com/graduate-fellowships

… $50,000 per award to Ph.D. students who are researching topics that will lead to major advances in the graphics and high-performance computing industries, and are investigating innovative ways of leveraging the power of the GPU. NVIDIA particularly invites submissions from students pushing the envelope in artificial intelligence, deep neural networks, autonomous vehicles, and related fields…

… Students must have already completed their first year of Ph.D. level studies (at the time of application)…

… Students must have majors in Computer Science, Computer Engineering, System Architecture, Electrical Engineering, or a related area…

111

Page 112: COMPUTATIONAL ASPECTS OF REINFORCEMENT LEARNINGprofs.sci.univr.it/~castella/allegati/2018.07... · M. Babaeizadeh†,‡, I.Frosio ‡, S.Tyree‡, J. Clemons , J.Kautz‡ †University

112

Computational aspects of reinforcement learning – [email protected]

LINKS: “FREE” GPUS & TEACHING MATERIAL

“Free” GPUs for researchers (not really free, but it is easy to get one): https://developer.nvidia.com/academic_gpu_seeding

Free teaching material and GPU cloud resources through the Deep Learning Institute (Deep Learning, Accelerated Computing, & Robotics): https://developer.nvidia.com/teaching-kits

112