
Sérgio Ronaldo Barros dos Santos (ITA-Brazil)
Sidney Nascimento Givigi Júnior (RMC-Canada)
Cairo Lúcio Nascimento Júnior (ITA-Brazil)

Autonomous Construction of Structures in a Dynamic Environment using Reinforcement Learning


2/25

Introduction

In recent years, there has been a growing interest in a class of applications in which mobile robots are used to assemble and build different types of structures.

These applications have traditionally involved humans performing:

The operation of tools and equipment;

The manipulation and transportation of the resources for manufacturing the structures; and

The careful preplanning of the tasks that will be executed.


3/25

Introduction

Due to recent advancements in technologies available for UAVs, the problem of autonomous manipulation, transportation and construction is advancing to the aerial domain.

Autonomous construction using aerial robots may be useful in several situations, such as:

Reducing the high accident rates of traditional construction;

Enabling construction in extraterrestrial environments or disaster areas; and

Military and logistics applications.



4/25

Quad-rotor Robot

All of the movements of the quad-rotor can be controlled by changing the speed of each rotor.

An inertial frame and a body-fixed frame, whose origin is at the center of mass of the quad-rotor, are used.
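As a rough illustration of how changes in rotor speed translate into controlled motion, the sketch below implements the standard "+"-configuration mixing model that maps the four rotor speeds to collective thrust and roll, pitch and yaw torques; the coefficients, arm length and rotor layout are illustrative assumptions, not parameters of the quad-rotor used in this work.

```python
# Minimal sketch of a "+"-configuration quad-rotor mixer.
# The thrust/drag coefficients and arm length are illustrative placeholders,
# not parameters of the quad-rotor presented in the slides.

def rotor_speeds_to_wrench(w1, w2, w3, w4, k_f=6.1e-8, k_m=1.5e-9, arm=0.23):
    """Map four rotor speeds (rad/s) to collective thrust and body torques."""
    f1, f2, f3, f4 = (k_f * w**2 for w in (w1, w2, w3, w4))  # rotor thrusts (N)
    thrust = f1 + f2 + f3 + f4                      # collective thrust along body z
    roll = arm * (f4 - f2)                          # differential thrust of the lateral pair
    pitch = arm * (f3 - f1)                         # differential thrust of the longitudinal pair
    yaw = k_m * (w1**2 - w2**2 + w3**2 - w4**2)     # net reaction (drag) torque
    return thrust, roll, pitch, yaw
```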


5/25

Problem Statement

Construction tasks using mobile robots are characterized by three fundamental problems: task planning, motion planning and path tracking.

However, obtaining the task and path plans that define a specific sequence of operations for the construction of different structures is generally very complex.

The task planning, motion planning and low-level controllers for robotic assembly are derived offline in a simulation environment, using Reinforcement Learning (RL) and heuristic search (A*) algorithms, and the solutions are then ported to an actual quad-rotor.


6/25

Problem Statement

Proposed Environment: the suggested 3-D structures

This work concentrates on learning four different types of 3-D structures: cube, tower, pyramid and wall, similar to those used in the construction of scaffolds, tower cranes, skyscrapers, etc.


7/25

Proposed Solution

• Low-level controllers: Enable the position and path tracking control of the quad-rotor.

• Task planning: Provide the maneuvers and assembly sequence.

• Path Planning: Find the optimal path for the robot so that it can navigate through the dynamic environment (see the sketch below).
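Since the path planning relies on the A* heuristic search, the sketch below shows a minimal A* implementation over a 2-D occupancy grid; the grid representation, 4-connected moves and Manhattan heuristic are assumptions made for illustration and are not claimed to match the exact formulation used in this work.

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 2-D occupancy grid (0 = free, 1 = obstacle).

    Uses 4-connected moves and a Manhattan-distance heuristic; returns the
    list of cells from start to goal, or None if no path exists.
    """
    def h(c):  # admissible heuristic: Manhattan distance to the goal
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    open_set = [(h(start), 0, start, None)]      # (f, g, cell, parent)
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, g, cell, parent = heapq.heappop(open_set)
        if cell in came_from:                    # already finalized
            continue
        came_from[cell] = parent
        if cell == goal:                         # reconstruct the path
            path = [cell]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = cell[0] + dr, cell[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                ng = g + 1
                if ng < g_cost.get((r, c), float("inf")):
                    g_cost[(r, c)] = ng
                    heapq.heappush(open_set, (ng + h((r, c)), ng, (r, c), cell))
    return None
```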


8/25

Experimental Infrastructure


9/25

Experimental Infrastructure


10/25

Reinforcement Learning

The task planning and low-level controllers for robotic assembly were learned by a reinforcement learning algorithm known as Learning Automata (LA).
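As a reference for how a learning automaton adapts, the sketch below implements the classical linear reward-inaction (L_R-I) update of an action-probability vector; the choice of this particular scheme and the learning rate are assumptions, since the slides do not state which LA update rule is used.

```python
def la_update(p, chosen, beta, lr=0.1):
    """Linear reward-inaction (L_R-I) update of an action-probability vector.

    p      : current probabilities over the automaton's actions (sums to 1)
    chosen : index of the action selected at this iteration
    beta   : reinforcement in [0, 1] returned by the environment (1 = best)
    lr     : learning rate (assumed value; not specified in the slides)
    """
    return [pi + lr * beta * (1.0 - pi) if i == chosen else pi - lr * beta * pi
            for i, pi in enumerate(p)]

# Example: a favourable response (beta = 1) shifts probability mass towards
# the chosen action while keeping the vector normalized.
p = la_update([0.25, 0.25, 0.25, 0.25], chosen=2, beta=1.0)
```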


11/25

Learning Control of a Quad-rotor

The low-level controllers are adapted in two stages, first simultaneously (attitude and height) and afterwards (position and path tracking), using the nonlinear dynamic model of the target quad-rotor built for the X-Plane Flight Simulator and the LA algorithm running in Matlab.
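To make this adaptation loop concrete, the sketch below tunes a PD gain set with a single learning automaton against a toy double-integrator plant standing in for the X-Plane model; the candidate gains, the reward mapping and the plant are all illustrative assumptions, not the setup used by the authors.

```python
import random

# Candidate gain sets the automaton can choose from (illustrative values only).
GAIN_CANDIDATES = [
    {"kp": 0.5, "kd": 0.05},
    {"kp": 1.0, "kd": 0.10},
    {"kp": 2.0, "kd": 0.20},
]

def simulate_tracking_error(gains, target=1.0, dt=0.02, steps=250):
    """Toy stand-in for the simulator: a double integrator driven by a PD
    controller; returns the accumulated absolute tracking error."""
    x, v, err = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = gains["kp"] * (target - x) - gains["kd"] * v
        v += u * dt
        x += v * dt
        err += abs(target - x) * dt
    return err

def tune_gains(iterations=200, lr=0.05):
    """Select controller gains by reinforcement: candidates that produce a
    lower tracking error receive a larger reward and accumulate probability."""
    p = [1.0 / len(GAIN_CANDIDATES)] * len(GAIN_CANDIDATES)
    for _ in range(iterations):
        a = random.choices(range(len(p)), weights=p)[0]
        error = simulate_tracking_error(GAIN_CANDIDATES[a])
        beta = 1.0 / (1.0 + error)              # map error to a reward in (0, 1]
        p = [pi + lr * beta * (1.0 - pi) if i == a else pi - lr * beta * pi
             for i, pi in enumerate(p)]         # reward-inaction update
    return GAIN_CANDIDATES[max(range(len(p)), key=p.__getitem__)]
```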


12/25

Learning Control of a Quad-rotor

Some effects must be taken into account during the learning phase, such as wind and ground effects, as well as the changes in mass and center of gravity of the system produced by different types of payloads.


13/25

Learning Control of a Quad-rotor

A simulation setup is proposed for training and evaluating the control parameters under realistic conditions.


14/25

Learning Control of a Quad-rotor

Experimental setup used to test and validate the attitude and path-tracking controllers learned in simulation.


15/25

Learning Control of a Quad-rotor

Path tracking and height responses obtained by the quad-rotor during the test of the adapted control laws.

Panels: test in simulation; experimental validation.


16/25

Learning of the Robotic Assembly

In the proposed learning system for the autonomous construction of structures, the training of the task planning is accomplished by a team of automata.

Panels: learning architecture; learning automata.


17/25

Learning of the Robotic Assembly

The total cost function J(n) proposed to evaluate the way the structure was constructed combines weighted terms that penalize the distance travelled, the rotations and xz-plane translations performed, and the collisions caused by the robot during assembly.

The numeric value of the response quality I(n) obtained by the robot during each iteration is computed from J(n) and the minimum cost found during training.
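A minimal sketch of what such a cost evaluation could look like is given below; the individual terms, weights and the normalization used for I(n) are assumptions made only for illustration, since the exact formulas could not be recovered from the slides.

```python
def construction_cost(distance, rotations, xz_translations, collisions,
                      w_d=1.0, w_r=1.0, w_xz=1.0, w_c=10.0):
    """Illustrative total cost J(n) of one construction attempt.

    The terms, weights and their combination are assumptions for this sketch;
    the slides only state that J(n) is a weighted evaluation of how the
    structure was built.
    """
    return (w_d * distance + w_r * rotations
            + w_xz * xz_translations + w_c * collisions)

def response_quality(J, J_min):
    """Illustrative response quality I(n) in (0, 1], assuming the best
    (lowest) cost found so far is used as a reference."""
    return J_min / J if J > 0 else 1.0
```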


18/25

Learning of the Robotic Assembly

The reinforcement signal is defined piecewise: R(n) is computed from the response quality I(n) using the bounds R_P and R_G, and the reinforcement Rc(n) actually applied to the automata is obtained from R(n), with a separate case when an assembly error occurs.

The value of Rc(n) ∈ [R_P, R_G] is understood as a user-defined limit used to change the convergence speed during the training process.

A common reinforcement is used to update the action probability distributions of the team of automata.
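To illustrate how a common reinforcement can drive a team of automata, the sketch below gives each automaton one decision of the assembly sequence and updates all of them with the same scalar reward; the reward-inaction rule, the learning rate and the team structure are assumptions for this sketch, and `evaluate` is a placeholder for the cost-based reinforcement computation.

```python
import random

class Automaton:
    """One member of the team: a probability vector over its own actions."""
    def __init__(self, n_actions, lr=0.1):
        self.p = [1.0 / n_actions] * n_actions
        self.lr = lr

    def choose(self):
        # Sample an action index according to the current probabilities.
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def reinforce(self, chosen, beta):
        # Linear reward-inaction update driven by the common reinforcement beta.
        self.p = [pi + self.lr * beta * (1.0 - pi) if i == chosen
                  else pi - self.lr * beta * pi
                  for i, pi in enumerate(self.p)]

def training_iteration(team, evaluate):
    """One iteration: the team jointly proposes an assembly sequence and
    every automaton is updated with the same (common) reinforcement."""
    actions = [a.choose() for a in team]     # one decision per automaton
    beta = evaluate(actions)                 # common reinforcement in [0, 1]
    for a, act in zip(team, actions):
        a.reinforce(act, beta)
    return actions, beta
```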


19/25

Learning of the Robotic Assembly

During the learning phase, it is noted that the knowledge acquired by the system about the assembly of a 3-D structure (tower) increases with each iteration.

The learned sequence of maneuvers and assembly operations for the construction of a tower is illustrated in the plots below.


20/25

Learning of the Robotic Assembly

Experimental setup used to simultaneously validate the learned task planning and the path planning produced by the RL and A* algorithms.


21/25

Learning of the Robotic Assembly

The events executed during the assembly task of a structure.


22/25

Learning of the Robotic Assembly

The trajectory resulting from the learned sequence of maneuvers for assembling the tower was successfully performed by the quad-rotor.


23/25

Conclusions

This method allows the autonomous construction of multiple 3-D structures by a quad-rotor, based on the Learning Automata and A* algorithms.

This approach substantially reduces the effort required to develop the task and motion planning that allows a robot to efficiently assemble and construct multiple 3-D structures.

The use of reinforcement learning for finding different sets of actions to build a 3-D structure is very promising.


24/25

Conclusions

The proposed learning architecture enables an aerial robot to learn a good sequence of maneuvers and assembly operations that overcomes the constraints inherent in the structures and the environment.

It has been shown that a 3-D structure can be built using the adapted low-level controllers, the learned task planning and the generated path planning.


25/25

Thank you

Questions?