
Page 1: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Ivomar Brito Soares, Yann-Michaël De Hauwere, Kris Januarius, Tim Brys, Thierry Salvant, Ann Nowé

[email protected] - [email protected]

ITSC 2015, September 2015

Page 2: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Summary

Introduction
Reinforcement Learning
Departure MANagement
DMAN RL Model
Experiments
Conclusions

Page 3: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Large Scale Multi-Agent Systems (MAS)

- Smart Energy Grids
- Intelligent Traffic Systems
- Warehouse Planning
- Air Traffic System


Page 5: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Large Scale MAS

Characteristics

- Resources and control are inherently distributed.
- Different actors with mutual and conflicting interests.
- Highly stochastic.
- Full system dynamics are unknown (e.g., not all constraints are known).

A full mathematical description or a globally centralized solution is therefore difficult to compute.


Page 7: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Why use RL?

- Artificial Intelligence (AI)
  - Machine Learning (ML)
    - Reinforcement Learning (RL)

Reinforcement Learning

- Agent-based modelling effort is reduced.
- Exceeds human controller performance.
- Adaptive: it can change its decisions dynamically.

Page 8: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Single-Agent RL

Single-Agent Reinforcement Learning (RL) Model [Kaelbling et al. (1996)]

- An approach to solving a Markov Decision Process (MDP).
- The agent must learn by itself (trial and error).
- Maximize a long-term numerical reward signal.

Page 9: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Multi-Agent RL

Multi-Agent Reinforcement Learning (MARL) [Nowé (2011)]

- Markov Game (MG).
- Multiple agents / multiple sequential decisions.
- More complex than single-agent RL.



Page 12: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Air Traffic System

- Air Traffic System (ATS)
  - Airport Ground Operations (AGO)
    - Departure MANagement (DMAN)

Organize the movement of departing aircraft from gate to take-off clearance.

DMAN Tasks

- Respect assigned Target Take-Off Time Windows (TTOTW).
- Increase runway throughput and airport capacity.
- Reduce fuel consumption, noise, and CO2 emissions.


Page 14: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Target Take-Off Time Window (TTOTW)

Respecting TTOTWs allows for better usage of the ATS infrastructure (CFMU slots ≡ take-off time slots).

Flight Plan Callsign   TTOTW_min   TTOT       TTOTW_max
AAL0005D               07:19:12    07:20:12   07:21:12
DAL0067D               07:22:12    07:23:12   07:24:12
JBU0065D               07:25:12    07:26:12   07:27:12
AAL0007D               07:28:12    07:29:12   07:30:12
DAL0009D               07:31:12    07:32:12   07:33:12

Some TTOT windows generated for learning scenario 6.

At Charles de Gaulle Airport (LFPG) in Paris, France, roughly 80% of flights succeed in taking off inside their slots [Gotteland, 2003].


Page 16: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Human Controllers' Approaches to Respecting TTOTW

1. Gate Controllers: Clear the aircraft off-block at TTOT minus an estimate of the duration between off-block and take-off.
   - Average: the average over all departure aircraft at KJFK.
   - Exact: the duration when the aircraft taxis alone.
2. Runway Controllers: Make the aircraft wait before lining up if they estimate that it would miss its window by taking off too early.

Page 17: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Fast Time Simulation (FTS)

- Modeling: EnRoute flight phase, Terminal Maneuvering Area (TMA), aircraft and airport handlers, vehicle ground movements.
- The simulation clock runs faster than a regular clock (fast-time).
- Examples: AirTOp, SIMMOD, TAAM, Arc Port ALTO, etc.

Page 18: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

John F. Kennedy International Airport (KJFK)

[Figure: KJFK, New York City, USA (NY-Metro), modeled as a grid-world environment.]


Page 20: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Environment

- Fast Time Simulation: AirTOp.
- Departure Flights: Off-Block → Pushing-Back → Taxiing → Runway acceleration → Take-Off → Standard Instrument Departure (SID) → EnRoute.
- Arrival Flights: EnRoute → Standard Terminal Arrival Route (STAR) → Landing → Runway deceleration → Taxiing → In-Block.
- Safety requirements: wake-vortex separations, runway usage.


Page 22: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Markov Decision Process (MDP)

- Agent: A flight plan (FP) controller agent.
- States:
  1. Parked (initial state): one per departure aircraft with a TTOT window.
     - Departure Gate
     - Entry Time at the departure gate.
  2. Taxiing (intermediate states, finite):
     - Entry Node
     - Exit Node
     - Entry Time at the entry node.
  3. Taken-Off (goal/absorbing states):
     - Taken-Off Inside Window
     - Taken-Off Outside Window


Page 24: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Markov Decision Process (MDP)

- Actions:
  1. Delay Off-Block.
  2. Delay During Taxiing.
  3. Take-Off.
- Reward Function:
  - Inside Window: positive reward penalizing delay on the ground.
  - Outside Window: 0.


Page 28: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

RLDMAN: Single-Agent Single-State to Multi-Agent Multi-State

1. Single-Agent Single-State (N-Armed Bandit):
   - Parked State / Delay Off-Block actions.
   - Delay is absorbed at the gate → reduced fuel consumption.
2. Multi-Agent Single-State: multiple single-state agents learning in a shared environment.
3. Single-Agent Multi-State: it is not always possible to absorb all the delay at the gate:
   - An arriving flight is requesting the gate.
   - The aircraft must avoid traffic in the vicinity of the gate.
4. Multi-Agent Multi-State: multiple multi-state agents learning in a shared environment.


Page 33: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Single-Agent Multi-State Example

- TTOT Window (TTOTW): [08:16:00, 08:18:00]; a-b = b-c = c-d = 5 min.
- Reward function parameters: r^max = 100, r^{TTOTW,out} = 0, f^taxiing = 0.5. Q-learning: α = 0.2, γ = 0.8.

[Figure: taxi route from the terminal gate (parked state gd(a)) through the taxiing states ne(a)/nx(b), ne(b)/nx(c), ne(c)/nx(d) to runways 27/09, ending in the Taken-Off Inside Window or Taken-Off Outside Window states; node entry times range from 08:00:00 to 08:13:00.]

1. The aircraft parks at the gate at 08:00:00.
2. It goes off-blocks (AOBT) at 08:00:30. R = 0, Q = 0.
3. No stop at node b. R = 0, Q = 0.
4. The aircraft stops for 30 sec at node c. R = 0, Q = 0.
5. The aircraft takes off (ATOT) at 08:16:00. r^{TTOTW,in} = 100 − 0.5 × 60 = 70. Q = 0.2 × 70 = 14.
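The numbers in step 5 can be reproduced with a few lines of Python (a minimal sketch; that the 60 s of penalized ground delay is the 30 s off-block delay plus the 30 s stop at node c is implied by the slide's arithmetic):

```python
r_max, f_taxiing = 100.0, 0.5     # reward parameters from the slide
alpha, gamma = 0.2, 0.8           # Q-learning parameters

d_ground = 30 + 30                # 30 s off-block delay + 30 s stop at node c
r = r_max - f_taxiing * d_ground  # inside window: 100 - 0.5*60 = 70.0

# Q-update from Q = 0; the taken-off state is absorbing, so the
# successor value is 0.
q = 0.0 + alpha * (r + gamma * 0.0 - 0.0)
print(r, q)                       # 70.0 14.0
```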


Page 35: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Learning Problem Size

Large Multi-Agent System (MAS)

- Number of departure flights (agents): 698
- Number of arrival flights: 711
- Two days of operations at KJFK

Independent Learning Scenarios

- Total: 42

Scenario     6-38   5    0    39   1,3,4   2,41   40
# of Dep     20     13   8    6    3       1      0
Agent Type   MA     MA   MA   MA   MA      SA     -

Number of departure flights per learning scenario index (Dep: Departure Flights, MA: Multi-Agent, SA: Single-Agent).

Page 36: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Learning Scenario 6: Single-State vs. Multi-State Comparison

- Deterministic environment
- 20 independent learning agents

[Figures: average number of take-offs inside the TTOTW; average fuel consumption of departure flights (kg).]

Page 37: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Learning Scenario 6: Single-State vs. Multi-State Comparison

- Deterministic environment
- 20 independent learning agents

[Figures: average reward (all agents); average reward (per agent).]

Page 38: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Learning Scenario 6: Single-State vs. Multi-State Comparison

- Deterministic environment
- 20 independent learning agents

[Figures: average gate delay (all agents); average taxiing delay (all agents).]

Page 39: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

All Learning Scenarios: Percentage of Windows Respected

Case                                  Deterministic   Stochastic
Machine Learning                      99              96
Gate Controllers (Average)            85              71
Gate Controllers (Exact)              97              44
Gate + Runway Controllers (Average)   87              70
Gate + Runway Controllers (Exact)     96              44

Percentage of windows respected (%) for all scenarios, by environment.

Page 40: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

All Learning Scenarios: Fuel Consumption

Case                                  Deterministic   Stochastic
Machine Learning                      34,806          37,989
Gate Controllers (Average)            35,057          38,106
Gate Controllers (Exact)              34,839          37,865
Gate + Runway Controllers (Average)   35,412          36,613
Gate + Runway Controllers (Exact)     34,847          37,872

Fuel consumption (kg) for departure aircraft for all scenarios, by environment.


Page 42: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Conclusions

- Reinforcement Learning (RL) has shown good potential for modeling and solving the problem of respecting assigned take-off windows for departure aircraft.
- A realistic, real-world application of RL.
- Single-State
  - Advantages: reduced fuel consumption and a smaller learning problem, since no taxiing states are visited.
  - Disadvantages: increased gate delay, and not being able to find a solution for all cases, e.g., when the aircraft needs to avoid traffic taxiing in the vicinity of the gate.
- Multi-State
  - Advantages: reduced gate delay; finds solutions that avoid disturbance traffic on the path to the runway.
  - Disadvantages: increased fuel consumption; a bigger learning problem.


Page 45: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Questions?

Ivomar Brito Soares

[email protected] - [email protected]

AI Lab (VUB): https://ai.vub.ac.be
Airtopsoft SA: http://www.airtopsoft.com
Innoviris: http://www.innoviris.be
RLDMAN YouTube channel: http://www.youtube.com/channel/UC8uJBsMej5A1as8trbVxbbQ

Page 46: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

References

1. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
2. K. Tumer and A. Agogino, "Improving Air Traffic Management with a Learning Multiagent System," IEEE Intelligent Systems, vol. 24, no. 1, pp. 18-21, 2009.
3. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine Learning: An Artificial Intelligence Approach. Springer Science & Business Media, 2013.
4. A. Nowé, P. Vrancx, and Y.-M. De Hauwere, "Game Theory and Multi-agent Reinforcement Learning," in Reinforcement Learning: State of the Art. Springer, 2012, pp. 441-470.
5. Y.-M. De Hauwere, Sparse Interactions in Multi-Agent Reinforcement Learning, 2011.
6. R. De Neufville and A. Odoni, Airport Systems: Planning, Design and Management, 2013.

Page 47: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

References

7. S. Stroiney, B. Levy, and C. Knickerbocker, "Departure Management: Savings in Taxi Time, Fuel Burn, and Emissions," in Integrated Communications Navigation and Surveillance Conference (ICNS). IEEE, 2010.
8. J.-B. Gotteland, N. Durand, and J.-M. Alliot, "Handling CFMU Slots in Busy Airports," in 5th USA/Europe Air Traffic Management Research and Development Seminar, 2003.
9. Airtopsoft, AirTOp Fast Time Simulator, http://www.airtopsoft.com/, 2005. [Online; accessed 23-February-2015].
10. C. J. Watkins and P. Dayan, "Q-Learning," Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.

Page 48: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Abstract

This paper considers how existing Reinforcement Learning (RL) techniques can be used to model and learn solutions for large-scale Multi-Agent Systems (MAS). The large-scale MAS of interest is the movement of departure flights at big airports, commonly known as the Departure MANagement (DMAN) problem. A particular DMAN subproblem is how to respect Central Flow Management Unit (CFMU) take-off time windows, which are time windows planned by flow management authorities to be respected for the take-off time of departure flights. An RL model to handle this problem is proposed, including the Markov Decision Process (MDP) definition, the behavior of the learning agents, and how the problem can be modeled using RL, ranging from the simplest to the full RL problem. Several experiments illustrate the performance of the machine learning algorithm, with a comparison against how these problems are commonly handled by airport controllers today. The environment in which the agents learn is provided by the Fast Time Simulator (FTS) AirTOp, and the airport case study is John F. Kennedy International Airport (KJFK) in New York City, USA, one of the busiest airports in the world.

Page 49: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Reinforcement Learning (RL) Overview

- Markov Decision Process (MDP): (S, A, T_sa, γ, R).
  - S = {s_1, ..., s_N} is a finite set of states.
  - A = {a_1, ..., a_k} are the actions available to the agent.
  - T_sa: each combination of starting state s_i, action choice a_l ∈ A, and next state s_j has an associated transition probability T(s_i, a_l, s_j).
  - R: the immediate reward R(s_i, a_l).
  - γ ∈ [0, 1) is the discount factor (e.g., 0.9).
- Learn a policy π: J^π ≡ E[∑_{t=0}^∞ γ^t R(s(t), π(s(t)))], where E is the expected value.
- Q values: Q(s, a) = R(s, a) + γ ∑_{s'} T(s, a, s') max_{a'} Q(s', a'), where s' is the next state and a' an action taken in the next state.
- Q-learning update rule [Watkins, 1992]: Q(s, a) ← Q(s, a) + α[R(s, a) + γ max_{a'} Q(s', a') − Q(s, a)], where α is the learning rate.
- ε-greedy action selection mechanism: ε(episode) = ε_0 · τ^episode, with ε_0 the initial value (e.g., 1.0) and τ ∈ (0, 1) the decay (e.g., 0.995, giving ε ≈ 0.001 after 1378 episodes).
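A quick numeric check of the decay schedule (a minimal sketch; the values are taken from the slides): after 1378 episodes, ε has decayed to roughly 0.001, the trial-ending value quoted later in the RL set-up:

```python
eps0, tau = 1.0, 0.995
eps = eps0 * tau ** 1378   # epsilon after 1378 decay steps
print(round(eps, 6))       # 0.001
```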


Page 59: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Algorithm 1: RL Update with Q-Learning and ε-Greedy with Decay

 1: Initialize Q(s, a) (e.g., to 0)
 2: for each episode do
 3:     Agent returns to its initial state.
 4:     Decay ε: ε ← ε_0 · τ^episode
 5:     while final/absorbing state not reached do
 6:         Generate a random number n ∈ [0, 1)
 7:         if n ≤ ε then
 8:             Explore: choose a at random
 9:         else
10:             Exploit: choose among a with the highest Q
11:         Execute action a
12:         Observe reward r and next state s'
13:         Update Q(s, a) ← Q(s, a) + α[R(s, a) + γ max_{a'} Q(s', a') − Q(s, a)] [Watkins, 1992]
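A self-contained Python sketch of Algorithm 1, assuming a generic environment with reset()/step() methods (a hypothetical interface; the slides use the AirTOp simulator as the environment):

```python
import random
from collections import defaultdict

def q_learning(env, actions, n_episodes=1378,
               alpha=0.2, gamma=0.8, eps0=1.0, tau=0.995):
    """Tabular Q-learning with epsilon-greedy exploration and decay.
    `env` exposes reset() -> s and step(a) -> (s', r, done)."""
    Q = defaultdict(float)                   # 1: initialize Q(s, a) to 0
    for episode in range(n_episodes):        # 2: for each episode
        s = env.reset()                      # 3: return to initial state
        eps = eps0 * tau ** episode          # 4: decay epsilon
        done = False
        while not done:                      # 5: until an absorbing state
            if random.random() <= eps:       # 7-8: explore
                a = random.choice(actions)
            else:                            # 10: exploit
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)    # 11-12: act, observe r and s'
            best_next = max(Q[(s_next, x)] for x in actions)
            # 13: Watkins' Q-learning update
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

Because the taken-off states are absorbing with Q = 0, including γ·max Q(s', a') in the final update leaves the update equal to α·r, consistent with the worked example earlier.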


Page 61: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Target Take-Off Time Window (TTOTW)

Respecting TTOTWs allows for better usage of the ATS infrastructure.

- Departure aircraft i: ac_i^dep, with target take-off time TTOT_i.
- TTOT Window (TTOTW):
  - Width: TTOTW_i^w > 0.
  - Range: [TTOTW_i^min, TTOTW_i^max].
  - Constraints: TTOTW_i^min < TTOT_i < TTOTW_i^max and TTOTW_i^max = TTOTW_i^min + TTOTW_i^w.
- TTOTW_i respected: Actual Take-Off Time (ATOT): ATOT_i ∈ [TTOTW_i^min, TTOTW_i^max].

At Charles de Gaulle Airport (LFPG) in Paris, France, roughly 80% of flights succeed in taking off inside their slots [Gotteland, 2003].
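A minimal sketch of the window test implied by these definitions (helper names are illustrative, not from the paper):

```python
def ttotw_respected(atot, ttotw_min, ttotw_max):
    """TTOTW_i is respected when ATOT_i lies in [TTOTW_i^min, TTOTW_i^max]."""
    return ttotw_min <= atot <= ttotw_max

def hms(h, m, s):
    """Clock time encoded as seconds since midnight (illustrative)."""
    return h * 3600 + m * 60 + s

# AAL0005D from the scenario-6 table: window [07:19:12, 07:21:12].
print(ttotw_respected(hms(7, 20, 0), hms(7, 19, 12), hms(7, 21, 12)))  # True
```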


Page 63: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Human Controllers' Approaches to Respecting TTOTW

1. Gate Controllers: Clear the aircraft off-block at TTOT − T_i^ot.
   - Average: T_i^{ot,a} =
       average push-back duration (00:02:35)
     + average taxi time duration (total taxi length / 14 kt)
     + average runway line-up duration (00:00:28)
     + runway acceleration duration.
   - Exact: T_i^{ot,e} = Actual Take-Off Time (ATOT) − Actual Off-Block Time (AOBT).
2. Runway Controllers: Make the aircraft wait before lining up if they estimate that it would miss its window by taking off too early.
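A sketch of the gate controller's "average" estimate T_i^{ot,a} from the list above; the taxi length and the runway acceleration duration are hypothetical inputs:

```python
KT_TO_MPS = 0.514444  # knots to metres per second

def t_ot_average(taxi_length_m, runway_accel_s):
    """Estimated off-block-to-take-off duration, in seconds."""
    push_back = 2 * 60 + 35                  # average push-back, 00:02:35
    taxi = taxi_length_m / (14 * KT_TO_MPS)  # taxi at an average 14 kt
    line_up = 28                             # average runway line-up, 00:00:28
    return push_back + taxi + line_up + runway_accel_s

def off_block_clearance(ttot_s, taxi_length_m, runway_accel_s):
    """Gate controller clears off-block at TTOT - T_ot."""
    return ttot_s - t_ot_average(taxi_length_m, runway_accel_s)
```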


Page 67: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

RLDMAN: Single-Agent Single-State to Multi-Agent Multi-State

1. Single-Agent Single-State (N-Armed Bandit):
   - Only one learning agent; the only state considered is the Parked State, with the Delay Off-Block actions.
   - The delay is absorbed at the gate with the aircraft engines turned off, thus reducing fuel consumption.
2. Multi-Agent Single-State: multiple single-state agents learning in a shared environment.
3. Single-Agent Multi-State: it is not always possible to absorb all the delay at the gate:
   - An arriving flight is requesting the gate.
   - The aircraft must avoid traffic in the vicinity of the gate.
4. Multi-Agent Multi-State: multiple multi-state agents learning in a shared environment.


Page 69: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Markov Decision Process (MDP)

- Agent: A flight plan (FP) controller agent a_i.
- States (S):
  1. Parked (initial state) (s_i^p):
     - Departure Gate (g_d)
     - Entry Time (e_p)
  2. Taxiing (intermediate states) (S_i^t = s_{i,1}^t, ..., s_{i,N}^t):
     - Entry Node (n_e)
     - Exit Node (n_x)
     - Entry Time (e_t)
  3. Taken-Off (goal/absorbing states):
     - Taken-Off Inside Window (s_i^{TTOTW,in})
     - Taken-Off Outside Window (s_i^{TTOTW,out})
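One way to encode this state space as plain data types (a sketch; the field names follow the slide's symbols):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Parked:              # s_i^p: initial state
    gate: str              # departure gate g_d
    entry_time: int        # entry time e_p at the gate, in seconds

@dataclass(frozen=True)
class Taxiing:             # s_{i,j}^t: intermediate states
    entry_node: str        # n_e
    exit_node: str         # n_x
    entry_time: int        # entry time e_t at the entry node, in seconds

@dataclass(frozen=True)
class TakenOff:            # goal/absorbing states
    inside_window: bool    # s^{TTOTW,in} if True, s^{TTOTW,out} otherwise
```

With frozen=True the states are hashable, so they can serve directly as keys in a tabular Q-function.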


Page 71: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

Markov Decision Process (MDP)

- Actions (A):
  1. Delay Off-Block (A^o = a^o_1, ..., a^o_L)
  2. Delay During Taxiing (A^t = a^t_1, ..., a^t_M)
  3. Take-Off (a^e)
- Reward Function (R):
  - Inside Window: r_i^{TTOTW,in} = r^max − p_i^taxiing = r^max − f^taxiing · d_i^taxiing
  - Outside Window: r_i^{TTOTW,out} = 0.
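The reward function translates directly into code (a sketch; the defaults are the worked example's values, while the competitive setting on the next slide uses r^max = 10,000):

```python
def reward(inside_window, d_taxiing_s, r_max=100.0, f_taxiing=0.5):
    """r = r_max - f_taxiing * d_taxiing inside the window, 0 outside."""
    if not inside_window:
        return 0.0                            # r^{TTOTW,out} = 0
    return r_max - f_taxiing * d_taxiing_s    # r^{TTOTW,in}

print(reward(True, 60))   # 70.0, as in the worked example
print(reward(False, 60))  # 0.0
```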

Page 72: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

RL Set Up

- Independent learners.
- Q-learning: α = 0.2, γ = 0.8.
- Action selection mechanism: ε-greedy with parameter decay: ε_0 = 1.0, τ = 0.995.
- Competitive setting: r^max = 10,000, f^taxiing = −1.0.
- Environment: deterministic or stochastic (non-deterministic).

Page 73: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

RL Set Up

- Delay off-block actions: A^o with a range of [−10 min, 10 min] centered around TTOT − T^{ot,e}, with a step of 10 sec for every agent (121 actions per s_i^p).
- Delay during taxiing actions: A^t defined with a range of [0, 1 min] and a step of 10 sec (7 actions per S_i^t), close to each apron exit.
- A learning trial ends when ε = 0.001 for all agents (1378 episodes).
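The action counts follow directly from the ranges and the 10 s step (a quick check):

```python
# Delay off-block: [-10 min, +10 min] in 10 s steps, centered on
# TTOT - T^{ot,e}: 2*600/10 + 1 = 121 actions per parked state.
off_block_offsets = range(-600, 601, 10)
# Delay during taxiing: [0, 1 min] in 10 s steps: 7 actions.
taxi_delays = range(0, 61, 10)
print(len(off_block_offsets), len(taxi_delays))  # 121 7
```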

Page 74: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

All Learning Scenarios: Percentage of Windows Respected

[Figure: percentage of KJFK departure flights that take off inside their window.]


Page 76: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots

All Learning Scenarios: Fuel Consumption

[Figure: fuel consumption of KJFK departure flights for all learning scenarios.]
