Learning to Price Airline Seats Under Competition. 7th Annual INFORMS Revenue Management and Pricing Conference, Barcelona, Spain, Thursday, 28th June 2007. Presenter: Andrew Collins [email protected] Supervisor: Prof Lyn Thomas


Learning to Price Airline Seats Under Competition

7th Annual INFORMS Revenue Management and Pricing Conference

Barcelona, Spain

Thursday, 28th June 2007

Presenter: Andrew Collins [email protected]

Supervisor: Prof Lyn Thomas


Overview

• Motivation

• Reinforcement Learning

• Methodology

• Model

• Results

• Conclusions


Motivation

Game Theory project manager:

• 2001-2004

• Defence Science and Technology Laboratories (Dstl), UK

Research at University of Southampton:

“To demonstrate Game Theory as a practical analytical modelling technique for usage within the OR community”

Frustration with Game Theory:

• Difficulty with deriving feasible solutions

• Difficulty in validating results (due to simplifications)

• Dependency on input variables

• Speed and Memory issues of running models

Applications of Game Theory in Defence Project- A. Collins, F. Pullum, L. Kenyon (2003) [Dstl - Unclassified]


Learning in Games
• Brown’s Fictitious play (1951)
• Fudenberg and Levine (1998)

Evolutionary
• Weibull (1995)
• Replicator Dynamics

Neural Networks
• Just a statistical process in the limit
– Neal (1996)

Reinforcement Learning
• Association with Psychology
– Pavlov (1927)
– Rescorla and Wagner (1972)
• Convergence
– Collins and Leslie (2005)

Theory of Learning in Games- Drew Fudenberg and David Levine (1998)


Reinforcement Learning

Introduction


Reinforcement Learning (RL)

A.K.A. ‘Neuro-Dynamic Programming’ or ‘Approximate Dynamic Programming’.

Agents/players reinforce their world-view from interaction with the environment.

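[Diagram: interaction loop between Agent and Environment. The Agent sends an Action “a” to the Environment; the Environment returns a Reward “r” and a State “s” to the Agent.]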

Neuro-Dynamic Programming- Dimitri Bertsekas and John Tsitsiklis (1996)


Types

• Players store information about each state-action pair (called the Q-value)
• They use this information to select an action when at that state
• They update this information according to the rule:

$$Q(a, s) = (1 - \alpha)\,Q(a, s) + \alpha\,\bigl(\mathrm{reward}(a, s) + U(\text{next } s)\bigr)$$

where α denotes the step-size (learning-rate) parameter.

‘U’ depends on the RL type used. It usually involves:
• The observed return ‘R’ after the current state
• Current Q-value estimates of succeeding states (called bootstrapping)

Type          Label   Update ‘U’
Monte Carlo   MC      R(next s)
Q-Learning    QL      max_a Q(a, next s)
SARSA         SA      Q(next a, next s)

Reinforcement Learning - Richard Sutton and Andrew Barto (1998)
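The update rule and the three choices of ‘U’ on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the presenter's implementation: the function names, the dictionary layout keyed by (action, state), and the name `alpha` for the step-size parameter are all assumptions made for the example.

```python
def update_target(kind, Q, next_state, next_action, observed_return, actions):
    """Return U(next s) for the chosen RL type (labels follow the slide)."""
    if kind == "MC":
        # Monte Carlo: the return actually observed after the current state.
        return observed_return
    if kind == "QL":
        # Q-learning: bootstrap from the best current estimate at the next state.
        return max(Q[(a, next_state)] for a in actions)
    if kind == "SA":
        # SARSA: bootstrap from the action actually chosen at the next state.
        return Q[(next_action, next_state)]
    raise ValueError(f"unknown RL type: {kind}")


def q_update(Q, a, s, reward, U, alpha):
    """Apply Q(a, s) = (1 - alpha) * Q(a, s) + alpha * (reward + U)."""
    Q[(a, s)] = (1 - alpha) * Q[(a, s)] + alpha * (reward + U)
    return Q[(a, s)]
```

With hypothetical values Q(0, s1) = 2 and Q(1, s1) = 4, the Q-learning target at s1 is 4 while the SARSA target for next action 0 is 2, so the two methods move Q(a, s) by different amounts from the same experience.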


Issues

How do players select their actions?
• Exploration vs. exploitation
• Boltzmann Action Selection (a.k.a. Softmax)
– Similar to Logit Models

Stochastic Uncoupled Dynamics and Nash Equilibrium- Sergiu Hart and Andreu Mas-Colell (2006)

$$P(\text{action}) = \frac{e^{Q(\text{action})/\tau}}{\sum_{b \in \text{Actions}} e^{Q(b)/\tau}}$$

Leads to Nash distribution:
– Unique Nash Equilibrium as τ → 0

Uncoupled Games
• Hart and Mas-Colell (2003, 2006)
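Boltzmann (softmax) action selection, as named on this slide, can be sketched as follows. The temperature parameter is called `tau` here, and the function names and the dictionary of Q-values are assumptions made for illustration. Subtracting the largest Q-value before exponentiating is a standard numerical safeguard and does not change the resulting probabilities.

```python
import math
import random


def boltzmann_probabilities(q_values, tau):
    """Map {action: Q-value} to {action: probability} at temperature tau."""
    best = max(q_values.values())
    # Shift by the maximum so every exponent is <= 0 and cannot overflow.
    weights = {a: math.exp((q - best) / tau) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def select_action(q_values, tau, rng=random):
    """Sample one action according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, tau)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

A small `tau` concentrates almost all probability on the action with the highest Q-value (exploitation), while a large `tau` approaches uniform random play (exploration).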


Methodology

Introduction


Methodology

Construct a simple airline pricing model
– Dynamic Pricing of Airline Tickets with Competition
• Currie, Cheng and Smith (2005)
– Reinforcement Learning Approach to Airline Seat Allocation
• Gosavi, Bandla and Das (2002)

Run various Reinforcement Learning (RL) models
– Compare to ‘optimal’ solutions
– Prove RL converges using Stochastic Approximation

Analyse the optimal solution for the model
– Find optimal solution using Dynamic Programming
– Deduce generalisation from these results

Tools for Thinking: Modelling in Management Science- Mike Pidd (1996)


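Airline Pricing Model: Flow Diagram

[Flow diagram with two branches from Start. Reinforcement Learning branch: Episode Generation, Players learn about Environment, Policy updated, Repeat. Dynamic Programming branch: Backward Induction, Optimal Policy. The outputs of the two branches meet at Compare.]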


Airline Pricing Model

Introduction to an Airline Pricing Model


Airline Pricing Model

The game consists of two competing airline firms. The firms are ‘P1’ and ‘P2’.

• Each firm is selling seats for a single leg flight

• Both flights are identical

• Firms attract customers with their prices

A separate model is used for customer demand.

The Theory and Practice of Revenue Management- K. Talluri and G. van Ryzin (2004)


Simple Airline Pricing Model

[Timeline diagram with the labels: “P1 sets Price”, “P2 sets Price”, “P1 Price Change”, “End Round 1”, “P2 Price Change”, “End Round 2”, “Flights Leave”. Customer rule: “Customer - Lowest Price”.]


Airline Pricing Model

Solution Example to Simple Airline Model


Solution Example

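[Figure: solution table with columns labelled 1 2 1 2 and rows P1, P2, R1, R2. P1 shows a price of 9; the other entries visible in the extracted text are 8, 9, 8, 8, 0, 8, and their positions are not recoverable.]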


Solution Example

[Figure: solution table with columns labelled 1 2 1 2 and rows P1, P2, R1, R2. P2 shows a price of 10.]

Player ‘1’ can now attempt to attract one or both of the remaining customers. However, player ‘2’ still has a chance to undercut to gain the last customer.


Solution Example

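[Figure: solution table with columns labelled 1 2 1 2 and rows P1, P2, R1, R2. P2 shows a price of 10; the other entries visible in the extracted text are 9 9, 10, 9, 88, 9, 8, 9, 8, and their positions are not recoverable.]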


Solution Example

1 2 1 2

P1 5 5 9 9 9

P2 10 10 10 8 8

R1 5 9 14

R2 8 8


Comparison

Using metrics to compare policies


Comparisons

Once we have learned a policy, how do we compare policies?
• Q-values or action probabilities
– Difficulty in weighting states

What I really care about is return, so:
• Compare return from each path
– Curse of dimensionality
• Produce the Return Probability Distribution (RPD) of the different policies played against some standard policies:
– Nash distribution, Nash equilibrium, myopic play, random play, etc.
– Would need to compare ALL possibilities to be sure of convergence
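One way to picture a Return Probability Distribution is as the empirical distribution of returns over many simulated episodes. The sketch below assumes a hypothetical `play_episode(rng)` callable that plays one game between the policy under test and a standard policy and returns one player's total return. Neither that function nor the toy example reflects the talk's actual model.

```python
from collections import Counter
import random


def return_distribution(play_episode, n_episodes, rng):
    """Estimate an RPD as {return: relative frequency} from simulated episodes."""
    counts = Counter(play_episode(rng) for _ in range(n_episodes))
    return {ret: c / n_episodes for ret, c in sorted(counts.items())}


def toy_episode(rng):
    """Hypothetical stand-in for one game: return 14 most of the time, else 5."""
    return 14 if rng.random() < 0.75 else 5
```

For example, `return_distribution(toy_episode, 10000, random.Random(1))` yields a two-point distribution over the returns 5 and 14, which can then be compared with the distribution obtained from another policy.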


Nash Equilibrium: Derived from play of (5, 10, 9, 8).

The blue bars are for P2 and the red bars for P1.

[Bar chart: return probability distribution for P1 and P2; x-axis Payoff (0 to 30), y-axis Probability (0 to 1).]


Nash Distribution: τ = 0.0020.

The differences are so small that you cannot notice them here.

[Bar chart: return probability distribution; x-axis Payoff (0 to 30), y-axis Probability (0 to 1).]


Nash Distribution: τ = 0.0050.

We can now see a slight change in the distribution.

[Bar chart: return probability distribution; x-axis Payoff (0 to 30), y-axis Probability (0 to 1).]


Nash Distribution: τ = 0.0100.

Notice there is more variation for P2 than P1.

[Bar chart: return probability distribution; x-axis Payoff (0 to 30), y-axis Probability (0 to 1).]


Nash Distribution: τ = 0.0200.

Notice that P1 is observing some very bad results.

[Bar chart: return probability distribution; x-axis Payoff (0 to 30), y-axis Probability (0 to 1).]


Nash Distribution: τ = 0.2000.

We almost get random play (see next slide).

[Bar chart: return probability distribution; x-axis Payoff (0 to 30), y-axis Probability (0 to 1).]


Random Play: Notice that the expected rewards are equal, as the order of the players does not matter.

[Bar chart: return probability distribution; x-axis Payoff (0 to 30), y-axis Probability (0 to 1).]


Metrics

If a policy is very similar to another policy, we would expect to see similar RPDs from both policies when they are played against the standard policies.

How do we compare RPDs?
• L1-metric meaningless…
• Hellinger, Kolmogorov-Smirnov, Gini, Information value, Separation, Total Variation, Mean, Chi-squared…

On Choosing and Bounding Probability Metrics-Alison Gibbs and Francis Su (2002)
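Three of the metrics listed above can be computed directly for discrete return distributions. The sketch below assumes each RPD is a dictionary mapping a payoff to its probability; the function names are illustrative. The Hellinger distance uses the normalised form that lies in [0, 1], and other references omit the factor of one half.

```python
import math


def _support(p, q):
    """Sorted union of the payoffs that appear in either distribution."""
    return sorted(set(p) | set(q))


def total_variation(p, q):
    """Half the sum of absolute probability differences."""
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in _support(p, q))


def hellinger(p, q):
    """Normalised Hellinger distance between two discrete distributions."""
    s = sum((math.sqrt(p.get(x, 0.0)) - math.sqrt(q.get(x, 0.0))) ** 2
            for x in _support(p, q))
    return math.sqrt(0.5 * s)


def kolmogorov_smirnov(p, q):
    """Largest absolute gap between the two cumulative distributions."""
    cdf_p = cdf_q = largest = 0.0
    for x in _support(p, q):
        cdf_p += p.get(x, 0.0)
        cdf_q += q.get(x, 0.0)
        largest = max(largest, abs(cdf_p - cdf_q))
    return largest
```

All three return 0 for identical distributions and 1 for distributions with disjoint support, so values from different policies can be read on a common scale.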


Example Metric Results

Metric comparison of the RPDs of:

1) Nash Equilibrium policy vs. SARSA learnt policy

2) Nash Equilibrium policy vs. Nash Equilibrium policy

Greedy action selection used for calculating RPD.

The x-axis is a log-scale of episodes.

10M episodes run in total.

[Chart: metric values against number of episodes (log scale). Legend: IV, KS, KS_P1, KS_P2, ROC, ROC_P1, ROC_P2, SD1, SD2, CHI1, CHI2, TV, E1, E2, H.]


Reinforcement Learning Model

Results


Tau variation

[Line chart: Kolmogorov-Smirnov metric (y-axis, 0 to 1) against Exploration (x-axis, 0 to 0.03); series QL, MC, SARSA.]

Results compare each learnt policy’s RPD with the RPD of the corresponding Nash Distribution policy.

MC seems to improve as exploration increases. Why not increase exploration?


Other Issues

1) Stability
• Excess exploration implies instability
• Higher dependency on the most recent observation implies instability

2) Computing
• Batch runs: 100 x 10M episodes
• 2.2 GHz, 4 GB RAM
• Time considerations
– 23 hrs
• Memory requirements
– 300 MB

3) Curse of Dimensionality
• Wish to increase number of rounds

4) Customer Behaviour
• Wish to change customer behaviour (i.e. multiple customers, Logit models)

Simulation-Based Optimization-Abhijit Gosavi (2003)


Conclusions

A simple airline pricing model can lead to some interesting results. Understanding the meaning of these results might give insight into real-world pricing policy.

Trying to solve this model with RL algorithms reveals interesting behaviour:
– Curse of Dimensionality
– Stability

The SARSA RL method outperforms the other methods for certain exploration levels.


Questions?

Andrew Collins

[email protected]