
Attention, Learn to Solve Routing Problems!

Wouter Kool, Herke van Hoof, Max Welling

5/24/19

Travelling Scientist Problem (TSP)

International Conference on Learning Representations 2019

TSP* is (NP-)hard!

* Travelling Salesman Problem (TSP)


What does it mean?

Finding optimal solutions for all problem instances

Finding acceptable solutions for relevant problem instances

* unless P = NP

We use HEURISTICS

Can be seen as ‘rules of thumb’

‘next location should be nearby’
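The rule of thumb ‘next location should be nearby’ is commonly known as the nearest-neighbour heuristic. A minimal Python sketch follows; the function names `nearest_neighbor_tour` and `tour_length` and the example coordinates are illustrative assumptions, not taken from the slides.

```python
import math

def nearest_neighbor_tour(coords, start=0):
    """Greedy 'next location should be nearby' heuristic for the TSP."""
    unvisited = set(range(len(coords)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = coords[tour[-1]]
        # Pick the closest unvisited node (ties broken by lowest index).
        nxt = min(unvisited, key=lambda j: (math.dist(last, coords[j]), j))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def tour_length(coords, tour):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))

# Hypothetical instance with four locations.
coords = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
tour = nearest_neighbor_tour(coords)
print(tour, tour_length(coords, tour))
```

The heuristic is fast but short-sighted: it never reconsiders an early choice, which is one reason hand-crafted rules need tuning per problem.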

MISSION:

IMPOSSIBLE



Crafting of heuristics is similar to feature engineering

So what do we do?

HARD WORK Feature engineering

• Needs expert knowledge
• Time-consuming hand-tuning

Computer Vision Features (SIFT, etc.)


Crafting of heuristics is similar to feature engineering

Two eyes?

Nose?

Mouth?

It’s a face! It’s a face!

Traditional approach: Feature engineering

Deep Learning: No feature engineering


Back to our problem

Recurrent Neural Networks


‘Translate’ problem into solution

Je suis une personne → I am a person

$(x_1, y_1)\ (x_2, y_2)\ (x_3, y_3)\ (x_4, y_4)$ → $(x_1, y_1)\ (x_2, y_2)\ (x_3, y_3)\ (x_4, y_4)$

French Dictionary

Idea


How does that work?

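Instance $s = ((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n))$

Solution $\boldsymbol{\pi} = (\pi_1, \pi_2, \ldots)$ with length $L(\boldsymbol{\pi})$

Model $p_{\boldsymbol{\theta}}(\pi_t \mid s, \pi_{<t}) = p_{\boldsymbol{\theta}}(\text{next node} \mid \text{partial tour})$

Sample $\pi_1 \sim p_{\boldsymbol{\theta}}(\pi_1 \mid s)$

Sample $\pi_2 \sim p_{\boldsymbol{\theta}}(\pi_2 \mid s, \pi_1)$

Sample $\pi_t \sim p_{\boldsymbol{\theta}}(\pi_t \mid s, \pi_{<t})$

Randomized algorithm with expected cost: $\mathbb{E}_{p_{\boldsymbol{\theta}}(\boldsymbol{\pi} \mid s)}[L(\boldsymbol{\pi})]$

How to optimize $\boldsymbol{\theta}$?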
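The node-by-node sampling on this slide can be sketched in Python. The policy below is a toy stand-in, not the model from the talk: it assumes a single scalar parameter `theta` and scores each unvisited node by a softmax over negative distance from the current node. All names (`step_probs`, `sample_tour`) are hypothetical.

```python
import math
import random

def step_probs(coords, partial_tour, theta):
    """Toy stand-in for p_theta(next node | partial tour)."""
    visited = set(partial_tour)
    candidates = [j for j in range(len(coords)) if j not in visited]
    if not partial_tour:
        # No current node yet: choose the first node uniformly.
        return candidates, [1.0 / len(candidates)] * len(candidates)
    cur = coords[partial_tour[-1]]
    scores = [-theta * math.dist(cur, coords[j]) for j in candidates]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(sc - m) for sc in scores]
    z = sum(weights)
    return candidates, [w / z for w in weights]

def sample_tour(coords, theta, rng, greedy=False):
    """Build a tour one node at a time; return it with log p_theta(tour | s)."""
    tour, log_p = [], 0.0
    for _ in range(len(coords)):
        candidates, probs = step_probs(coords, tour, theta)
        if greedy:
            k = max(range(len(probs)), key=probs.__getitem__)
        else:
            k = rng.choices(range(len(probs)), weights=probs)[0]
        tour.append(candidates[k])
        log_p += math.log(probs[k])
    return tour, log_p

rng = random.Random(0)
coords = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
tour, log_p = sample_tour(coords, theta=2.0, rng=rng)
greedy_tour, _ = sample_tour(coords, theta=2.0, rng=rng, greedy=True)
```

The log-probability of the whole tour is the sum of the per-step log-probabilities, which is the quantity a policy-gradient method would differentiate with respect to the parameters.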


Do something

Result = ?

Do more often! Do less often!

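Sample $\boldsymbol{\pi} \sim p_{\boldsymbol{\theta}}(\cdot \mid s)$

$L(\boldsymbol{\pi}) = 7.43$

Good! Increase $p_{\boldsymbol{\theta}}(\boldsymbol{\pi} \mid s)$

Bad! Decrease $p_{\boldsymbol{\theta}}(\boldsymbol{\pi} \mid s)$

We need a baseline to compare against: rollout of an earlier model

Repeat

$\boldsymbol{\pi}^{BL} \sim p_{\boldsymbol{\theta}^{BL}}(\cdot \mid s)$ (greedy!)

$L(\boldsymbol{\pi}^{BL}) = 6.89$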

REINFORCE (for dummies)
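The ‘do more often / do less often’ intuition can be written as the standard REINFORCE gradient estimator with a baseline. The notation $b(s)$ for the baseline is an assumption introduced here; the slide only says that the baseline comes from a greedy rollout of an earlier model.

```latex
\nabla_{\boldsymbol{\theta}} \, \mathbb{E}_{p_{\boldsymbol{\theta}}(\boldsymbol{\pi} \mid s)}\!\left[ L(\boldsymbol{\pi}) \right]
  = \mathbb{E}_{p_{\boldsymbol{\theta}}(\boldsymbol{\pi} \mid s)}\!\left[ \bigl( L(\boldsymbol{\pi}) - b(s) \bigr) \,
    \nabla_{\boldsymbol{\theta}} \log p_{\boldsymbol{\theta}}(\boldsymbol{\pi} \mid s) \right],
\qquad b(s) = L(\boldsymbol{\pi}^{BL})

\log p_{\boldsymbol{\theta}}(\boldsymbol{\pi} \mid s)
  = \sum_{t} \log p_{\boldsymbol{\theta}}(\pi_t \mid s, \pi_{<t})

% With the numbers on the slide:
L(\boldsymbol{\pi}) - L(\boldsymbol{\pi}^{BL}) = 7.43 - 6.89 = 0.54 > 0
```

With the slide's numbers the sampled tour is longer than the baseline tour, so a gradient-descent step on the expected cost lowers $p_{\boldsymbol{\theta}}(\boldsymbol{\pi} \mid s)$: the ‘Bad!’ case.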


What’s the model architecture?

$p_{\boldsymbol{\theta}}(\pi_t \mid s, \pi_{<t})$

Graph convolutions

+

Read the paper…


Experiments

Travelling Salesman Problem (TSP)
• Minimize length
• Visit all nodes

Vehicle Routing Problem (VRP)
• Minimize length
• Visit all nodes
• Total route demand must fit vehicle capacity

Orienteering Problem (OP)
• Maximize total prize
• Max length constraint

(Stochastic) Prize Collecting TSP ((S)PCTSP)
• Minimize length + penalties of unvisited nodes
• Collect minimum total prize
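The objectives and constraints of the four problems can be sketched as small Python functions. This is a simplified reading under stated assumptions: routes are treated as closed cycles over node indices, depot handling is ignored, and all function and parameter names are hypothetical.

```python
import math

def route_length(coords, route):
    """Length of a closed route through the given node indices."""
    n = len(route)
    return sum(math.dist(coords[route[i]], coords[route[(i + 1) % n]])
               for i in range(n))

def tsp_cost(coords, tour):
    # TSP: minimize length, visit all nodes exactly once.
    assert sorted(tour) == list(range(len(coords))), "tour must visit every node once"
    return route_length(coords, tour)

def op_prize(coords, route, prizes, max_length):
    # OP: maximize total prize subject to a maximum route length.
    if route_length(coords, route) > max_length:
        return None  # infeasible route
    return sum(prizes[i] for i in route)

def pctsp_cost(coords, route, prizes, penalties, min_prize):
    # PCTSP: minimize length plus penalties of unvisited nodes,
    # while collecting at least a minimum total prize.
    if sum(prizes[i] for i in route) < min_prize:
        return None  # infeasible route
    visited = set(route)
    unvisited_penalty = sum(p for i, p in enumerate(penalties) if i not in visited)
    return route_length(coords, route) + unvisited_penalty

def vrp_routes_fit(routes, demands, capacity):
    # VRP: the total demand on each route must fit the vehicle capacity.
    return all(sum(demands[i] for i in r) <= capacity for r in routes)
```

Only the cost function and the feasibility check differ between problems, which is what makes it plausible to train one model type per problem with the same hyperparameters.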

Train for each problem, same hyperparameters!


Results Attention Model + Rollout Baseline

• Improves over classical heuristics!

• Improves over prior learned heuristics!
  • Attention Model improves
  • Rollout helps significantly

• Gets close to single-purpose SOTA (20 to 100 nodes)!
  • TSP: 0.34% to 4.53% gap (greedy)
  • TSP: 0.08% to 2.26% gap (best of 1280 samples)
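A short Python sketch of how such percentages and a ‘best of N samples’ result can be computed. It assumes the percentages are relative gaps to a reference (for example optimal) cost; the function names and the random cost generator are hypothetical stand-ins for sampling tours from a trained model.

```python
import random

def optimality_gap(cost, reference_cost):
    """Relative excess cost over the reference, e.g. 0.02 for a 2% gap."""
    return (cost - reference_cost) / reference_cost

def best_of_samples(sample_cost, num_samples):
    """Draw num_samples solution costs and keep the smallest."""
    return min(sample_cost() for _ in range(num_samples))

# Hypothetical stand-in for sampling a tour and measuring its length.
rng = random.Random(0)
def sample_cost():
    return 100.0 + 10.0 * rng.random()

single = sample_cost()
best = best_of_samples(sample_cost, 1280)
print(f"single sample gap: {optimality_gap(single, 100.0):.2%}")
print(f"best of 1280 gap:  {optimality_gap(best, 100.0):.2%}")
```

Sampling many solutions and keeping the best trades extra computation for a smaller gap, which matches the difference between the greedy and the sampled results on the slide.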


The end!

Thank you for your attention!

• Learning algorithms for optimization problems is promising

• Especially for less well-studied problems

• Can deal with uncertainty, specialize to data distribution, etc.