The Traveling Salesman Problem in Theory & Practice

Lecture 1, 21 January 2014

David S. Johnson
[email protected]
http://davidsjohnson.net

Seeley Mudd 523, Tuesdays and Fridays

Today’s Outline

1. Requirements, References, & Introductions
2. Problem Definition
3. Applications
4. Paths and Cycles
5. Complexity
6. Introduction to Optimization
7. Introduction to Approximation
8. Preview of the Rest of the Course

Requirements and Grading

• Class presentation of results from the literature.

• Written paper:

– Survey paper on an approved topic

– Report on your own new experimental work

– Theoretical paper on new results of your own

• Regular class participation.

About Me

• Ph.D. in Mathematics from MIT (1973). Thesis: Near-Optimal Bin Packing Algorithms.

• 40 years at AT&T (Bell Labs, AT&T Labs – Research), with one year off for good behavior (U. Wisconsin, 1980-81).

• Most famous publication: Computers and Intractability: A Guide to the Theory of NP-Completeness (1979, with Mike Garey).

• Many theoretical and experimental papers on the TSP with many co-authors, starting with the proof that the Euclidean version is NP-Hard.

Optional Reference Books

The Traveling Salesman Problem, Lawler, Lenstra, Rinnooy Kan, and Shmoys (Editors), Wiley (1985). $377.47 (current amazon.com price, new)

The Traveling Salesman Problem and Its Variations, Gutin and Punnen (Editors), Kluwer (2002). $152.10

The Traveling Salesman Problem: A Computational Study, Applegate, Bixby, Chvatal, and Cook, Princeton University Press (2006). $57.99/$44.99 (Kindle)

In Pursuit of the Traveling Salesman, Cook, Princeton University Press (2012). $20.64/$15.37 (Kindle)

Web Resources

• http://www.math.uwaterloo.ca/tsp/ “The Traveling Salesman Problem” (Bill Cook)

• http://dimacs.rutgers.edu/Challenges/TSP/ “The 8th DIMACS Implementation Challenge: The Traveling Salesman Problem” (DSJ)

• http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ “TSPLIB” (Testbed of Instances, Gerd Reinelt)

• http://davidsjohnson.net/papers.html (DSJ’s downloadable papers on the TSP and other topics)

• http://en.wikipedia.org/wiki/Travelling_salesman_problem (Wikipedia Entry -- Much Improved)

The Traveling Salesman Problem

Given: A set of cities $\{c_1, c_2, \ldots, c_N\}$ and, for each pair of cities $\{c_i, c_j\}$, a distance $d(c_i, c_j)$.

Find: A permutation $\pi: \{1,2,\ldots,N\} \to \{1,2,\ldots,N\}$ that minimizes

$$\sum_{i=1}^{N-1} d\big(c_{\pi(i)}, c_{\pi(i+1)}\big) + d\big(c_{\pi(N)}, c_{\pi(1)}\big)$$
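A minimal sketch of the definition above, assuming the distances are supplied as a square matrix `d` indexed by city number (the function names are illustrative, not from the lecture). Rotating a tour does not change its length, so city 0 is fixed in the first position and only (N−1)! permutations are examined; this is practical only for very small N. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import permutations

def tour_length(d, perm):
    """Sum of d(c_pi(i), c_pi(i+1)) for i = 1..N-1, plus the closing edge d(c_pi(N), c_pi(1))."""
    n = len(perm)
    return sum(d[perm[i]][perm[(i + 1) % n]] for i in range(n))

def brute_force_tsp(d):
    """Return (length, permutation) of an optimal tour by exhaustive search."""
    n = len(d)
    best = None
    for rest in permutations(range(1, n)):
        perm = (0,) + rest  # city 0 first: rotations of a tour have equal length
        length = tour_length(d, perm)
        if best is None or length < best[0]:
            best = (length, perm)
    return best

if __name__ == "__main__":
    # Hypothetical instance: the four corners of the unit square.
    pts = [(0, 0), (1, 1), (0, 1), (1, 0)]
    d = [[math.dist(p, q) for q in pts] for p in pts]
    print(brute_force_tsp(d))
```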

Alternative Definition

Given: Graph G = (V,E) and a length d(e) for each edge e in E.

Find: A minimum-length Hamiltonian circuit in the complete graph G′ on V, where if {u,v} is not in E, we assume d(u,v) = ∞.
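The passage from G to the complete graph G′ can be sketched as follows, assuming vertices are numbered 0..n−1 and edges are given as hypothetical (u, v, length) triples. Non-edges get length `math.inf`, so the optimal circuit in G′ has finite length exactly when G has a Hamiltonian circuit. The exhaustive search is only a stand-in for a real TSP code.

```python
import math
from itertools import permutations

def complete_graph_distances(n, edges):
    """Distance matrix of G' on vertices 0..n-1; pairs that are not edges of G get infinity."""
    d = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        d[v][v] = 0
    for u, v, length in edges:
        d[u][v] = d[v][u] = length  # undirected edge {u, v}
    return d

def min_hamiltonian_circuit(d):
    """Exhaustive search; returns math.inf if every circuit uses a non-edge of G."""
    n = len(d)
    best = math.inf
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        best = min(best, sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n)))
    return best
```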

[Figure: N = 10]

[Figure: N = 10]

[Figure: N = 100]

[Figure: N = 1000]

[Figure: N = 10000]

[Figure: Jan Karel Lenstra]

Planar Euclidean Application #1

• Cities:

– Holes to be drilled in printed circuit boards

[Figure: N = 10000]

[Figure: N = 2392]

Planar Euclidean Application #2

• Cities:

– Wires to be cut in a “Laser Logic” programmable circuit

[Figure: N = 7397]

[Figure: N = 33,810]

[Figure: N = 85,900]

Other Types of Instances

• X-ray crystallography
– Cities: orientations of a crystal
– Distances: time for motors to rotate the crystal from one orientation to the other

• High-definition video compression
– Cities: binary vectors of length 64 identifying the summands for a particular function
– Distances: Hamming distance (the number of terms that need to be added/subtracted to get the next sum)
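For the video-compression instances, if each length-64 binary vector is stored as a 64-bit integer (an assumption about representation), the Hamming distance is the number of 1 bits in the XOR of the two vectors. A minimal sketch; `bin(x).count("1")` works on any Python 3, while `int.bit_count()` is available from Python 3.10.

```python
def hamming(u: int, v: int) -> int:
    """Hamming distance between two bit vectors stored as ints:
    the number of positions where they differ, i.e. 1 bits in u XOR v."""
    return bin(u ^ v).count("1")

def hamming_matrix(vectors):
    """Symmetric TSP distance matrix over the given bit vectors."""
    return [[hamming(u, v) for v in vectors] for u in vectors]
```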


Data Storage Layout

Goal: For each row, have as many consecutive entries as possible (minimizes the number of random accesses)

Asymmetric Applications

• Payphone Money Collection with One-Way Streets
• Stacker-Crane
• No-Wait Flowshop
• Disk Scheduling
• Compiling to Minimize Branching Cost
• Minimum Length Common Superstring


The Stacker Crane Problem

No-Wait Flowshop

Job: Task on Processor 1, then Task on Processor 2

Schedule: [Figure: jobs laid out on Processor 1 and Processor 2]

No-Wait Flowshop

[Figure: example instance and schedule]
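One standard way to see the no-wait flowshop as an asymmetric TSP, sketched here under the assumption of two processors where job j needs a_j time on Processor 1 followed immediately by b_j on Processor 2: if job j runs directly after job i, its start is delayed by a_i + max(0, b_i − a_j) relative to job i. Adding a dummy job (0, 0) turns the makespan into the length of a cycle through the dummy. The function names are illustrative; the simulation is included only to check the cycle formula.

```python
from itertools import permutations

DUMMY = (0, 0)  # zero-length job that closes the schedule into a cycle

def delay(job_i, job_j):
    """Gap between the Processor-1 start times of job_i and job_j when job_j
    runs directly after job_i and no job may wait between its two tasks."""
    a_i, b_i = job_i
    a_j, _ = job_j
    return a_i + max(0, b_i - a_j)

def makespan_as_cycle(jobs, order):
    """Makespan computed as the length of the cycle DUMMY -> order -> DUMMY."""
    seq = [DUMMY] + [jobs[k] for k in order] + [DUMMY]
    return sum(delay(seq[t], seq[t + 1]) for t in range(len(seq) - 1))

def makespan_by_simulation(jobs, order):
    """Direct simulation: start each job as early as both processors allow."""
    p1_free = p2_free = 0
    for k in order:
        a, b = jobs[k]
        start = max(p1_free, p2_free - a)  # Processor 2 must be free when task 1 ends
        p1_free, p2_free = start + a, start + a + b
    return p2_free

def best_order(jobs):
    """Exhaustive search over job orders (tiny instances only)."""
    return min(permutations(range(len(jobs))), key=lambda o: makespan_as_cycle(jobs, o))
```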

Disk Scheduling

Cities: Locations of the fragments of a file one wants to retrieve.

Distance between two fragments: the time it takes to move the read head from the end of one to the beginning of the next, taking into account the spinning of the disk.

Compiling to Minimize Branching Cost

Cities: code segments, each ending in a branch. In execution, the delay at the end of a segment is much less if the next instruction to be executed is the next one in the code: say 1 versus k.

Based on profiling, one can determine the empirical probability that each branch is taken.

Following A directly by B causes an expected delay of $P_B + kP_C$. Following A directly by C causes an expected delay of $P_C + kP_B$. Following A directly by anything else causes an expected delay of k.

[Figure: segment A, whose branch goes to B with probability $P_B$ and to C with probability $P_C$]
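The expected-delay rule can be written as an asymmetric distance function. This sketch assumes each segment's branch targets and their profiled probabilities are given as a dictionary whose probabilities sum to 1 (a hypothetical representation). Laying out X directly after A costs 1 for the probability mass of the branch to X and k for every other target, which reproduces $P_B + kP_C$, $P_C + kP_B$, and k.

```python
def expected_branch_delay(branch_probs, follower, k):
    """Expected delay at the end of segment A if `follower` is laid out
    directly after it. branch_probs maps each target of A's branch to the
    probability that branch is taken (assumed to sum to 1); k > 1 is the
    delay when the next instruction is not the next one in the code."""
    return sum(p * (1 if target == follower else k) for target, p in branch_probs.items())
```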

Shortest Superstring

• Given: Finite set S of strings over some alphabet.
• Find: Shortest string that contains all strings in S as substrings.
• Cities: Strings in S.
• Distances: d(x,y) = |y| − (maximum overlap between a suffix of x and a prefix of y).

Example: x = “alphabet”, y = “betrayal”. Then d(x,y) = 5 (overlap “bet”, giving “alphabetrayal”) and d(y,x) = 6 (overlap “al”, giving “betrayalphabet”).
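The overlap distance can be sketched directly. Merging strings in a given order with maximal overlaps yields a superstring whose length is the length of the first string plus the sum of the distances along that order, which is why shortest superstring becomes a (path) TSP. The function names are illustrative, and the quadratic overlap search is chosen for clarity rather than speed.

```python
def overlap(x: str, y: str) -> int:
    """Length of the longest suffix of x that is also a prefix of y."""
    for k in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:k]):
            return k
    return 0

def dist(x: str, y: str) -> int:
    """d(x, y) = |y| - overlap(x, y): characters added when y follows x."""
    return len(y) - overlap(x, y)

def merge_in_order(strings):
    """Superstring obtained by overlapping consecutive strings maximally."""
    result = strings[0]
    for prev, nxt in zip(strings, strings[1:]):
        result += nxt[overlap(prev, nxt):]
    return result
```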

Hamiltonian Path versus Cycle

• Four variants (both for symmetric and asymmetric TSP):
– Cycle
– Path between fixed endpoints
– Path with fixed starting vertex
– Path with unconstrained endpoints

• A code for any one can be adapted to handle any of the others.

Path with Fixed Endpoints: Cycle via Path

Fix any vertex s. Call the Path algorithm once for s and each vertex t in V − {s}, and return the result with the best value of Path Length + d(t,s).
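The reduction works with any fixed-endpoint path code plugged in. In this sketch a brute-force `shortest_path_fixed_ends` (a hypothetical stand-in, adequate only for tiny instances) plays that role; the distance matrix may be asymmetric.

```python
from itertools import permutations

def shortest_path_fixed_ends(d, s, t):
    """Brute-force shortest Hamiltonian path from s to t (stand-in for any Path code)."""
    middle = [v for v in range(len(d)) if v not in (s, t)]
    best = None
    for perm in permutations(middle):
        path = (s,) + perm + (t,)
        length = sum(d[path[i]][path[i + 1]] for i in range(len(path) - 1))
        if best is None or length < best[0]:
            best = (length, path)
    return best

def shortest_cycle_via_path(d, s=0):
    """Call the Path routine for s and every t != s; close each path with d(t, s)."""
    best = None
    for t in range(len(d)):
        if t == s:
            continue
        length, path = shortest_path_fixed_ends(d, s, t)
        total = length + d[t][s]
        if best is None or total < best[0]:
            best = (total, path)
    return best
```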

Path with Fixed Endpoints: Path via Cycle

Add one new vertex adjacent only to s and t, via two new edges of length 0 (all other distances to the new vertex are ∞). Compute a shortest cycle, then delete the added vertex and its two edges: what remains is a shortest Hamiltonian path from s to t.
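A sketch of this construction with a brute-force cycle solver as a hypothetical stand-in. So that the same code also covers the asymmetric case, the two new edges are made one-way (t → u and u → s, both of length 0); for a symmetric instance this simply fixes the direction in which the cycle is read off.

```python
import math
from itertools import permutations

def shortest_cycle(d):
    """Brute-force shortest Hamiltonian cycle (stand-in for any Cycle code)."""
    n = len(d)
    best = None
    for rest in permutations(range(1, n)):
        cycle = (0,) + rest
        length = sum(d[cycle[i]][cycle[(i + 1) % n]] for i in range(n))
        if best is None or length < best[0]:
            best = (length, cycle)
    return best

def shortest_path_via_cycle(d, s, t):
    """Add vertex u, reachable only from t and leading only to s (length 0),
    solve the cycle problem, then delete u to read off the s-to-t path."""
    n = len(d)
    u = n
    big = [row[:] + [math.inf] for row in d] + [[math.inf] * (n + 1)]
    big[t][u] = 0  # enter u only from t
    big[u][s] = 0  # leave u only toward s
    length, cycle = shortest_cycle(big)
    i = cycle.index(u)
    path = cycle[i + 1:] + cycle[:i]  # starts at s, ends at t
    return length, path
```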

Path with One Fixed Endpoint via Path with Two Fixed Endpoints

For each t in V – {s}, find shortest Hamiltonian path from s to t. Return the best.

Path with Two Fixed Endpoints via Path with One Fixed Endpoint

Add one new vertex t′ whose only edge goes to t (say of length 0; all other distances to t′ are ∞). Any finite-length Hamiltonian path starting with s must then end at t′, immediately after visiting t, so deleting t′ yields a shortest Hamiltonian path from s to t.

Path with No Fixed Endpoints via Path with One Fixed Endpoint

For each s in V, find shortest Hamiltonian path starting from s. Return the best.

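Path with One Fixed Endpoint via Path with No Fixed Endpoints

Add a new vertex s′ whose only edge goes to s (say of length 0; all other distances to s′ are ∞). A shortest Hamiltonian path must then have s′ as one endpoint, followed immediately by s; deleting s′ gives a shortest Hamiltonian path starting at s.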

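Directed via Undirected

[Figure: each vertex $v_i$ replaced by the path $v_i^{in}$, $v_i$, $v_i^{out}$]

Replace each vertex $v_i$ by a triplet of vertices $v_i^{in}$, $v_i$, $v_i^{out}$, joined by the edges $\{v_i^{in}, v_i\}$ and $\{v_i, v_i^{out}\}$ of length 0.

Replace each directed edge $(v_i, v_j)$ by the undirected edge $\{v_i^{out}, v_j^{in}\}$ of length $d(v_i, v_j)$; all other pairs get length ∞. Since $v_i$ has only two finite-length edges, every finite tour passes through $v_i^{in}$, $v_i$, $v_i^{out}$ consecutively, so undirected tours correspond to directed tours of the same length.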
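A sketch of the vertex-tripling construction, assuming the directed instance is a square matrix `d`. Vertex i becomes in(i) = 3i, mid(i) = 3i+1, out(i) = 3i+2 (an arbitrary numbering chosen here). The brute-force `best_tour` is only a checker, so the example keeps N = 3, which gives a 9-vertex symmetric instance.

```python
import math
from itertools import permutations

def best_tour(d):
    """Brute-force optimal tour length for a (possibly asymmetric) matrix."""
    n = len(d)
    return min(
        sum(d[c[i]][c[(i + 1) % n]] for i in range(n))
        for c in ((0,) + p for p in permutations(range(1, n)))
    )

def directed_to_undirected(d):
    """Symmetric 3N-vertex instance: {in, mid} and {mid, out} have length 0,
    arc (i, j) becomes edge {out(i), in(j)} of length d[i][j], all else inf."""
    n = len(d)
    sym = [[math.inf] * (3 * n) for _ in range(3 * n)]
    for v in range(3 * n):
        sym[v][v] = 0
    for i in range(n):
        a, m, b = 3 * i, 3 * i + 1, 3 * i + 2
        sym[a][m] = sym[m][a] = 0
        sym[m][b] = sym[b][m] = 0
        for j in range(n):
            if i != j:
                sym[b][3 * j] = sym[3 * j][b] = d[i][j]
    return sym
```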

[Figure: the construction applied to a directed graph on $v_1, \ldots, v_4$]

TSP: The Canonical NP-Hard Problem?

• Commonly used in the popular press to explain NP-completeness and exponential time to the layman: the number of tours grows as N! (actually (N−1)!/2 for the symmetric case):

N    # Tours        N    # Tours
3    1              12   19,958,400
4    3              13   239,500,800
5    12             14   3,113,510,400
6    60             15   43,589,145,600
7    360            16   653,837,184,000
8    2,520          17   10,461,394,944,000
9    20,160         18   177,843,714,048,000
10   181,440        19   3,201,186,852,864,000
11   1,814,400      20   60,822,550,204,416,000
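The symmetric count (N−1)!/2 can be tabulated exactly with integer arithmetic. A minimal sketch; the function name is illustrative.

```python
from math import factorial

def num_tours(n: int) -> int:
    """Distinct tours on n >= 3 cities, symmetric case: (n-1)!/2.
    Fixing the starting city leaves (n-1)! orders, and each order
    describes the same tour as its reversal."""
    return factorial(n - 1) // 2

if __name__ == "__main__":
    for n in range(3, 21):
        print(f"{n:>2}  {num_tours(n):,}")
```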

$N! = \Omega(2^{N \log N})$ time is not required: $O(N^2 2^N)$ suffices! [Bellman, 1963] [Held & Karp, 1962]

Algorithmic technique: Dynamic Programming

States: Pairs $[U, j]$ with $2 \le j \le N$ and $\{v_1, v_j\} \subseteq U \subseteq V$.

Note: There are $\Theta(N 2^N)$ states $[U, j]$.

Values: $X[U, j]$ is the length of the shortest Hamiltonian path, starting with $v_1$ and ending with $v_j$, in the subgraph of G induced by U.

Note: The optimal tour length equals

$$\min\{X[V, j] + d(v_j, v_1) : 2 \le j \le N\}.$$

Computing the Values X[U,j]

$X[\{v_1, v_j\}, j] = d(v_1, v_j)$, $2 \le j \le N$.

Now assume we have already computed $X[U, j]$, $2 \le j \le N$, for all U with $\{v_1, v_j\} \subseteq U \subseteq V$ and $|U| = k$.

Let W be such that $v_1 \in W \subseteq V$ and $|W| = k+1$. Suppose $v_i$, $i > 1$, is in W. Then

$$X[W, i] = \min\{X[W - \{v_i\}, j] + d(v_j, v_i) : v_j \in W - \{v_1, v_i\}\}$$

Computation takes $O(N)$ time for each state $[W, i]$. Since there are $\Theta(N 2^N)$ states overall, this yields an overall running time of $O(N^2 2^N)$.
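A compact sketch of this dynamic program. City 0 plays the role of $v_1$, and each subset U is stored as a bitmask over cities 1..N−1 with city 0 left implicit (a representation choice made here, not taken from the slides). It accepts asymmetric distances.

```python
from itertools import combinations

def held_karp(d):
    """Optimal tour length via the O(N^2 2^N) dynamic program.
    X[(U, j)]: shortest path from city 0 through exactly the cities in
    bitmask U (city 0 implicit), ending at j."""
    n = len(d)
    X = {}
    for j in range(1, n):
        X[(1 << j, j)] = d[0][j]  # X[{v1, vj}, j] = d(v1, vj)
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            U = 0
            for v in subset:
                U |= 1 << v
            for i in subset:
                prev = U & ~(1 << i)  # W - {v_i}
                # v_j ranges over W - {v_1, v_i}
                X[(U, i)] = min(X[(prev, j)] + d[j][i] for j in subset if j != i)
    full = (1 << n) - 2  # every city except 0
    return min(X[(full, j)] + d[j][0] for j in range(1, n))
```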

N = 85,900

Current World Record (2006)

Using a parallelized version of the Concorde code, Helsgaun’s sophisticated variant of Iterated Lin-Kernighan, and 2719.5 CPU-days.

Concorde

• “Branch-and-Cut” approach exploiting linear programming to determine lower bounds on optimal tour length.

• Based on 30+ years of theoretical developments in the “Mathematical Programming” community, plus some very good data structures and heuristics work from computer science.

• For surprisingly large instances, it finds an optimal tour and proves its optimality (unless it runs out of time/space).

• Executables and source code can be downloaded from http://www.tsp.gatech.edu/

Running times (in seconds) for 10,000 Concorde runs on random 1000-city planar Euclidean instances (2.66 GHz Intel Xeon processor in dual-processor PC, purchased late 2002).

Range: 7.1 seconds to 38.3 hours

Concorde Asymptotics [Hoos and Stützle, 2009 draft]

• Estimated median running time for random Euclidean instances.

• Based on
– 1000 samples each for N = 500, 600, …, 2000
– 100 samples each for N = 2500, 3000, 3500, 4000, 4500
– 2.4 GHz AMD Opteron 2216 processors with 1 MB L2 cache and 4 GB main memory, running Cluster Rocks Linux v4.2.1.

Fitted median running time (seconds): $0.21 \cdot 1.24194^{\sqrt{N}}$

Actual median for N = 2000: ~57 minutes; for N = 4,500: ~96 hours.
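The fitted curve can be evaluated directly. This sketch assumes the fit is expressed in seconds, which is consistent with the ~57-minute median quoted for N = 2000; like any empirical fit, it should not be trusted far outside the measured range of N.

```python
import math

def predicted_median_seconds(n: int) -> float:
    """Hoos-Stützle fitted median Concorde time for random Euclidean
    instances: 0.21 * 1.24194 ** sqrt(n) seconds (fit to N <= 4500)."""
    return 0.21 * 1.24194 ** math.sqrt(n)

if __name__ == "__main__":
    for n in (1000, 2000, 4500):
        print(n, round(predicted_median_seconds(n) / 60, 1), "minutes")
```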

For Larger Instances: Fast Heuristics

• Tour construction heuristics like Nearest Neighbor, Greedy, Christofides.

• Local search heuristics like 2-Opt, 3-Opt, Lin-Kernighan, Iterated Lin-Kernighan, or Helsgaun’s Algorithm.

• A range of heuristics may be useful, based on tradeoffs between tour quality and running time.
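A minimal sketch of two of the heuristics named above, Nearest Neighbor construction followed by 2-Opt local search, assuming a symmetric distance matrix. This is a plain first-improvement 2-Opt without the neighbor lists or other speed-up tricks that fast implementations rely on, so it is only suitable for small inputs.

```python
import math

def nearest_neighbor(d, start=0):
    """Tour construction: repeatedly go to the closest unvisited city."""
    tour, unvisited = [start], set(range(len(d))) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: d[last][c])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(d, tour):
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(d, tour):
    """Replace edges (a,b),(c,e) by (a,c),(b,e) by reversing the segment
    b..c whenever that shortens the tour; stop when no move improves."""
    tour, n = tour[:], len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                if a == e:
                    continue  # the two edges share a city
                if d[a][c] + d[b][e] - d[a][b] - d[c][e] < -1e-12:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour

if __name__ == "__main__":
    import random
    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(100)]
    d = [[math.dist(p, q) for q in pts] for p in pts]
    nn = nearest_neighbor(d)
    print(round(tour_length(d, nn), 3), round(tour_length(d, two_opt(d, nn)), 3))
```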


Necessary Digression: Metrics

• As the TSP is defined, the city-city distances (edge lengths) are only constrained to satisfy

1. d(c,c’) ≥ 0, for all pairs of cities c,c’ (non-negativity)

2. d(c,c’) = 0 if and only if c = c’

• To be a quasimetric, the distances also must satisfy the “triangle inequality”

3. d(c,c’) ≤ d(c,c’’) + d(c’’,c’) for all triples of cities

• To be a metric, the distances must also be symmetric:

4. d(c,c’) = d(c’,c), for all pairs of cities c,c’
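The numbered conditions above can be checked mechanically on a small distance matrix. A minimal sketch (the function name and the returned dictionary keys are illustrative); the triangle check is cubic in the number of cities.

```python
from itertools import product

def distance_properties(d):
    """Check conditions 1-4 on a square distance matrix d."""
    idx = range(len(d))
    nonneg = all(d[a][b] >= 0 for a, b in product(idx, repeat=2))
    zero_iff_same = all((d[a][b] == 0) == (a == b) for a, b in product(idx, repeat=2))
    triangle = all(d[a][b] <= d[a][c] + d[c][b] for a, b, c in product(idx, repeat=3))
    symmetric = all(d[a][b] == d[b][a] for a, b in product(idx, repeat=2))
    quasi = nonneg and zero_iff_same and triangle
    return {"quasimetric": quasi, "metric": quasi and symmetric}
```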

Shortest Path “Metric”

• Let d be a TSP distance function. For any pair c, c′ of cities, let $d_S(c, c')$ be the length of the shortest path from c to c′ under d.

• Note that $d_S$ will be a quasimetric (and a metric if d is symmetric).

• For most real-world applications, $d_S$ is actually the distance function of interest, and so the triangle inequality holds.

• As we shall see shortly, if we have the triangle inequality, we can obtain good performance guarantees for certain heuristics.
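One standard way to compute $d_S$ for all pairs at once is the Floyd-Warshall algorithm, sketched below assuming nonnegative lengths in a square matrix (asymmetric input and `math.inf` entries are fine). The result satisfies the triangle inequality even when d does not.

```python
def shortest_path_distances(d):
    """Floyd-Warshall: ds[a][b] = length of a shortest path from a to b under d.
    Runs in O(N^3) time and leaves the input matrix unchanged."""
    n = len(d)
    ds = [row[:] for row in d]
    for k in range(n):  # allow k as an intermediate city
        for a in range(n):
            for b in range(n):
                if ds[a][k] + ds[k][b] < ds[a][b]:
                    ds[a][b] = ds[a][k] + ds[k][b]
    return ds
```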

Additional Restriction in Practice

• Distances are integers.

– Simplifies codes.
– Yields a definitive optimal solution value.
– Not a real restriction if distances are rational.
– Allows us to cope with the problematic Euclidean metric.

Euclidean Difficulties

• The length of a TSP tour for points in the plane under the Euclidean metric is a sum of square roots: $\text{Length} = \sum_i \sqrt{x_i}$.

• Given such an expression and a constant B, the best algorithm currently known for determining whether the length is less than B takes exponential time.

• Hence, we do not even know whether the decision problem version of the Euclidean TSP is in NP.

• And if we round the distances to some fixed precision, then we may get different optimal tours for different precisions (up to an exponential number of bits).

Rounding Conventions

1. Round Nearest: $d_n(x) = \lfloor x + 0.5 \rfloor$

– Likely to yield tour lengths closest to the true Euclidean lengths.
– Although optimal tours may opportunistically favor the rounded-down edge lengths.
– And the triangle inequality may no longer be obeyed:

$d_n(x,z) = 3 > d_n(x,y) + d_n(y,z) = 1 + 1 = 2$

[Figure: points x, y, z with d(x,y) = d(y,z) = 1.3]

2. Round Down: $d_f(x) = \lfloor x \rfloor$

– Possibly most efficiently computable.
– But underestimates true tour length.
– Also fails to obey the triangle inequality: ⌊3.8⌋ > ⌊1.9⌋ + ⌊1.9⌋.

3. Round Up: $d_c(x) = \lceil x \rceil$

– Does obey the triangle inequality.
– But overestimates true tour length.
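The three conventions and the Round Nearest counterexample can be reproduced numerically. The points below are hypothetical, chosen so that |xy| = |yz| = 1.3 as in the figure and |xz| = 2.55 (nearly collinear), which makes Round Nearest violate the triangle inequality while Round Up respects it.

```python
import math

def d_nearest(length: float) -> int:
    """Round Nearest: floor(x + 0.5)."""
    return math.floor(length + 0.5)

def d_down(length: float) -> int:
    """Round Down: floor(x)."""
    return math.floor(length)

def d_up(length: float) -> int:
    """Round Up: ceiling(x)."""
    return math.ceil(length)

def triangle_ok(rounding, x, y, z):
    """Does rounding(d(x,z)) <= rounding(d(x,y)) + rounding(d(y,z)) hold?"""
    return rounding(math.dist(x, z)) <= rounding(math.dist(x, y)) + rounding(math.dist(y, z))

# Hypothetical points: |xy| = |yz| = 1.3 and |xz| = 2.55.
x, z = (0.0, 0.0), (2.55, 0.0)
y = (1.275, math.sqrt(1.3**2 - 1.275**2))
```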

Exploiting Triangle Inequality

• Observation 1: Any connected (multi)graph in which every vertex has even degree contains an “Euler tour”: a closed walk that traverses each edge exactly once, which can be found in linear time.

• Observation 2: If the Δ-inequality holds, then traversing an Euler tour but skipping past previously-visited vertices yields a Traveling Salesman tour of no greater length.
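Both observations can be sketched in a few lines, assuming the (multi)graph is given as a list of vertex pairs (a hypothetical input format; parallel edges are allowed, as needed for a doubled tree). The first function is Hierholzer's algorithm, a standard linear-time way to find an Euler tour; the second performs the shortcutting of Observation 2.

```python
from collections import defaultdict

def euler_tour(edges, start):
    """Hierholzer's algorithm: closed walk using every edge exactly once.
    Assumes the graph is connected and every degree is even."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    used = [False] * len(edges)
    stack, walk = [start], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()  # discard edges already traversed from the other end
        if adj[u]:
            v, idx = adj[u].pop()
            used[idx] = True
            stack.append(v)
        else:
            walk.append(stack.pop())
    return walk[::-1]

def shortcut(walk):
    """Keep only the first visit to each vertex, giving a TSP tour."""
    seen, tour = set(), []
    for v in walk:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour
```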

Obtaining the Initial Graph

• Double MST algorithm (DMST):
– Combine two copies of a Minimum Spanning Tree.
– Theorem [Folklore]: DMST(I) ≤ 2·Opt(I).

• Christofides algorithm (CH):
– Combine one copy of an MST with a minimum-length matching on its odd-degree vertices (there must be an even number of them, since the sum of the degrees in any graph is even).
– Theorem [Christofides, 1976]: CH(I) ≤ 1.5·Opt(I).
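A sketch of the Double MST heuristic for points in the plane, assuming Euclidean distances. It builds an MST with Prim's algorithm and then lists cities in depth-first preorder, which is what shortcutting the Euler tour that walks around the doubled tree produces. Christofides is not sketched here because it also needs a minimum-weight perfect matching. `optimal_length` is a brute-force checker for tiny inputs only.

```python
import math
from itertools import permutations

def tour_length(pts, tour):
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))

def double_mst_tour(pts):
    n = len(pts)
    in_tree, parent = [False] * n, [-1] * n
    best_edge = [math.inf] * n
    best_edge[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):  # Prim's algorithm, O(n^2)
        u = min((v for v in range(n) if not in_tree[v]), key=best_edge.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and math.dist(pts[u], pts[v]) < best_edge[v]:
                best_edge[v], parent[v] = math.dist(pts[u], pts[v]), u
    # DFS preorder = shortcut of the walk around the doubled tree.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

def optimal_length(pts):
    """Brute-force optimum, only for checking tiny instances."""
    return min(tour_length(pts, (0,) + p) for p in permutations(range(1, len(pts))))
```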

Optimal Tour on Odd-Degree Vertices (no longer than the overall optimal tour, by the triangle inequality)

[Figure: Matching M1 + Matching M2 = Optimal Tour on the odd-degree vertices]

Hence Optimal Matching ≤ min(M1, M2) ≤ OPT(I)/2.
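Combining this matching bound with the MST bound gives the Christofides guarantee. Here M(I) denotes the length of the optimal matching on the odd-degree vertices (notation introduced for this derivation). MST(I) ≤ OPT(I) holds because deleting any edge of an optimal tour leaves a spanning tree, given nonnegative lengths, and the first step uses Observation 2.

```latex
\begin{aligned}
\mathrm{CH}(I) &\le \mathrm{MST}(I) + M(I)
  && \text{shortcutting an Euler tour of } \mathrm{MST} \cup M \\
&\le \mathrm{OPT}(I) + \tfrac{1}{2}\,\mathrm{OPT}(I)
  && \mathrm{MST}(I) \le \mathrm{OPT}(I),\ M(I) \le \tfrac{1}{2}\,\mathrm{OPT}(I) \\
&= \tfrac{3}{2}\,\mathrm{OPT}(I)
\end{aligned}
```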

[Figure: 2-Opt, 3-Opt, Smart-Shortcut Christofides]

1 million cities on my 3.06 GHz iMac: Lin-Kernighan gets within 2% of optimal in 61 seconds.

The “strip” heuristic gets within 30% in 2 seconds.

Compared to 40% for the much slower “double MST” heuristic.


The Held-Karp Bound and the Optimal Solution Value

Integer Programming Formulation for Symmetric TSP

• Minimize $\sum_i d_i x_i$, where $d_i$ is the length of edge $e_i$

• Subject to

$x_i \in \{0,1\}$, for all edges $e_i \in C \times C$

$\sum_{i:\, c \in e_i} x_i = 2$, for all cities $c \in C$

$\sum_{i:\, |e_i \cap U| = 1} x_i \ge 2$, for all nonempty proper subsets $U \subset C$

Linear Programming Relaxation: “Held-Karp” or “Subtour” Bound

• Minimize $\sum_i d_i x_i$, where $d_i$ is the length of edge $e_i$

• Subject to

$x_i \in [0,1]$, for all edges $e_i \in C \times C$

$\sum_{i:\, c \in e_i} x_i = 2$, for all cities $c \in C$

$\sum_{i:\, |e_i \cap U| = 1} x_i \ge 2$, for all nonempty proper subsets $U \subset C$


Percent by which Optimal Tour exceeds Held-Karp Bound

For “Uniform Points” in the Unit Square (+), the gap appears to decline to a value of about 0.44% asymptotically.

Computing the HK Bound

• Major obstacle: exponential number of cut constraints:

$\sum_{i:\, |e_i \cap U| = 1} x_i \ge 2$, for all nonempty proper subsets $U \subset C$.

• However, one can find violated constraints in polynomial time by maximum flow techniques (and other heuristics).

• Concorde has options for computing the bound in roughly this way (5 hours on my iMac for a million cities).

• One can also construct an alternative LP formulation that is of polynomial size, so the HK bound can in principle be computed in polynomial time.
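To illustrate separation, the sketch below finds a violated cut constraint in the "support graph" whose edge weights are the current LP values $x_i$. As a simpler, self-contained stand-in for the max-flow methods mentioned above, it uses the Stoer-Wagner global minimum-cut algorithm; it is not Concorde's procedure. If the minimum cut is below 2, the returned side U gives a violated constraint. The function names and the matrix input format are assumptions.

```python
def min_cut(weights):
    """Stoer-Wagner global minimum cut of a symmetric weight matrix.
    Returns (cut value, one side U of a minimum cut)."""
    n = len(weights)
    w = [row[:] for row in weights]
    groups = [[v] for v in range(n)]  # original cities merged into each vertex
    active = list(range(n))
    best_value, best_side = float("inf"), []
    while len(active) > 1:
        # Maximum-adjacency ordering: repeatedly add the most tightly connected vertex.
        order = [active[0]]
        conn = {v: w[active[0]][v] for v in active[1:]}
        while conn:
            v = max(conn, key=conn.get)
            del conn[v]
            for u in conn:
                conn[u] += w[v][u]
            order.append(v)
        s, t = order[-2], order[-1]
        # Cut of the phase: t's group versus all other active vertices.
        phase_cut = sum(w[t][u] for u in active if u != t)
        if phase_cut < best_value:
            best_value, best_side = phase_cut, list(groups[t])
        groups[s].extend(groups[t])  # merge t into s
        for u in active:
            if u not in (s, t):
                w[s][u] += w[t][u]
                w[u][s] = w[s][u]
        active.remove(t)
    return best_value, best_side

def support_matrix(n, edges):
    """Symmetric matrix of LP values from (u, v, x_uv) triples."""
    x = [[0.0] * n for _ in range(n)]
    for u, v, val in edges:
        x[u][v] = x[v][u] = val
    return x

def violated_subtour(x, tol=1e-9):
    """Return a set U whose cut has total x-value below 2, or None."""
    value, side = min_cut(x)
    return sorted(side) if value < 2 - tol else None
```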

Topics to Be Covered

• NP-completeness proofs, hardness of approximation results.
• Polynomial-time (and $2^{o(n)}$-time) solvable special cases.
• Branch-and-cut optimization algorithms (Concorde, etc.): theory and engineering.
• Properties of optimal solutions.
• Polynomial-time approximation tour construction heuristics with good worst-case guarantees and/or average case performance.
• Data structures, exploiting geometry, and other speed-up tricks for heuristics.
• Local Optimization heuristics (2-Opt, 3-Opt, Lin-Kernighan).
• Metaheuristics (neural nets, simulated annealing, genetic algorithms, etc.).
• Variants (max TSP, min-latency TSP, prize-collecting TSP, Vehicle routing, …)