
Traveling Salesman Problems Motivated by

Robot Navigation

Maria Minkoff

MIT

With Avrim Blum, Shuchi Chawla, David Karger, Terran Lane,

Adam Meyerson

A Robot Navigation Problem

• Robot delivering packages in a building
• Goal: to deliver as quickly as possible
• Classic model: Traveling Salesman Problem

• Find a tour of minimum length

• Additional constraints:
• some packages have higher priority
• uncertainty in robot’s behavior

• battery failure
• sensor error, motor control error

Markov Decision Process Model

• State space S

• Choice of actions a ∈ A at each state s

• Transition function T(s’|s,a)

• action determines probability distribution on next state

• sequence of actions produces a random path through graph

• Rewards R(s) on states

• If arrive in state s at time t, receive discounted reward γ^t R(s), for discount factor γ ∈ (0,1)

• MDP Goal: policy for picking an action from any state that maximizes total discounted reward

Exponential Discounting

• Motivates the robot to get to desired states quickly

• Inflation: reward collected in distant future decreases in value due to uncertainty
• at time t robot loses power with fixed probability

• probability of still being alive at time t decays exponentially

• discounting reflects value of reward in expectation

Solving MDP

• Fixing action at each state produces a Markov Chain with transition probabilities pvw

• Can compute expected discounted reward ρv if start at state v:

ρv = rv + Σw pvw γ^t(v,w) ρw

• Choosing actions to optimize this recurrence is polynomial time solvable
• Linear programming

• Dynamic programming (like shortest paths)
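The recurrence above can be solved by plain value iteration. The sketch below assumes unit travel times (so γ^t(v,w) = γ) and a toy two-state MDP of my own invention, not the talk's example:

```python
# Value iteration for a discounted MDP: a minimal sketch.
# The states, actions, transitions, rewards, and gamma below are toy assumptions.

GAMMA = 0.9

# transitions[state][action] = list of (next_state, probability)
transitions = {
    "A": {"go": [("B", 1.0)], "stay": [("A", 1.0)]},
    "B": {"go": [("A", 0.5), ("B", 0.5)], "stay": [("B", 1.0)]},
}
rewards = {"A": 0.0, "B": 1.0}

def value_iteration(transitions, rewards, gamma, iters=1000):
    # V[s] converges to: rewards[s] + gamma * max_a sum_w p(w|s,a) * V[w]
    V = {s: 0.0 for s in transitions}
    for _ in range(iters):
        V = {
            s: rewards[s] + gamma * max(
                sum(p * V[s2] for s2, p in outcomes)
                for outcomes in acts.values()
            )
            for s, acts in transitions.items()
        }
    return V

V = value_iteration(transitions, rewards, GAMMA)
# Staying at B forever yields 1/(1 - 0.9) = 10; A reaches B one step later.
```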

Solving the wrong problem

• Package can only be delivered once
• So should not get reward each time reach target

• One solution: expand state space
• New state = current location × set of past locations (packages already delivered)

• Reward nonzero only on states where current location is not in the list of previously visited locations

• Now apply MDP algorithm

• Problem: new state space has exponential size

Tackle an easier problem

• Problem has two novel elements for “theory”
• Discounting of reward based on arrival time

• Probability distribution on outcome of actions

• We will set aside second issue for now
• In practice, robot can control errors

• Even first issue by itself is hard and interesting

• First step towards solving whole problem

Discounted-Reward TSP

Given
• undirected graph G = (V,E)
• edge weights (travel times) de ≥ 0

• weights on nodes (rewards) rv ≥ 0

• discount factor γ ∈ (0,1)
• root node s

Goal: find a path P starting at s that maximizes total discounted reward ρ(P) = Σv∈P rv γ^dP(v)
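As a concrete reading of this objective, the sketch below computes ρ(P) for a toy path; the graph, rewards, and γ = ½ are illustrative assumptions:

```python
# Total discounted reward of a path: rho(P) = sum over v in P of r_v * gamma**d_P(v),
# where d_P(v) is the travel time along P from the root to v.
# The path, rewards, and gamma here are toy assumptions.

GAMMA = 0.5

def discounted_reward(path_edges, rewards, gamma, root_reward=0.0):
    """path_edges: list of (next_vertex, edge_length) traversed from the root."""
    total = root_reward          # reward at the root, collected at time 0
    t = 0.0
    for v, length in path_edges:
        t += length              # arrival time at v along this path
        total += rewards[v] * gamma ** t
    return total

# Path s -> a -> b with unit-length edges; rewards 4 at a, 8 at b.
rho = discounted_reward([("a", 1.0), ("b", 1.0)], {"a": 4.0, "b": 8.0}, GAMMA)
# 4 * 0.5 + 8 * 0.25 = 4.0
```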

Approximation Algorithms

• Discounted-Reward TSP is NP-complete (and so is more general MDP-type problem)

• reduction from minimum latency TSP

• So intractable to solve exactly

• Goal: approximation algorithm that is guaranteed to collect at least some constant

fraction of the best possible discounted reward

Related Problems

Goal of Discounted-Reward TSP seems to be to find a “short” path that collects “lots” of reward

• Prize-Collecting TSP
• Given a root vertex v, find a tour containing v that minimizes total length + foregone reward (undiscounted)

• Primal-dual 2-approximation algorithm [GW 95]

k-TSP

• Find a tour of minimum length that visits at least k vertices

• 2-approximation algorithm known for undirected graphs based on algorithm for PC-TSP [Garg 99]

• Can be extended to handle node-weighted version

Mismatch

Constant factor approximation on length doesn’t exponentiate well

• Suppose optimum solution reaches some vertex v at time t for reward γ^t r

• Constant factor approximation would reach within time 2t for reward γ^2t r

• Result: get only γ^t fraction of optimum discounted reward, not a constant fraction.

Orienteering Problem

Find a path of length at most D that maximizes net reward collected

• Complement of k-TSP
• approximates reward collected instead of length
• avoids changing length, so exponentiation doesn’t hurt
• unrooted case can be solved via k-TSP

• Drawback: no constant factor approximation for rooted non-geometric version previously known

• Our techniques also give a constant factor approximation for Orienteering problem

Our Results

Using α-approximation for k-TSP as subroutine

• (3α/2 + 2)-approximation for Orienteering

• e(3α/2 + 2)-approximation for Discounted-Reward Collection

• constant-factor approximations for tree- and multiple-path versions of the problems

Our Results

Using α-approximation for k-TSP as subroutine; substitute α = 2 announced by Garg in 1999

• (3α/2 + 2) = 5-approximation for Orienteering

• e(3α/2 + 2) = 5e ≈ 13.6-approximation for Discounted-Reward Collection

• constant-factor approximations for tree- and multiple-path versions of the problems

Eliminating Exponentiation

• Let dv = shortest path distance (time) to v

• Define the prize at v as πv = γ^dv rv

• max discounted reward possibly collectable at v

• If given path reaches v at time tv,

define excess ev = tv – dv

• difference between shortest path and chosen one

• Then discounted reward at v is γ^ev πv

• Idea: if excess small, prize ~ discounted reward

• Fact: excess only increases as traverse path

• excess reflects lost time; can’t make it up
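The definitions above can be traced on a small example; the distances, rewards, and detouring path below are toy assumptions:

```python
# Prize and excess along a path: pi_v = gamma**d_v * r_v, e_v = t_v - d_v,
# so the discounted reward collected at v equals gamma**e_v * pi_v.
# Shortest-path distances d, rewards r, and the path are toy assumptions.

GAMMA = 0.5
d = {"s": 0.0, "a": 1.0, "b": 2.0}   # shortest-path distance from the root s
r = {"s": 0.0, "a": 4.0, "b": 8.0}

def prizes_and_excesses(path_edges, d, r, gamma):
    """path_edges: list of (vertex, edge_length) from s; returns {v: (pi, e)}."""
    t, out = 0.0, {}
    for v, length in path_edges:
        t += length
        pi = gamma ** d[v] * r[v]     # max discounted reward collectable at v
        e = t - d[v]                  # time lost relative to the shortest path
        out[v] = (pi, e)
    return out

# A path that detours: s -> a (length 1), then a -> b (length 3, vs. shortest 2).
info = prizes_and_excesses([("a", 1.0), ("b", 3.0)], d, r, GAMMA)
# At b: pi = 0.25 * 8 = 2, excess e = 4 - 2 = 2, discounted reward = 2 * 0.25.
```

Note how the excess only grows along the path (0 at a, then 2 at b), matching the fact on the slide.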

Optimum path
• assume γ = ½ (can scale edge lengths)

Claim: at least ½ of optimum path’s discounted reward R is collected

before path’s excess reaches 1

[Figure: optimum path from s; u is the first vertex with excess ≥ 1]

Proof by contradiction:
• Let u be first vertex with eu ≥ 1
• Suppose more than R/2 reward follows u
• Can shortcut directly to u then traverse the rest of optimum

• reduces all excesses after u by at least 1
• so “undiscounts” rewards by factor γ^−1 = 2
• so doubles discounted reward collected
• but this was more than R/2: contradiction
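The doubling step can be sanity-checked numerically; the rewards and arrival times below are toy assumptions, chosen so that shortcutting improves each later arrival by exactly 1:

```python
# Shortcutting sanity check (gamma = 1/2): if every vertex after u has excess >= 1,
# jumping straight to u via its shortest path cuts each later arrival time by
# at least 1, multiplying each later discounted reward by at least 1/gamma = 2.
# All numbers below are toy assumptions.

GAMMA = 0.5

# (reward, arrival time on the original path, arrival time after shortcutting)
suffix = [(4.0, 3.0, 2.0), (8.0, 5.0, 4.0)]

before = sum(rwd * GAMMA ** t for rwd, t, _ in suffix)
after = sum(rwd * GAMMA ** t2 for rwd, _, t2 in suffix)
# after == 2 * before: so if the suffix held more than half the optimum's
# discounted reward, the shortcut path would beat the optimum -- contradiction.
```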


New problem: Approximate Min-Excess Path

• Suppose there exists an s-t path P* with prize value Π and length ℓ(P*) = dt + ε

• Optimization: find s-t path P with prize value ≥ Π that minimizes excess ℓ(P) − dt over shortest path to t

• equivalent to minimizing total length, e.g. k-TSP

• Approximation: find s-t path P with prize value ≥ Π that approximates optimum excess over shortest path to t, i.e. has length ℓ(P) = dt + cε

• better than approximating entire path length

Using Min-Excess Path

• Recall discounted reward at v is γ^ev πv

• Prefix of optimum discounted reward path:

• collects discounted reward Σ γ^ev πv ≥ R/2

• spans prize Σ πv ≥ R/2

• and has no vertex with excess over 1

• Guess t = last node on opt path with excess et ≤ 1

• Find a path to t of approximately (4 times) minimum excess that spans R/2 prize (we can guess R/2)

• Excesses at most 4, so γ^ev πv ≥ πv/16

• discounted reward on found path ≥ R/32

Solving Min-Excess Path problem

Exactly solvable case: monotonic paths

• Suppose optimum path goes through vertices in strictly increasing distance from root

• Then can find optimum by dynamic program
• Just as can solve longest path in an acyclic graph

• Build table
• For each vertex v: is there a monotonic path from v with length ℓ and prize π?
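A minimal sketch of such a table, assuming integer prizes and a toy graph (not the talk's exact formulation): best[v][p] holds the length of the shortest monotone path from the root ending at v with total prize p.

```python
# DP for shortest monotone path collecting a target prize: a sketch.
# Assumes integer prizes; distances d, prizes, and edges are toy data.
# A path is monotone if it visits vertices in strictly increasing d_v.
import math

d = {"s": 0, "a": 1, "b": 2, "c": 3}          # shortest distance from root s
prize = {"s": 0, "a": 2, "b": 1, "c": 3}
edges = {("s", "a"): 1, ("s", "b"): 2, ("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 1}

def monotone_dp(d, prize, edges, root="s"):
    order = sorted(d, key=d.get)               # process vertices outward
    max_prize = sum(prize.values())
    best = {v: [math.inf] * (max_prize + 1) for v in d}
    best[root][prize[root]] = 0.0
    for v in order:
        for (u, w), length in edges.items():
            if u == v and d[w] > d[v]:         # monotone: move strictly outward
                for p in range(max_prize + 1 - prize[w]):
                    if best[v][p] + length < best[w][p + prize[w]]:
                        best[w][p + prize[w]] = best[v][p] + length
    return best

best = monotone_dp(d, prize, edges)
# Shortest monotone path to c with prize 6 is s-a-b-c (length 3).
```

Processing vertices in order of d means each best[v] row is final before v's outgoing edges are relaxed, exactly as in longest/shortest path on a DAG.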

Solving Min-Excess Path problem

Approximable case: wiggly paths

• Length of path to v is lv = dv + ev

• If ev > dv then lv > ev > lv/2

• i.e., take twice as long as necessary to reach v

• So if approximate lv to constant factor, also approximate ev to twice that constant factor

Approximating path length

• Can use k-TSP algorithm to find approximately shortest s-t path with specified prize

• merge s and t into vertex r
• opt path becomes a tour
• solve k-TSP with root r

• “unmerge”: can get one or more cycles


• connect s and t by shortest path

Decompose optimum path

[Figure: path decomposed into alternating monotone and wiggly segments]

> 2/3 of each wiggly path is excess

Divides into independent problems

Decomposition Analysis

• 2/3 of each wiggly segment is excess

• That excess accumulates into whole path

• total excess of wiggly segments ≤ excess of whole path

• total length of wiggly segments ≤ 3/2 × excess of whole path

• Use dynamic program to find shortest (min-excess) monotonic segments collecting target prize

• Use k-TSP to find approximately shortest wiggles collecting target prize

• Approximates length, so approximates excess
• Over all monotonic and wiggly segments, approximates total excess

Dynamic program for Min-Excess Path

• For each pair of vertices and each (discretized) prize value, find
• Shortest monotonic path collecting desired prize

• Approximately shortest wiggly path collecting desired prize

• Note: polynomially many subproblems

• Use dynamic programming to find optimum pasting together of segments

Solving Orienteering Problem: special case

• Given a path from s that
• collects prize Π
• has length ≤ D
• ends at t, the farthest point from s


• For any constant integer r ≥ 1, there exists a path from s to some v with
• prize ≥ Π/r
• excess ≤ (D − dv)/r


Solving Orienteering Problem

General case: path ends at arbitrary t
• Let u be the farthest point from s
• Connect t to s via shortest path
• One of path segments ending at u
• has prize ≥ Π/2
• has length ≤ D

Reduced to special case
• Using 4-approximation for Min-Excess Path, get 8-approximation for Orienteering


Budget Prize-Collecting Steiner Tree problem

Find a rooted tree of edge cost at most D that spans maximum amount of prize

• Complement of k-MST

• Create Euler tour of opt tree T* of cost 2D

• Divide this tour into two paths starting at root each of length D

• One of them contains at least ½ of total prize

• Path is a type of tree

• Use c-approximation algorithm for Orienteering to obtain 2c-approximation for Budget PCST
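The Euler-tour halving argument above can be sketched directly; the tree and prizes below are toy assumptions, and the split point is half the tour's total cost:

```python
# Budget PCST halving sketch: double the edges of a tree of cost D_tree, walk its
# Euler tour (cost 2 * D_tree), split the tour into two walks from the root of
# length <= D_tree each; the richer half spans >= half the tree's total prize.
# The tree (children lists with edge lengths) and prizes are toy assumptions.

tree = {"s": [("a", 1.0), ("b", 2.0)], "a": [("c", 1.0)], "b": [], "c": []}
prize = {"s": 0.0, "a": 1.0, "b": 3.0, "c": 2.0}

def euler_tour(tree, root="s"):
    """DFS walk over doubled edges; records (vertex, cumulative length) steps."""
    steps, t = [(root, 0.0)], 0.0
    def dfs(v):
        nonlocal t
        for w, length in tree[v]:
            t += length; steps.append((w, t))     # walk down the edge
            dfs(w)
            t += length; steps.append((v, t))     # walk back up
    dfs(root)
    return steps

steps = euler_tour(tree)
half = steps[-1][1] / 2.0                          # total tour cost = 2 * D_tree
first = {v for v, t in steps if t <= half}         # vertices seen in first half
second = {v for v, t in steps if t >= half} | {"s"}
best_half = max(sum(prize[v] for v in first), sum(prize[v] for v in second))
# Every vertex lies in at least one half, so best_half >= total prize / 2.
```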

Summary

• Showed maximum discounted reward can be approximated using min-excess path

• Showed how to approximate min-excess path using k-TSP

• Min-excess path can also be used to solve rooted Orienteering problem (previously open question)
• Also solves “tree” and “cycle” versions of Orienteering

Open Questions

• Non-uniform discount factors
• each vertex v has its own γv

• Non-uniform deadlines
• each vertex specifies its own deadline by which it has to be visited in order to collect reward

• Directed graphs
• We used k-TSP, only solved for undirected

• For directed, even standard TSP has no known constant factor approximation

• We only use k-TSP/undirectedness in wiggly parts

Future directions

• Stochastic actions
• Stochastic seems to imply directed

• Special case: forget rewards
• Given choice of actions, choose to minimize cover time of graph

• Applying discounting framework to other problems:
• Scheduling

• Exponential penalty in place of hard deadlines
