Approximation Algorithms for Stochastic Orienteering
Anupam Gupta, Ravishankar Krishnaswamy, Viswanath Nagarajan, R. Ravi
In the Stochastic Orienteering problem, we are given a metric, where each node also has a job located there with some deterministic reward and a random size. (Think of the jobs as being chores one needs to run, and the sizes as the amount of time it takes to do the chore.) The goal is to adaptively decide which nodes to visit to maximize total expected reward, subject to the constraint that the total distance traveled plus the total size of jobs processed is at most a given budget of B. (I.e., we get reward for all those chores we finish by the end of the day.) The (random) size of a job is not known until it is completely processed. Hence the problem combines aspects of both the stochastic knapsack problem with uncertain item sizes and the deterministic orienteering problem of using a limited travel time to maximize gathered rewards located at nodes.
In this paper, we present a constant-factor approximation algorithm for the best non-adaptive policy for the Stochastic Orienteering problem. We also show a small adaptivity gap, i.e., the existence of a non-adaptive policy whose reward is at least an Ω(1/ log log B) fraction of the optimal expected reward, and hence we also get an O(log log B)-approximation algorithm for the adaptive problem. Finally, we address the case when the node rewards are also random and could be correlated with the waiting time, and give a non-adaptive policy which is an O(log n log B)-approximation to the best adaptive policy on n-node metrics with budget B.
Department of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF awards CCF-0964474 and CCF-1016799.
Department of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF awards CCF-0964474 and CCF-1016799, and an IBM Graduate Fellowship.
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.
Tepper School of Business, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF award CCF-1143998.
Consider the following problem: you start your day at home with a set of chores to run at various locations (e.g., at the bank, the post office, the grocery store), but you only have limited time to run those chores in (say, you have from 9am until 5pm, when all these shops close). Each successfully completed chore/job j gives you some fixed reward r_j. You know the time it takes you to travel between the various job locations: these distances are deterministic and form a metric (V, d). However, you do not know the amount of time you will spend doing each job (e.g., standing in the queue, filling out forms). Instead, for each job j, you are only given the probability distribution π_j governing the random amount of time you need to spend performing j. That is, once you start performing the job j, the job finishes after S_j time units and you get the reward, where S_j is a random variable denoting the size, and distributed according to π_j.¹ (You may cancel processing the job j prematurely, but in that case you don't get any reward, and you are barred from trying j again.) The goal is now a natural one: given the metric (V, d), the starting point ρ, the time budget B, and the probability distributions for all the jobs, give a strategy for traveling around and doing the jobs that maximizes the expected reward accrued.
The case when all the sizes are deterministic (i.e., S_j = s_j with probability 1) is the orienteering problem, for which we now know a (2 + ε)-approximation algorithm [5, 7]. Another special case, where all the chores are located at the start node, but the sizes are random, is the stochastic knapsack problem, which also admits a (2 + ε)-approximation algorithm [12, 3]. However, the stochastic orienteering problem above, which combines aspects of both these problems, seems to have been hitherto unexplored in the approximation algorithms literature.
¹ To clarify: before you reach the job, all you know about its size is what can be gleaned from the distribution π_j of S_j; and even having worked on j for t units of time, all you know about the actual size of j is what you can infer from the conditional (S_j | S_j > t).
It is known that even for stochastic knapsack, an optimal adaptive strategy may be exponential-sized (and finding the best strategy is PSPACE-hard). So another set of interesting questions is to bound the adaptivity gap, and to find non-adaptive solutions whose expected reward is close to that of the best adaptive solution. A non-adaptive solution for stochastic orienteering is simply a tour P of points in the metric space starting at the root ρ: we visit the points in this fixed order, performing the jobs at the points we reach, until time runs out.
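To make the notion of a non-adaptive solution concrete, the following minimal sketch computes the exact expected reward of a fixed visiting order. It is an illustration only, not an algorithm from this paper: it assumes job sizes are independent with finite integer support and that jobs cannot be cancelled, and the function name `expected_reward`, the dictionary-based instance encoding, and the three-point instance are all hypothetical.

```python
def expected_reward(path, dist, jobs, root, budget):
    """Exact expected reward of visiting `path` in order, starting at `root`.

    dist[u][v] is the travel time between u and v;
    jobs[v] = (reward, {size: probability}).
    Assumes independent job sizes and no cancellation.
    """
    # finish_prob[t] = probability that every job so far has completed,
    # with total elapsed time exactly t (and t within the budget).
    finish_prob = {0: 1.0}
    here, total = root, 0.0
    for v in path:
        reward, size_dist = jobs[v]
        next_prob = {}
        for t, p in finish_prob.items():
            for size, q in size_dist.items():
                finish = t + dist[here][v] + size
                if finish <= budget:
                    # The job completes in time: collect its reward.
                    total += p * q * reward
                    next_prob[finish] = next_prob.get(finish, 0.0) + p * q
                # Otherwise time runs out and no further reward is possible.
        finish_prob, here = next_prob, v
    return total


# Hypothetical instance: root "r" and two jobs "a" and "b".
dist = {
    "r": {"a": 1, "b": 2},
    "a": {"r": 1, "b": 1},
    "b": {"r": 2, "a": 1},
}
jobs = {
    "a": (1, {1: 0.5, 3: 0.5}),  # reward 1, size 1 or 3 with equal probability
    "b": (2, {1: 1.0}),          # reward 2, deterministic size 1
}
print(expected_reward(["a", "b"], dist, jobs, root="r", budget=4))
print(expected_reward(["b", "a"], dist, jobs, root="r", budget=4))
```

The sketch tracks the distribution of elapsed time rather than sampling, so the result is exact; its running time grows with the number of distinct elapsed times, which is at most the budget when all quantities are integers.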
One natural algorithm for stochastic orienteering is to replace each random job j by a deterministic job of size E[S_j], and use an orienteering algorithm to find the tour of maximum reward in this deterministic instance.² Such an approach was taken in earlier work on stochastic knapsack, which showed a constant adaptivity gap, and a constant-factor approximation via precisely this idea. However, for stochastic orienteering this gives only an O(log B) approximation, and indeed there are examples where we get only an O(1/ log B) fraction of the optimal expected reward. (See Section 4 for a more detailed discussion.) In this paper we show we can do much better than this logarithmic approximation:
Theorem 1.1. There is an O(log log B)-approximation algorithm for the stochastic orienteering problem.
Indeed, our proof proceeds by first showing the following structure theorem which bounds the adaptivity gap:
Theorem 1.2. Given an instance of the stochastic orienteering problem,
• either there exists a single job which gives an Ω(1/ log log B) fraction of the optimal reward, or
• there exists a value W such that the optimal non-adaptive tour which spends at most W time waiting and B − W time traveling gets an Ω(1/ log log B) fraction of the optimal reward.
Note that naively we would expect only a logarithmic fraction of the reward, but the structure theorem shows we can do better. Indeed, this theorem is the technical heart of the paper, and is proved via a martingale argument of independent interest. Since the above theorem shows the existence of a non-adaptive solution close to the best adaptive solution, we can combine it with the following result to prove Theorem 1.1.
Theorem 1.3. There exists a constant-factor approximation algorithm to the optimal non-adaptive policy for stochastic orienteering.
² Actually, one should really replace the stochastic job with a deterministic one of size E[min(S_j, B − d(ρ, j))] and reward r_j · Pr[S_j + d(ρ, j) ≤ B]; it is very easy to fool the algorithm otherwise.
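The truncated surrogate described in the footnote can be sketched as follows for a job whose size has finite support. This is only an illustration under stated assumptions: the function name `deterministic_surrogate`, the dictionary encoding of the size distribution, and the numbers in the example are hypothetical, and the handling of a job farther than B from the root (treated as worthless) is a choice made here, not taken from the paper.

```python
def deterministic_surrogate(reward, size_dist, dist_from_root, budget):
    """Return (size, reward) of the deterministic stand-in for one job.

    size   = E[min(S, B - d(root, j))]
    reward = r * Pr[S + d(root, j) <= B]
    size_dist maps each possible size to its probability.
    """
    slack = budget - dist_from_root
    if slack < 0:
        # Assumption of this sketch: an unreachable job is worth nothing.
        return 0.0, 0.0
    size = sum(p * min(s, slack) for s, p in size_dist.items())
    success = sum(p for s, p in size_dist.items() if s + dist_from_root <= budget)
    return size, reward * success


# Hypothetical job: reward 10, size 1 or 100 with equal probability,
# located at distance 2 from the root, with budget B = 10.
size_dist = {1: 0.5, 100: 0.5}
surrogate_size, surrogate_reward = deterministic_surrogate(10, size_dist, 2, 10)
plain_mean = sum(p * s for s, p in size_dist.items())
print(surrogate_size, surrogate_reward, plain_mean)
```

In this example the untruncated mean E[S] exceeds the whole budget, so a deterministic instance built from E[S] alone would discard a job that in fact completes in time half of the time; the truncated size and discounted reward avoid that particular trap.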
Note that if we could show an existential proof of a constant adaptivity gap (which we conjecture to be true), the above approximation for non-adaptive problems that we show immediately implies an O(1)-approximation algorithm for the adaptive problem too.
Our second set of results is for a variant of the problem, one where both the rewards and the job sizes are random and correlated with each other. For this correlated problem, we show the following results:
Theorem 1.4. There is a polynomial-time algorithm that outputs a non-adaptive solution for correlated stochastic orienteering, which is an O(log n log B)-approximation to the best adaptive solution. Moreover, the correlated problem is at least as hard as the orienteering-with-deadlines problem.
Recall that we only know an O(log n) approximation for the orienteering-with-deadlines problem.
Our Techniques: Most of the previous adaptivity gaps have been shown by considering some linear programming relaxation that captures the optimal adaptive strategies, and then showing how to round a fractional LP solution to get a non-adaptive strategy. But since we do not know a good relaxation for even the deterministic orienteering problem, this approach seems difficult to take directly. So to show our adaptivity gap results, we are forced to argue directly about the optimal adaptive strategy: we use martingale arguments to show the existence of a path (a.k.a. non-adaptive strategy) within this tree which gives a large reward. We can then use algorithms for the non-adaptive settings, which are based on reductions to orienteering (with an additional knapsack constraint).
Roadmap: The rest of the paper follows the outline above. We begin with some definitions in Section 2, and then in Section 3 define and give an algorithm for the knapsack orienteering problem, which will be a crucial subroutine in all our algorithms. Next, in Section 4, we motivate our non-adaptive algorithm by discussing a few straw-man solutions and the traps they fall into. We then state and prove our constant-factor non-adaptive algorithm in Section 5, which naturally leads us to our main result in Section 6, the proof of the O(log log B)-adaptivity gap for StocOrient. Finally, we present in Section 7 the poly-logarithmic approximation for the problem where the rewards and sizes are correlated with each other. For all these results we assume that we are not allowed to cancel jobs once we begin working on them: in Section 8 we show how to discharge this assumption.
1.1 Related Work. The orienteering problem is known to be APX-hard, and the first constant-factor approximation was due to Blum et al. Their factor of 4 was subsequently improved, ultimately to (2 + ε) for every ε > 0. (Prior work also considers the problem with deadlines and time-windows; see also [8, 9].) There is a PTAS for low-dimensional Euclidean space. To the best of our knowledge, the stochastic version of the orienteering problem has not been studied before from the perspective of approximation algorithms. Heuristics and empirical guarantees for a similar problem were given by Campbell et al.
The stochastic knapsack problem is a special case of this problem, where all the tasks are located at the root ρ itself. Constant-factor approximation algorithms for the basic problem were given by [12, 4], and extensions where the rewards and sizes are correlated have also been studied. Most of these papers proceed via writing LP relaxations that capture the optimal adaptive policy; extending these to our problem faces the barrier that the orienteering problem is not known to have a good natural LP relaxation.
Another closely related body of work is on budgeted learning. Specifically, in the work of Guha and Munagala, there is a collection of Markov chains spread around in a metric, each state of each chain having an associated reward. When the player is at a Markov chain at j, she can advance the chain one step every unit of time. If she spends at most L time units traveling, and at most C time units advancing the Markov chains, how can she maximize some function (say the sum or the max) of rewards of the final states in expectation? They give an elegant constant-factor approximation to this problem (under some mild conditions on the rewards) via a reduction to classical orienteering using Lagrangean multipliers. Our algorithm/analysis for the knapsack orienteering problem (defined in Section 2) is inspired by theirs; the analysis of our algorithm though is simpler, due to the problem itself being deterministic. However, it is unclear how to directly use such two-budget approaches to get O(1)-factors for our one-budget problem without incurring an O(log B)-factor in the approximation ratio.
Finally, while most of the work on giving approximations to adaptive problems has proceeded by using LP relaxations to capture the optimal adaptive strategies, and then often rounding them to get non-adaptive strategies, thereby also proving adaptivity gaps [14, 12], there are some exceptions. In particular, papers on adaptive stochastic matchings, on stochastic knapsack [4, 3], and on building decision trees [18, 1, 17] have all had to reason about the optimal adaptive policies directly. We hope that our martingale-based analysis will add to the set of tools used for such arguments.
2 Definitions and Notation
An instance of stochastic orienteering is defined on an underlying metric space (V, d) with ground set V of size |V| = n and symmetric integer distances d : V × V → Z+ (satisfying the triangle inequality) that represent travel times. Each vertex v ∈ V is associated with a unique stochastic job, which we also call v. For the first part of the paper, each job v has a fixed reward r_v ∈ Z≥0, and a random processing time/size S_v, which is distributed according to a known but arbitrary probability distribution π_v : R+ → [0, 1]. (In the introduction, this was the amount we had to wait in queue before receiving the reward for job v.) We are also given a starting root vertex ρ, and a budget B on the total time available. Without loss of generality, we assume that all distances are integer values.
The only actions allowed to an algorithm are to travel to a vertex v and begin processing the job there: when the job finishes after its random length S_v of time, we get the reward r_v (so long as the total time elapsed, i.e., total travel time plus processing time, is at most B), and we can then move to the next job. There are two variants of the basic problem: in the basic variant, we are not allowed to cancel any job that we begin processing (i.e., we cannot leave the queue once we join it). In the version with cancellations, we can cancel any job at any time without receiving any reward, and we are not allowed to attempt this job again in the future. Our results for both versions will have similar approximation guarantees; for most of the paper, we focus on the basic version, and in Section 8 we show how to reduce the more general version with cancellations to the basic one.
Note that any strategy consists of a decision tree where each state depends on which previous jobs were processed, and what information we got about their sizes. Now the goal is to devise a strategy which, starting at the root ρ, must decide (possibly adaptively) which jobs to travel to and process, so as to maximize the expected sum of rewards of jobs successfully completed before the total time (travel and processing) reaches the threshold of B.
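On very small instances, the value of the best adaptive strategy can be computed by exhaustively exploring this decision tree. The sketch below does so for the basic variant without cancellations; it is exponential in the number of jobs and is meant only to illustrate the state space, not to reflect any algorithm in this paper. It assumes independent job sizes with finite integer support, and the name `optimal_adaptive_value` and the example instance are hypothetical.

```python
from functools import lru_cache


def optimal_adaptive_value(dist, jobs, root, budget):
    """Expected reward of the best adaptive strategy (brute force).

    dist[u][v] is the travel time; jobs[v] = (reward, {size: probability}).
    Assumes independent sizes and no cancellation, so a state is fully
    described by (current vertex, elapsed time, set of jobs already tried).
    """
    vertices = tuple(sorted(jobs))

    @lru_cache(maxsize=None)
    def best(here, elapsed, done):
        value = 0.0  # stopping immediately is always an option
        for v in vertices:
            if v in done:
                continue
            arrive = elapsed + dist[here][v]
            if arrive > budget:
                continue
            reward, size_dist = jobs[v]
            expected = 0.0
            for size, p in size_dist.items():
                finish = arrive + size
                if finish <= budget:
                    # Job completes: collect reward, then act optimally.
                    expected += p * (reward + best(v, finish, done | {v}))
                # Otherwise the budget is exhausted while processing v.
            value = max(value, expected)
        return value

    return best(root, 0, frozenset())


# Hypothetical instance: root "r" and two jobs "a" and "b".
dist = {
    "r": {"a": 1, "b": 2},
    "a": {"r": 1, "b": 1},
    "b": {"r": 2, "a": 1},
}
jobs = {
    "a": (1, {1: 0.5, 3: 0.5}),  # reward 1, size 1 or 3 with equal probability
    "b": (2, {1: 1.0}),          # reward 2, deterministic size 1
}
print(optimal_adaptive_value(dist, jobs, root="r", budget=4))
```

The choice made inside `best` may depend on the elapsed time, which is exactly the information a fixed visiting order cannot use.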
In the second part of the paper, we consider the setting of correlated rewards and sizes: in this model, the job sizes and rewards are both random, and are correlated with each other. (Recall that the stochastic knapsack version of this problem also admits a constant-factor approximation.)
We are interested in both adaptive and non-adaptive strategies, and in particular, want to bound the ratio between the performance of the best adaptive and best non-adaptive strategies. An adaptive strategy is a decision tree where each node is labeled by a job/vertex of V, with the outgoing arcs from a node labeled by j corresponding to the possible sizes in the support of π_j. A non-adaptive strategy, on the other hand, is a path P starting at ρ; we...