
Dynamic Programming Approximations for a Stochastic Inventory Routing Problem

Anton J. Kleywegt ∗

Vijay S. Nori        Martin W. P. Savelsbergh

School of Industrial and Systems Engineering

Georgia Institute of Technology

Atlanta, GA 30332-0205

August 28, 2002

Abstract

This work is motivated by the need to solve the inventory routing problem when implementing a

business practice called vendor managed inventory replenishment (VMI). With VMI, vendors monitor

their customers’ inventories, and decide when and how much inventory should be replenished at each

customer. The inventory routing problem attempts to coordinate inventory replenishment and

transportation in such a way that the cost is minimized over the long run. We formulate a Markov decision

process model of the stochastic inventory routing problem, and propose approximation methods to find

good solutions with reasonable computational effort. We indicate how the proposed approach can be

used for other Markov decision processes involving the control of multiple resources.

∗Supported by the National Science Foundation under grant DMI-9875400.


Introduction

Recently the business practice called vendor managed inventory replenishment (VMI) has been adopted by

many companies. VMI refers to the situation in which a vendor monitors the inventory levels at its customers

and decides when and how much inventory to replenish at each customer. This contrasts with conventional

inventory management, in which customers monitor their own inventory levels and place orders when they

think that it is the appropriate time to reorder. VMI has several advantages over conventional inventory

management. Vendors can usually obtain a more uniform utilization of production resources, which leads

to reduced production and inventory holding costs. Similarly, vendors can often obtain a more uniform

utilization of transportation resources, which in turn leads to reduced transportation costs. Furthermore,

additional savings in transportation costs may be obtained by increasing the use of low-cost full-truckload

shipments and decreasing the use of high-cost less-than-truckload shipments, and by using more efficient

routes by coordinating the replenishment at customers close to each other.

VMI also has advantages for customers. Service levels may increase, measured in terms of reliability of

product availability, due to the fact that vendors can use the information that they collect on the inventory

levels at the customers to better anticipate future demand, and to proactively smooth peaks in the demand.

Also, customers do not have to devote as many resources to monitoring their inventory levels and placing

orders, as long as the vendor is successful in earning and maintaining the trust of the customers.

A first requirement for a successful implementation of VMI is that a vendor is able to obtain relevant

and accurate information in a timely and efficient way. One of the reasons for the increased popularity

of VMI is the increase in the availability of affordable and reliable equipment to collect and transmit the

necessary data between the customers and the vendor. However, access to the relevant information is only

one requirement. A vendor should also be able to use the increased amount of information to make good

decisions. This is not an easy task, as the decision problems involved

are very hard. The objective of this work is to develop efficient methods to help the vendor to make good

decisions when implementing VMI.

In many applications of VMI, the vendor manages a fleet of vehicles to transport the product to the

customers. The objective of the vendor is to coordinate the inventory replenishment and transportation

in such a way that the total cost is minimized over the long run. The problem of optimal coordination of

inventory replenishment and transportation is called the inventory routing problem (IRP).

In this paper, we study the problem of determining optimal policies for the variant of the IRP in which a

single product is distributed from a single vendor to multiple customers. The demands at the customers are

assumed to have probability distributions that are known to the vendor. The objective is to maximize the

expected discounted value, incorporating sales revenues, production costs, transportation costs, inventory

holding costs, and shortage penalties, over an infinite horizon.


Our work on this problem was motivated by our collaboration with a producer and distributor of air

products. The company operates plants worldwide and produces a variety of air products, such as liquid

nitrogen, oxygen and argon. The company’s bulk customers have their own storage tanks at their sites, which

are replenished by tanker trucks under the supplier’s control. Approximately 80% of the bulk customers

participate in the company’s VMI program. For the most part each customer and each vehicle is allocated

to a specific plant, so that the overall problem decomposes according to individual plants. Also, to improve

safety and reduce contamination, each vehicle and each storage tank at a customer is dedicated to a particular

type of product. Hence the problem also decomposes according to type of product. (This assumption does

not hold if the number of drivers is a tight constraint, and drivers can be allocated to deliver one of several

different products.) Therefore, in this paper we consider an inventory routing problem with a single vendor,

multiple customers, multiple vehicles, and a single type of product.

The main contributions of the research reported in this paper are as follows:

1. In an earlier paper (Kleywegt et al., 2002), we formulated the inventory routing problem with direct

deliveries, i.e., one delivery per trip, as a Markov decision process and proposed an approximate

dynamic programming approach for its solution. In this paper, we extend both the formulation and

the approach to handle multiple deliveries per trip.

2. We present a solution approach that uses decomposition and optimization to approximate the value

function. Specifically, the overall problem is decomposed into smaller subproblems, each designed to

have two properties: (1) it provides an accurate representation of a portion of the overall problem, and

(2) it is relatively easy to solve. In addition, an optimization problem is defined to combine the solutions

of the subproblems, in such a way that the value of a given state of the process is approximated by the

optimal value of the optimization problem.

3. Computational experiments demonstrate that our approach allows the construction of near optimal

policies for small instances and policies that are better than policies that have been proposed in the

literature for realistically sized instances (with approximately 20 customers). The sizes of the state

spaces for these instances are orders of magnitude larger than those that can be handled with more

traditional methods, such as the modified policy iteration algorithm.

In Section 1 we define the stochastic inventory routing problem, point out the obstacles encountered

when attempting to solve the problem, present an overview of the proposed solution method, and review

related literature. In Section 2 we propose a method for approximating the dynamic programming value

function. In Section 3 the day-to-day control of the IRP process using the dynamic programming value

function approximation is discussed. In Section 4 we investigate a special case of the IRP. Computational


results are presented in Section 5, and Section 6 concludes with some remarks regarding the application of

the approach to other stochastic control problems.

1 Problem Definition

A general description of the IRP is given in Section 1.1, after which a Markov decision process formulation is

given in Section 1.2. Section 1.3 discusses the issues to be addressed when solving the IRP, and Section 1.4

presents an overview of the proposed solution method. Section 1.5 reviews some related literature.

1.1 Problem Description

A product is distributed from a vendor’s facility to N customers, using a fleet of M homogeneous vehicles,

each with known capacity C. The process is modeled in discrete time t = 0, 1, . . . , and the discrete time

periods are called days. Let random variable Uit denote the demand of customer i at time t, and let

Ut ≡ (U1t, . . . , UNt) denote the vector of customer demands at time t. Customers’ demands on different

days are independent random vectors with a joint probability distribution F that does not change with

time; that is, U0, U1, . . . is an independent and identically distributed sequence, and F is the probability

distribution of each Ut. The probability distribution F is known to the decision maker. (In many applications

customers’ demands on different days may not be independent; in such cases customers’ demands on previous

days may provide valuable data for the forecasting of customers’ future demands. A refined model with a

suitably expanded state space can be formulated to exploit such additional information. Such refinement is

not addressed in this paper.)

There is an upper bound Ci on the amount of product that can be in inventory at each customer i. This

upper bound Ci can be due to limited storage capacity at customer i, as in the application that motivated

this research. In other applications of VMI, there is often a contractual upper bound Ci, agreed upon by

customer i and the vendor, on the amount of inventory that may be at customer i at any point in time.

One motivation for this contractual bound is to prevent the vendor from dumping too much product at the

customer. The vendor can measure the inventory level Xit of each customer i at any time t.

At each time t, the vendor makes a decision that controls the routing of vehicles and the replenishment of

customers’ inventories. Such decisions may have many aspects, some of which are important for the method

developed in this paper, and others which are not. Aspects of daily decisions that are important for the

method developed in this paper are the following:

1. which customers’ inventories to replenish,

2. how much to deliver at each customer, and


3. how to combine customers into vehicle routes.

On the other hand, the ideas developed in the paper are independent of the routing constraints that are

imposed, and thus routing constraints are not explicitly spelled out in the formulation. Unless otherwise

stated, we assume that each vehicle can perform at most one route per day. We also assume that the duration

of the task assigned to each driver and vehicle is less than the length of a day, so that all M drivers and

vehicles are available at the beginning of each day, when the tasks for that day are assigned.

The expected value (revenues and costs) accumulated during a day depends on the inventory levels and

decision of that day, and is known to the vendor. As in the case of the routing constraints, the ideas developed

in the paper are independent of the exact composition of the costs of the daily decisions. Next we describe

some typical types of costs for illustrative purposes. (These costs were also used in numerical work.) The

cost of a daily decision may include the travel costs cij on the arcs (i, j) of the distribution network that

are traversed according to the decision. Travel costs may also depend on the amount of product transported

along each arc. The cost of a daily decision may include the costs incurred at customers’ sites, for example

due to product losses during delivery. The cost of a daily decision may include revenue: if quantity di is

delivered at customer i, the vendor earns a reward of ri(di). The cost of a daily decision may include shortage

penalties: because demand is uncertain, there is often a positive probability that a customer runs out of

stock, and thus shortages cannot always be prevented. Shortages are discouraged with a penalty pi(si) if the

unsatisfied demand on day t at customer i is si. Unsatisfied demand is treated as lost demand, and is not

backlogged. The cost of a daily decision may include inventory holding cost: if the inventory at customer i

is xi at the beginning of the day, and quantity di is delivered at customer i, then an inventory holding cost

of hi(xi + di) is incurred. The inventory holding cost can also be modeled as a function of some “average”

amount of inventory at each customer during the time period. The role played by inventory holding cost

depends on the application. In some cases, the vendor and customers belong to different organizations, and

the customers own the inventory. In these cases, the vendor typically does not incur any inventory holding

costs based on the inventory at the customers. This was the case in the application that motivated this work.

In other cases, such as when the vendor and customers belong to the same organization, or when the vendor

owns the inventory at the customers, the vendor does incur inventory holding costs based on the inventory

at the customers.

The objective is to choose a distribution policy that maximizes the expected discounted value (rewards

minus costs) over an infinite time horizon.

1.2 Problem Formulation

In this section we formulate the IRP as a discrete time Markov decision process (MDP) with the following

components:


1. The state x = (x1, x2, . . . , xN ) represents the current amount of inventory at each customer. Thus the

state space is X = [0, C1] × [0, C2] × · · · × [0, CN ] if the quantity of product can vary continuously, or

X = {0, 1, . . . , C1} × {0, 1, . . . , C2} × · · · × {0, 1, . . . , CN} if the quantity of product varies in discrete

units. Let Xit ∈ [0, Ci] (or Xit ∈ {0, 1, . . . , Ci}) denote the random inventory level at customer i at

time t. Let Xt = (X1t, . . . ,XNt) ∈ X denote the state at time t.

2. For any state x, let A(x) denote the set of all feasible decisions when the process is in state x. A

decision a ∈ A(x) made at time t when the process is in state x, contains information about (1) which

customers’ inventories to replenish, (2) how much to deliver at each customer, and (3) how to combine

customers into vehicle routes. A decision may contain more information such as travel times and

arrival and departure times at customers (relative to time windows); the three attributes of a decision

mentioned above are the important attributes for our purposes. For any decision a, let di(a) denote

the quantity of product that is delivered to customer i while executing decision a. The set A(x) is

determined by various constraints, such as work load constraints, routing constraints, vehicles’ capacity

constraints, and customers’ inventory constraints. As discussed in Section 1.1, constraints such as

work load constraints and routing constraints do not affect the method described in this paper. The

constraints explicitly addressed in this paper are the limited number M of vehicles that can be used

each day, the limited quantity C (vehicle capacity) that can be delivered by each vehicle on a day,

and the maximum inventory levels Ci that are allowed at any time at each customer i. The maximum

inventory level constraints can be imposed in a variety of ways. For example, if it is assumed that no

product is used between the time that the inventory level xi is measured at customer i and the time

that the delivery of di(a) takes place, then the maximum inventory level constraints can be expressed

as xi + di(a) ≤ Ci for all i, all x ∈ X , and all a ∈ A(x). If product is used during this time period,

it may be possible to deliver more. The exact way in which the constraint is applied does not affect

the rest of the development. For simplicity we applied the constraint as stated above. Let the random

variable At ∈ A(Xt) denote the decision chosen at time t.

3. In this formulation, the source of randomness is the random customer demands Uit. To simplify the

exposition, assume that the deliveries at time t take place in time to satisfy the demand at time t.

Then the amount of product used by customer i at time t is given by min{Xit + di(At), Uit}. Thus

the shortage at customer i at time t is given by Sit = max{Uit − (Xit + di(At)), 0}, and the next

inventory level at customer i at time t+1 is given by Xi,t+1 = max{Xit +di(At)−Uit, 0}. The known

joint probability distribution F of customer demands Ut gives a known Markov transition function Q,

according to which transitions occur. For any state x ∈ X , any decision a ∈ A(x), and any Borel subset

B ⊆ X , let U(x, a,B) ≡{

U ∈ �N+ :

(max{x1 +d1(a)−U1, 0}, . . . ,max{xN +dN (a)−UN , 0}) ∈ B

}.


Then Q[B | x, a] ≡ F [U(x, a,B)]. In other words, for any state x ∈ X , and any decision a ∈ A(x),

P [Xt+1 ∈ B | Xt = x,At = a] = Q[B | x, a] ≡ F [U(x, a,B)]

4. Let g(x, a) denote the expected single stage net reward if the process is in state x at time t, and decision

a ∈ A(x) is implemented. To give a specific example in terms of the costs mentioned in Section 1.1,

for any decision a and arc (i, j), let kij(a) denote the number of times that arc (i, j) is traversed by a

vehicle while executing decision a. Then,

g(x, a) \equiv \sum_{i=1}^{N} r_i(d_i(a)) \;-\; \sum_{(i,j)} c_{ij} k_{ij}(a) \;-\; \sum_{i=1}^{N} h_i(x_i + d_i(a)) \;-\; \sum_{i=1}^{N} E_F\!\left[ p_i\!\left( \max\{U_{i0} - (x_i + d_i(a)),\, 0\} \right) \right]

where EF denotes expected value with respect to the probability distribution F of U0.

5. The objective is to maximize the expected total discounted value over an infinite horizon. The

decisions At are restricted such that At ∈ A(Xt) for each t, and At may depend only on the history

(X0, A0, X1, A1, . . . , Xt) of the process up to time t, i.e., when the decision maker decides on a

decision at time t, the decision maker does not know what is going to happen in the future. Let Π denote

the set of policies that depend only on the history of the process up to time t. Let α ∈ [0, 1) denote

the discount factor. Let V ∗(x) denote the optimal expected value given that the initial state is x, i.e.,

V^{*}(x) \equiv \sup_{\pi \in \Pi} E^{\pi}\!\left[ \left. \sum_{t=0}^{\infty} \alpha^{t} g(X_t, A_t) \,\right|\, X_0 = x \right]    (1)

A stationary deterministic policy π prescribes a decision π(x) ∈ A(x) based on the information contained in

the current state x of the process only. For any stationary deterministic policy π, and any state x ∈ X , the

expected value V π(x) is given by

V^{\pi}(x) \equiv E^{\pi}\!\left[ \left. \sum_{t=0}^{\infty} \alpha^{t} g(X_t, \pi(X_t)) \,\right|\, X_0 = x \right]
= g(x, \pi(x)) + \alpha \int_{\mathcal{X}} V^{\pi}(y)\, Q[dy \mid x, \pi(x)]

(The last equality is a standard result in dynamic programming; see for example Bertsekas and Shreve 1978.)

It follows from results in dynamic programming that, under conditions that are not very restrictive (e.g., g

bounded and α < 1), to determine the optimal expected value in (1), it is sufficient to restrict attention to


the class ΠSD of stationary deterministic policies. It follows that for any state x ∈ X ,

V^{*}(x) = \sup_{\pi \in \Pi_{SD}} V^{\pi}(x) = \sup_{a \in A(x)} \left\{ g(x, a) + \alpha \int_{\mathcal{X}} V^{*}(y)\, Q[dy \mid x, a] \right\}    (2)

A policy π* is called optimal if V^{π*} = V^{*}.
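To make the formulation concrete, the following is a minimal sketch that instantiates the model for a single customer with direct deliveries and discrete demand, and solves the optimality equation (2) by successive approximation. All numbers, and the particular reward, cost, and penalty functions, are hypothetical illustrations rather than data from the paper.

```python
# Minimal single-customer instance: state = inventory level, decision =
# delivery quantity d, next state = max{x + d - U, 0} as in the formulation.
C_i = 3                            # storage capacity at the customer
pmf = {0: 0.3, 1: 0.5, 2: 0.2}     # hypothetical discrete demand pmf F
alpha = 0.9                        # discount factor

r = lambda d: 4.0 * d                   # reward for delivering d
c = lambda d: 5.0 if d > 0 else 0.0     # travel cost if a trip is made
h = lambda x: 0.5 * x                   # inventory holding cost
p = lambda s: 10.0 * s                  # shortage penalty

states = range(C_i + 1)
actions = {x: range(C_i - x + 1) for x in states}   # enforce x + d <= C_i

def g(x, d):
    """Expected single-stage net reward, mirroring the expression for g(x, a)."""
    return (r(d) - c(d) - h(x + d)
            - sum(prob * p(max(u - (x + d), 0)) for u, prob in pmf.items()))

def Q(x, d):
    """Transition probabilities of the next inventory level max{x + d - U, 0}."""
    out = {}
    for u, prob in pmf.items():
        y = max(x + d - u, 0)
        out[y] = out.get(y, 0.0) + prob
    return out

# Successive approximation of the optimality equation (2).
V = {x: 0.0 for x in states}
for _ in range(500):
    V = {x: max(g(x, d) + alpha * sum(prob * V[y] for y, prob in Q(x, d).items())
                for d in actions[x])
         for x in states}
print({x: round(V[x], 2) for x in states})
```

For more than a handful of customers, exactly this enumeration of states and decisions breaks down, which is the subject of the next section.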

1.3 Solving the Markov Decision Process

Many algorithms have been proposed to solve Markov decision processes; for example, see the textbooks by

Bertsekas (1995) and Puterman (1994). Solving a Markov decision process usually involves computing the

optimal value function V ∗, and an optimal policy π∗, by solving the optimality equation (2). This requires

the following major computational tasks to be performed.

1. Computation of the optimal value function V ∗. Because V ∗ appears in the left hand side and right hand

side of (2), most algorithms for computing V ∗ involve the computation of successive approximations

to V ∗(x) for every x ∈ X . These algorithms are practical only if the state space X is small. For the

IRP as formulated in Section 1.2, X may be uncountable. One may attempt to make the problem more

tractable by discretizing the state space X and the transition probabilities Q. Even if one discretizes X

and Q, the number of states grows exponentially in the number of customers. Thus even for discretized

X and Q, the number of states is far too large to compute V ∗(x) for every x ∈ X if there are more

than about four customers.

2. Estimation of the expected value (integral) in (2). For the IRP, this is a high dimensional integral,

with the number of dimensions equal to the number N of customers, which can be as much as several

hundred. Conventional numerical integration methods are not practical for the computation of such

high dimensional integrals. (A sampling-based sketch is given after this list.)

3. The maximization problem on the right hand side of (2) has to be solved to determine the optimal

decision for each state. In the case of the IRP, the optimization problem on the right hand side of (2)

is very hard. For example, the vehicle routing problem (VRP), which is NP-hard, is a special case

of that problem. (Consider any instance of the VRP, with a given number of capacitated vehicles, a

graph with costs on the arcs, and demand quantities at the nodes. For the IRP, let the vehicles and

graph be the same as for the VRP, let the demand be deterministic with demand quantities as given

for the VRP, let the current inventory level at each customer be zero, let the discount factor be zero,

and let the penalties be sufficiently large such that an optimal solution for the optimization problem


on the right hand side of (2) has to satisfy the demand quantities at all the nodes. Then the instance

of the VRP can be solved by solving the optimization problem on the right hand side of (2).)
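For the second task, a natural remedy is Monte Carlo estimation: sample demand vectors from F and average the value at the resulting next states. The sketch below is our illustration of that idea, with `sample_demand` and `value` as hypothetical callables; the specific estimator used by Kleywegt et al. (2002) is described in that paper.

```python
def estimate_expected_value(x, d, value, sample_demand, n_samples=1000):
    """Monte Carlo estimate of the integral in (2): E[V(X_{t+1})] given that
    the current state is x and the decision delivers quantities d = (d_1, ..., d_N).

    value: callable approximating V on states
    sample_demand: callable returning one joint demand vector U ~ F
    """
    total = 0.0
    for _ in range(n_samples):
        u = sample_demand()
        # next inventory levels, as in the transition function Q
        y = tuple(max(xi + di - ui, 0) for xi, di, ui in zip(x, d, u))
        total += value(y)
    return total / n_samples
```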

In Kleywegt et al. (2002) we developed approximation methods to perform the computational tasks

mentioned above efficiently and to obtain good solutions for the inventory routing problem with direct

deliveries (IRPDD). To extend the approach to the IRP in which multiple customers can be visited on a

route, we develop in this paper new methods for the first and third computational tasks, that is, to compute,

at least approximately, V ∗, and to solve the maximization problem on the right hand side of (2). The second

task was addressed in the way described in Kleywegt et al. (2002).

1.4 Overview of the Proposed Method

An outline of our approach is as follows. The first major step in solving the IRP is to construct an

approximation V to the optimal value function V ∗. The approximation V is constructed as follows. First, a

decomposition of the IRP is developed. Subproblems are defined for specific subsets of customers. Each

subproblem is also a Markov decision process. The subsets of customers do not necessarily partition the set

of customers, but must cover the set of customers. The idea is to define each subproblem so that it gives

an accurate representation of the overall process as experienced by the subset of customers. To do that, the

parameters of each subproblem are determined by simulating the overall IRP process, and by constructing

simulation estimates of subproblem parameters. Second, each subproblem is solved optimally. Third, for

any given state x of the IRP process, the approximate value V (x) is determined by choosing a collection of

subsets of customers that partitions the set of customers. Then V (x) is set equal to the sum of the optimal

value functions of the subproblems corresponding to the chosen collection of subsets at states corresponding

to x. The collection of subsets of customers is chosen to maximize V (x).

Details of the construction of V are given in Section 2. An outline of the value function approximation

algorithm is given in Algorithm 1.

Given V , the IRP process is controlled as follows. Whenever the state of the process is x, then a decision

π(x) is chosen that solves

\max_{a \in A(x)} \left\{ g(x, a) + \alpha \int_{\mathcal{X}} V(y)\, Q[dy \mid x, a] \right\}    (3)

which is the right hand side of the optimality equation (2) with V instead of V ∗. A method for problem (3)

is described in Section 3.

Algorithm 1 already indicates that the development of the approximating function V requires a lot of

computational effort. The effort is required to determine appropriate parameters for the subproblems and to

solve all the subproblems. This effort is required only once at the beginning of the control of the IRP process


Algorithm 1 Procedure for computing V and π.

1. Start with an initial policy π0. Set i ← 0.

2. Simulate the IRP under policy π0 to estimate the subproblem parameters.

3. Solve the subproblems.

4. V is determined by the optimal value functions of the subproblems.

5. Policy π1 is defined by equation (4).

6. Repeat steps 7 through 11 for a chosen number of iterations, or until a convergence test is satisfied.

7. Increment i ← i + 1.

8. Simulate the IRP under policy πi to update the estimates of the subproblem parameters.

9. With the updated estimates of the subproblem parameters, solve the updated subproblems.

10. V is determined by the optimal value functions of the updated subproblems.

11. Policy πi+1 is given by equation (4).

(although, in practice, V may have to be changed if the parameters of the MDP change), so that a substantial

effort for this initial computational task seems to be acceptable. In contrast, once the approximating function

V has been constructed, only the daily problem (3) has to be solved at each stage of the IRP process, each

time for a given value of the state x. Because the daily problem has to be solved many times, it is important

that this computational task can be performed with relatively little effort.
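Schematically, Algorithm 1 alternates simulation, subproblem solution, and policy update. The sketch below shows only the control flow; every helper is a stub standing in for the corresponding step (parameter estimation as in Section 2.1, subproblem solution, the combination of Section 2.2, and the policy defined by equation (4)), not an implementation of it.

```python
# Stubs standing in for the real steps of Algorithm 1 (all hypothetical).
def simulate_irp(policy):         return {"lambda_est": 0.5}   # steps 2 and 8
def solve_subproblems(params):    return {("i", "j"): 1.0}     # steps 3 and 9
def build_approximation(values):  return lambda x: sum(values.values())
def greedy_policy(V):             return lambda x: "some decision"

def compute_V_and_policy(initial_policy, n_iterations=3):
    """Control flow of Algorithm 1: simulate under the current policy,
    re-estimate subproblem parameters, solve the subproblems, rebuild V,
    and update the policy via equation (4)."""
    policy = initial_policy
    V = None
    for _ in range(n_iterations):
        params = simulate_irp(policy)
        values = solve_subproblems(params)
        V = build_approximation(values)
        policy = greedy_policy(V)
    return V, policy

V, pi = compute_V_and_policy(initial_policy=lambda x: "some decision")
```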

1.5 Review of Related Literature

In this section we give a brief review of related literature on the inventory routing problem (Section 1.5.1)

and on dynamic programming approximations (Section 1.5.2). The review is not comprehensive.

1.5.1 Inventory Routing Literature

A large variety of deterministic and stochastic models of inventory routing problems have been formulated,

and a variety of heuristics and bounds have been produced. A classification of the inventory routing literature

is given in Kleywegt et al. (2002).

Bell et al. (1983) propose an integer program for the inventory routing problem at Air Products, a

producer of products such as liquid nitrogen. Dror, Ball, and Golden (1985), and Dror and Ball (1987)

construct a solution for a short-term planning period based on identifying, for each customer, the optimal

replenishment day t∗ and the expected increase in cost if the customer is visited on day t instead of t∗. An

integer program is then solved that assigns customers to a vehicle and a day, or just a day, that minimizes the

sum of these costs plus the transportation costs. Dror and Levy (1986) use a similar method to construct a


weekly schedule, and then apply node and arc exchanges to reduce costs in the planning period. Trudeau and

Dror (1992) apply similar ideas to the case in which inventories are observable only at delivery times. Bard

et al. (1998) follow a rolling horizon approach to an inventory routing problem with satellite facilities where

trucks can be refilled. To choose the customers to be visited during the next two weeks, they determine an

optimal replenishment frequency for each customer, similar to the approach in Dror, Ball, and Golden (1985),

and Dror and Ball (1987).

Federgruen and Zipkin (1984) formulate an inventory routing problem quite similar to the one in

Section 1.2, except that they focus on solving the myopic single-stage problem max_{a ∈ A(x)} g(x, a), which is a

nonlinear integer program. Golden, Assad, and Dahl (1984) also propose a heuristic to solve the myopic

single-stage problem max_{a ∈ A(x)} g(x, a), while maintaining an “adequate” inventory at all customers. Chien,

Balakrishnan, and Wong (1989) also propose an integer programming based heuristic to solve the single-stage

problem, but they attempt to find a solution that is less myopic than that of Federgruen and Zipkin (1984)

and Golden, Assad, and Dahl (1984), by passing information from one day to the next.

Anily and Federgruen (1990, 1991, 1993) analyze fixed partition policies for the inventory routing problem

with constant deterministic demand rates and an unlimited number of vehicles. They also find lower and

upper bounds on the minimum long-run average cost over all fixed partition policies, and propose a heuristic,

called modified circular regional partitioning, to choose a fixed partition. Gallego and Simchi-Levi (1990)

use an approach similar to that of Anily and Federgruen (1990) to evaluate the long-run effectiveness of

direct deliveries (one customer on each route). Bramel and Simchi-Levi (1995) also study fixed partition

policies for the deterministic inventory routing problem with an unlimited number of vehicles. They propose

a location based heuristic, based on the capacitated concentrator location problem (CCLP), to choose a fixed

partition. The tour through each subset of customers is constructed while solving the CCLP, using a nearest

insertion heuristic. Chan, Federgruen, and Simchi-Levi (1998) analyze zero-inventory ordering policies, in

which a customer’s inventory is replenished only when the customer’s inventory has been depleted, and fixed

partition policies, also for the deterministic inventory routing problem with an unlimited number of vehicles.

They derive asymptotic worst-case bounds on the performance of the policies. They also propose a heuristic

based on the CCLP, similar to that of Bramel and Simchi-Levi (1995), for determining a fixed partition

of the set of customers. Gaur and Fisher (2002) consider a deterministic inventory routing problem with

time varying demand. They propose a randomized heuristic to find a fixed partition policy with periodic

deliveries. Their method was implemented for a supermarket chain.

Burns et al. (1985) develop approximating equations for both a direct delivery policy as well as a policy

in which vehicles visit multiple customers on a route.

Minkoff (1993) also formulated the inventory routing problem as a MDP. He focused on the case with

an unlimited number of vehicles. He proposed a decomposition heuristic to reduce the computational effort.


The heuristic solves a linear program to allocate joint transportation costs to individual customers, and then

solves individual customer subproblems. The value functions of the subproblems are added to approximate

the value function of the combined problem. Minkoff’s work differs from ours in the following aspects:

(1) we consider the case with a limited number of vehicles, (2) we define subproblems involving one or more

customers, and the subproblems are defined differently, one reason being that the bound on the number of

vehicles has to be addressed in our subproblems, and (3) we solve an optimization problem to combine the

results of the subproblems.

Webb and Larson (1995) propose a solution for the problem of determining the minimum fleet size for

an inventory routing system. Their work is related to Larson’s earlier work on fleet sizing and inventory

routing (Larson, 1988).

Bassok and Ernst (1995) consider the problem of delivering multiple products to customers on a fixed

tour. The optimal policy for each product is characterized by a sequence of critical numbers, similar to an

optimal policy found by Topkis (1968). Barnes-Schuster and Bassok (1997) study the cost effectiveness of

a particular direct delivery policy for the inventory routing problem. Kleywegt et al. (2002) also consider

the special case with direct deliveries. A MDP model of the inventory routing problem is formulated, and a

dynamic programming approximation method is developed to find a policy.

Herer and Roundy (1997) propose several heuristics to construct power-of-two policies for the inventory

routing problem with constant deterministic demand rates and an unlimited number of vehicles, and they

prove performance bounds for the heuristics. Viswanathan and Mathur (1997) propose an insertion

heuristic to construct a power-of-two policy for the inventory routing problem with multiple products, constant

deterministic demand rates, and an unlimited number of vehicles.

Reiman et al. (1999) perform a heavy traffic analysis for three types of policies for the inventory routing

problem with a single vehicle.

Cetinkaya and Lee (2000) study a problem in which the vendor accumulates customer orders over time

intervals of length T , and then delivers customer orders at the end of each time interval.

Bertazzi et al. (2002) consider a deterministic inventory routing problem with a single capacitated vehicle.

Each customer has a specified minimum and maximum inventory level. They propose a heuristic to determine

the vehicle route at each discrete time point, while following an order-up-to policy, that is, each time a

customer is visited the inventory at the customer is replenished to the specified maximum inventory level.

They consider the impact of different objective functions.

The inventory pickup and delivery problem is quite similar to the inventory routing problem. In the

inventory pickup and delivery problem, there are multiple sources of a single product, multiple demand

points, and multiple vehicles. The vehicles are scheduled to travel alternatingly between sources and demand

points to replenish the inventory at the demand points. Christiansen and Nygreen (1998a, 1998b) present


a path flow formulation and column generation method for the inventory pickup and delivery problem with

time windows (IPDPTW). Christiansen (1999) presents an arc flow formulation for the IPDPTW.

1.5.2 Dynamic Programming Approximation Literature

Dynamic programming, or the theory of Markov decision processes, is a versatile and widely used framework for modeling

dynamic and stochastic optimal control problems. However, a major shortcoming is that for many interesting

applications an optimal policy cannot be computed because (1) the state space X is too big to compute and

store the optimal value V ∗(x) and an optimal decision π∗(x) for each state x; and/or (2) the expected value

in (2), which often is a high dimensional integral, cannot be computed exactly; and/or (3) the single stage

optimization problem on the right hand side of (2) cannot be solved exactly. In this section we briefly

mention some of the work that has been done to address the first issue, that is, how to attack problems with

large state spaces. The second issue makes up a large part of the field of statistics, and the third issue makes

up a large part of the field of optimization; these fields are not reviewed here.

A natural approach for attacking MDPs with large state spaces, which is also the approach used in this

paper, is to approximate the optimal value function V ∗ with an approximating function V . It is shown

in Section 2 that a good approximation V of the optimal value function V ∗ can be used to find a good

policy π. Some of the early work on this approach is that of Bellman and Dreyfus (1959), who propose

using Legendre polynomials inductively to approximate the optimal value function of a finite horizon MDP.

Chang (1966), Bellman et al. (1963), and Schweitzer and Seidman (1985) also study the approximation

of V ∗ with polynomials, especially orthogonal polynomials such as Legendre and Chebychev polynomials.

Approximations using splines are suggested by Daniel (1976), and approximations using regression splines

by Chen et al. (1999). Recently a lot of work has been done on parameterized approximations. Some of

this work was motivated by approaches proposed for reinforcement learning; Sutton and Barto (1998) give

an overview. Tsitsiklis and Van Roy (1996), Van Roy and Tsitsiklis (1996), Bertsekas and Tsitsiklis (1996),

and De Farias and Van Roy (2000) study the estimation of the parameters of these approximating functions

for infinite horizon discounted MDPs, and Tsitsiklis and Van Roy (1999a) consider estimation for long-run

average cost MDPs. Value function approximations are proposed for specific applications by Van Roy et al.

(1997), Powell and Carvalho (1998), Tsitsiklis and Van Roy (1999b), Secomandi (2000), and Kleywegt et al.

(2002).

In many models the state space is uncountable and the transition and cost functions are too complex

for closed form solutions to be obtained. Discretization methods and convergence results for such problems

are discussed in Wong (1970a), Fox (1973), Bertsekas (1975), Kushner (1990), Chow and Tsitsiklis (1991),

and Kushner and Dupuis (1992).

Another natural approach for attacking a large-scale MDP is to decompose the MDP into smaller related


MDPs, which are easier to solve, and then to use the solutions of the smaller MDPs to obtain a good solution

for the original MDP. Decomposition methods are discussed in Wong (1970b), Collins and Lew (1970), Collins

(1970), Collins and Angel (1971), Courtois (1977), Courtois and Semal (1984), Stewart (1984), and Kleywegt

et al. (2002).

Some general state space reduction methods that include many of the methods mentioned above are

analyzed in Whitt (1978, 1979a, 1979b), Hinderer (1976, 1978), Hinderer and Hubner (1977), and Haurie

and L’Ecuyer (1986). Surveys are given in Morin (1978), and Rogers et al. (1991).

2 Value Function Approximation

The first major step in solving the IRP is the construction of an approximation V to the optimal value

function V ∗. A good approximating function V can then be used to find a good policy π, in the sense

described next. Suppose that ‖V ∗ − V ‖∞ < ε, that is, V is an ε-approximation of V ∗. Also suppose that

stationary deterministic policy π satisfies

g(x, \pi(x)) + \alpha \sum_{y \in \mathcal{X}} V(y)\, Q[y \mid x, \pi(x)] \;\geq\; \sup_{a \in A(x)} \left\{ g(x, a) + \alpha \sum_{y \in \mathcal{X}} V(y)\, Q[y \mid x, a] \right\} - \delta    (4)

for all x ∈ X , that is, decision π(x) is within δ of the optimal decision using approximating function V on

the right hand side of the optimality equation (2). Then

V^{\pi}(x) \;\geq\; V^{*}(x) - \frac{2\alpha\varepsilon + \delta}{1 - \alpha}

for all x ∈ X , that is, the value function V π of policy π is within (2αε + δ)/(1 − α) of the optimal value

function V ∗. This observation is the motivation for putting in the effort to construct a good approximating

function V .
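For completeness, the stated bound follows from a standard contraction argument. The sketch below uses the Bellman operators T and T_π, which the paper does not introduce explicitly; it is a reconstruction of the reasoning, not a quotation.

```latex
% (T W)(x)     = sup_{a in A(x)} { g(x,a) + alpha * sum_y W(y) Q[y|x,a] },
% (T_pi W)(x)  = the same expression with a = pi(x).
% Both are alpha-contractions in the sup norm, V^* = T V^*, and V^pi = T_pi V^pi.
\begin{align*}
\|V^{\pi} - V^{*}\|_{\infty}
  &= \|T_{\pi} V^{\pi} - T V^{*}\|_{\infty} \\
  &\le \|T_{\pi} V^{\pi} - T_{\pi} V^{*}\|_{\infty} + \|T_{\pi} V^{*} - T V^{*}\|_{\infty} \\
  &\le \alpha \|V^{\pi} - V^{*}\|_{\infty}
     + \underbrace{\|T_{\pi} V^{*} - T_{\pi} V\|_{\infty}}_{\le\,\alpha\varepsilon}
     + \underbrace{\|T_{\pi} V - T V\|_{\infty}}_{\le\,\delta \text{ by } (4)}
     + \underbrace{\|T V - T V^{*}\|_{\infty}}_{\le\,\alpha\varepsilon}
\end{align*}
% Rearranging yields \|V^{\pi} - V^{*}\|_\infty <= (2*alpha*eps + delta)/(1 - alpha).
```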

This section describes the construction of V ; the “decisions” referred to in this section are used only for

the purpose of motivating the approximation V , and are not used to control the IRP process. The decisions

used to control the IRP process are described subsequently in Section 3.

2.1 Subproblem Definition

To approximate the optimal value function V ∗, we decompose the IRP into subproblems, and then combine

the subproblem results using another optimization problem, described in Section 2.2, to produce the

approximating function V . Each subproblem is a Markov decision process involving a subset of customers. The

subsets of customers do not necessarily partition the set of customers, but must cover the set of customers,


and it must be possible to form a partition with a subcollection of the subsets. The approach we followed

was to define subproblems for each subset of customers that can be visited on a single vehicle route. Thus

each single customer forms a subset, and in addition there are a variety of subsets with multiple customers.

Hence, the cover and partition conditions referred to above are automatically satisfied.

After the subsets of customers have been identified, a subproblem has to be defined (a model has to be

constructed) for each subset. That involves determining appropriate parameters and parameter values for the

MDP of each subset. An appealing idea is to choose the parameters and parameter values of each subproblem

so that the subproblem represents the overall IRP process as experienced by the subset of customers. There

are several obstacles in the way of implementing such an idea. First, the overall process depends on the

policy controlling the process, and an optimal policy is not known. Second, even with a given policy for

controlling the overall process, it is still hard to determine appropriate parameters and parameter values for

each subproblem so that the combined subproblems give a good representation of the overall process.

This section, including Subsections 2.1.1 and 2.1.2, is devoted to the modeling of the subproblems, that

is, the determination of parameters and parameter values for each subproblem. It has the interesting feature

that simulation is used in the process of constructing the subproblem models. Issues that have to be addressed

are the following.

1. One question is how many vehicles are available for a given subproblem. This issue comes about

because in the overall IRP process, several subsets compete for the M vehicles, and thus, at any given

time, not all M vehicles will be available to any given subset. Also a vehicle may visit customers

in the subset as well as customers not in the subset, and thus not all of a vehicle’s capacity C may

be available to the given subset. Thus, the availability of vehicles and vehicle capacity to subsets of

customers (and therefore in subproblems) has to be modeled.

2. Transition probabilities have to be determined for the subproblems. The transition probabilities of the

inventory levels are determined by the demand distribution F as before. In addition, for the

subproblems we also address the transition probabilities of vehicle availability to the subset of customers.

In the description of the subproblems, we sometimes refer to the overall process, and sometimes to the

models of the individual subproblems; we attempt to keep the distinctions as well as the similarities clear. To

simplify notation, the modeling of the subproblems is described for a two-customer subproblem; the models

for the subproblems with one or more than two customers are similar.

A two-customer subproblem for subset {i, j} is denoted by MDPij . The method presented in this section

is for a discrete demand distribution F and a discrete state space X , which may come about naturally due

to the nature of the product or because of discretization of the demand distribution and the state space. Let

the support of F be denoted by U1 × · · · × UN , and let fij denote the (marginal) probability mass function


of the demand of customers i and j, that is, fij(ui, uj) ≡ F [U1 × U2 × · · · × {ui} × · · · × {uj} × · · · × UN ]

denotes the probability that the demand at customer i is ui and the demand at customer j is uj .
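In the discrete case, the marginal f_ij is obtained from the joint pmf by summing out the other customers' demands. A minimal sketch, with the joint pmf stored as a dictionary (a representation we assume only for illustration):

```python
from collections import defaultdict

def marginal_pmf(joint_pmf, i, j):
    """Marginalize a joint demand pmf onto customers i and j (0-based indices).

    joint_pmf: dict mapping a demand vector (u_1, ..., u_N) -> probability
    Returns f_ij as a dict mapping (u_i, u_j) -> probability.
    """
    f_ij = defaultdict(float)
    for u, prob in joint_pmf.items():
        f_ij[u[i], u[j]] += prob
    return dict(f_ij)

# Hypothetical joint pmf for three customers.
joint = {(0, 1, 0): 0.25, (1, 1, 2): 0.50, (0, 2, 2): 0.25}
print(marginal_pmf(joint, 0, 1))   # {(0, 1): 0.25, (1, 1): 0.5, (0, 2): 0.25}
```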

Recall that the idea is to define each subproblem so that it gives an accurate representation of the overall

process as experienced by the subset of customers. Clearly, the state of a subproblem has to include the

inventory level at each of the customers in the subproblem. Furthermore, to capture information about the

availability of vehicles for delivering to the customers in the subproblem, the state of a subproblem also

includes a component with information about the vehicle availability to the subset of customers.

To determine possible values of the vehicle availability component vij of the state of subproblem MDPij ,

consider the different ways in which the customers i and j can be visited in the overall IRP process. For

simplicity, we assume that each customer is visited at most once per day. Consequently, on any day, the

subset of two customers can be visited by 0, 1, or 2 vehicles. Hence, in subproblem MDPij , at any point in

time, either 0, or 1, or 2 vehicles are available to the subset of two customers. The simplest case is the case

with no vehicles available for delivering to customers i and j (denoted by vij = 0 in subproblem MDPij).

When 1 or 2 vehicles are available to the subset of two customers, we also have to specify how much of those

vehicles’ capacities are available to the subset of customers, because those same vehicles may also make

deliveries to customers other than i or j on a route. Consider the different ways in which one vehicle could

deliver to i and/or j in the overall IRP process. There are the following six possibilities:

1. exclusive delivery to i,

2. exclusive delivery to j,

3. exclusive delivery to i and j (no deliveries to other customers),

4. fraction of vehicle capacity delivered to i and no delivery to j,

5. fraction of vehicle capacity delivered to j and no delivery to i,

6. fraction of vehicle capacity delivered to i and j plus delivery to other customers.

The first three possibilities are represented by the same vehicle availability component in subproblem MDPij

(denoted by vij = a), because in all three cases one vehicle is available exclusively for customers in the

subproblem. The other possibilities are denoted by vij = b, c, d respectively, in subproblem MDPij . Next

consider the different ways in which two vehicles could deliver to i and j in the overall IRP process. There

are the following four possibilities:

1. exclusive delivery to i and j (no deliveries to other customers),

2. exclusive delivery to i, fraction of vehicle capacity delivered to j


3. exclusive delivery to j, fraction of vehicle capacity delivered to i

4. fraction of vehicle capacity delivered to i and fraction of vehicle capacity delivered to j (with different

vehicles visiting i and j, each also delivering to other customers).

These possibilities are denoted by vij = e, f, g, h respectively, in subproblem MDPij .
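This classification can be written down directly. The sketch below maps a day's delivery pattern for {i, j} to the nine codes; the boolean encoding of a pattern (who is visited, whether one vehicle serves both, and whether each serving vehicle is exclusive to {i, j}) is our own device for illustration, under the stated one-visit-per-day assumption.

```python
def classify(visit_i, visit_j, same_vehicle, excl_i, excl_j):
    """Map a day's delivery pattern for customers i, j to a code in {0,a,...,h}.

    visit_i / visit_j: is customer i / j visited today?
    same_vehicle: do i and j receive their deliveries from one vehicle?
    excl_i / excl_j: does the vehicle serving i / j visit no customer outside {i, j}?
    """
    if not visit_i and not visit_j:
        return "0"
    if visit_i and not visit_j:
        return "a" if excl_i else "b"      # exclusive delivery to i, or a fraction to i
    if visit_j and not visit_i:
        return "a" if excl_j else "c"
    if same_vehicle:                       # one route covers both i and j
        return "a" if excl_i else "d"
    if excl_i and excl_j:                  # two vehicles, both exclusive to {i, j}
        return "e"
    if excl_i:
        return "f"                         # exclusive to i, fraction of a vehicle to j
    if excl_j:
        return "g"                         # exclusive to j, fraction of a vehicle to i
    return "h"                             # fractions of two different vehicles

# Example: i served exclusively by one truck, j shares a truck with outsiders.
print(classify(True, True, False, True, False))   # -> "f"
```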

Whenever a vehicle is available for delivering a fraction of its capacity to one or both of the customers in

the subset, the model for subproblem MDPij also needs to specify what portion of the vehicle’s capacity is

available to the subset. For example, when the vehicle availability vij ∈ {b, c, d}, one vehicle with a fraction

of the capacity C is available to the two-customer subset; when vij = h, two vehicles, each with a fraction of

the capacity C, are available to the subset; and when vij ∈ {f, g}, two vehicles, one with capacity C and one

with a fraction of the capacity C, are available to the subset. Each of the subproblem vehicle availabilities

vij ∈ {b, g, h} corresponds to a situation in the overall IRP in which a vehicle visits i and a customer not in

{i, j}, but the same vehicle does not visit j. The fractional capacity associated with the vehicle availabilities

vij ∈ {b, g} is the same and is denoted by λ^i_{ij} ∈ [0, C]. Similarly, the fractional capacity associated with the

vehicle availabilities vij ∈ {c, f} is the same and is denoted by λ^j_{ij} ∈ [0, C]. When the vehicle availability is

vij = h, one vehicle with fractional capacity λ^i_{ij} and another vehicle with fractional capacity λ^j_{ij} are available

to the subset. Finally, when the vehicle availability is vij = d, the fractional capacity available to the subset

is denoted by λ^{ij}_{ij} ∈ [0, C]. Table 1 summarizes the vehicle availability values vij and associated available

capacities for a two-customer subproblem MDPij . Note that for the subproblem, it is sufficient to know

the (possibly fractional) capacities available to the subset. The subproblem decision determines how the

capacities will be used to serve customers i and j. Section 2.1.2 explains how simulation is used to choose

appropriate values for these λ-parameters.

Table 1: Vehicle availability values vij and associated capacities for a two-customer subproblem MDPij.

  vij-value   Vehicle capacities available to customer subset {i, j}
  ---------   -------------------------------------------------------------------
      0       None
      a       One vehicle with capacity C
      b       One vehicle with capacity λ^i_{ij}
      c       One vehicle with capacity λ^j_{ij}
      d       One vehicle with capacity λ^{ij}_{ij}
      e       Two vehicles, each with capacity C
      f       Two vehicles, one with capacity C, and one with capacity λ^j_{ij}
      g       Two vehicles, one with capacity λ^i_{ij}, and one with capacity C
      h       Two vehicles, one with capacity λ^i_{ij}, and one with capacity λ^j_{ij}
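Table 1 amounts to a lookup from the availability value to the list of capacities available to {i, j}. A minimal sketch, with hypothetical numbers for C and the λ-parameters:

```python
def available_capacities(v, C, lam_i, lam_j, lam_ij):
    """Return the vehicle capacities available to subset {i, j} in state v (Table 1)."""
    return {
        "0": [],
        "a": [C],
        "b": [lam_i],
        "c": [lam_j],
        "d": [lam_ij],
        "e": [C, C],
        "f": [C, lam_j],
        "g": [lam_i, C],
        "h": [lam_i, lam_j],
    }[v]

print(available_capacities("f", C=100, lam_i=40, lam_j=30, lam_ij=60))  # [100, 30]
```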

Each two-customer subproblem MDPij is a discrete time Markov decision process, and is defined as

follows.


1. The state space is Xij = {0, 1, . . . , Ci} × {0, 1, . . . , Cj} × {0, a, b, c, d, e, f, g, h}. State (xi, xj , vij)

denotes that the inventory levels at customers i and j are xi and xj , and the vehicle availability is vij .

Let Xit ∈ {0, 1, . . . , Ci} denote the random inventory level at customer i at time t, and let Vijt denote

the random vehicle availability at time t.

2. For any subproblem state (xi, xj , vij), let Aij(xi, xj , vij) denote the set of feasible subproblem

decisions when the subproblem process is in state (xi, xj , vij). A decision aij ∈ Aij(xi, xj , vij) contains

information about (1) which of customers i and j to replenish, (2) how much to deliver at each of

customers i and j, and (3) how to combine customers i and j into vehicle routes. (For a two-customer

subproblem, the routing aspect of the decision is easy.) Let di(aij) denote the quantity of product

that is delivered to customer i while executing decision aij . The feasible decisions aij ∈ Aij(xi, xj , vij)

satisfy the following constraints when the subproblem state is (xi, xj , vij). When the vehicle

availability is vij = 0, then no vehicles can be sent to customers i and j, and di(aij) = dj(aij) = 0. When

vij = a, then one vehicle can be sent to customers i and j, and di(aij)+dj(aij) ≤ C, xi +di(aij) ≤ Ci,

and xj + dj(aij) ≤ Cj . When vij = b, then one vehicle can be sent to customer i, no vehicle can

be sent to customer j, and di(aij) ≤ min{λ^i_{ij}, Ci − xi}, and dj(aij) = 0. Feasible decisions are

determined similarly if vij = c. When vij = d, then one vehicle can be sent to customers i and j, and

di(aij) + dj(aij) ≤ λ^{ij}_{ij}, xi + di(aij) ≤ Ci, and xj + dj(aij) ≤ Cj . When vij = e, then one vehicle can

be sent to each of customers i and j, and di(aij) ≤ min{C, Ci − xi}, and dj(aij) ≤ min{C, Cj − xj}. When

vij = f, then one vehicle can be sent to each of customers i and j, and di(aij) ≤ min{C, Ci − xi}, and

dj(aij) ≤ min{λ^j_{ij}, Cj − xj}. Feasible decisions are determined similarly if vij = g. Finally, when

vij = h, then both i and j can be visited by a vehicle each, and di(aij) ≤ min{λ^i_{ij}, Ci − xi}, and

dj(aij) ≤ min{λ^j_{ij}, Cj − xj}. As for the overall IRP, let the random variable Aijt ∈ Aij(Xit, Xjt, Vijt)

denote the decision chosen at time t.

3. The transition probabilities of the subproblems have to incorporate the probability distribution of

customer demands, as well as the probabilities of vehicle availabilities to the subset of customers.

Because we assume that the probability distribution fij of customer demands is known, the transition

probabilities of the inventory levels can be determined for the subproblems as for the overall IRP. In

the overall IRP process, the probabilities of vehicle availabilities to a subset of customers depend on

the policy used to control the process, and are not directly obtainable from the input data of the IRP.

Thus, some additional effort is required to make the transition probabilities of vehicle availabilities in

the subproblems representative of what happens in the overall IRP. The basic idea is described next,

and more details are provided in Section 2.1.1. Consider any policy π ∈ Π for the IRP with unique

stationary probability νπ(x) for each x ∈ X . (Thus, as indicated in Algorithm 1, the formulation


of the subproblems depends on the policy used to control the overall process. In each iteration of

Algorithm 1, a policy is chosen and a set of subproblems are defined and solved.) Similar to the nine

types of vehicle availability vij ∈ {0, a, b, c, d, e, f, g, h} for customers i and j in subproblem MDPij

identified above, the delivery actions for customers i and j of each decision a in the overall IRP process

can be classified as belonging to one of the above nine types. Let vij(a) ∈ {0, a, b, c, d, e, f, g, h} denote

the type of delivery action for customers i and j of decision a in the overall IRP process. Then, for the

overall IRP process under policy π, the conditional probability pij(wij |yi, yj) that the delivery action

for customers i and j is vij(a) = wij , given that the inventory levels at customers i and j are yi and

yj , is given by

pij(wij | yi, yj) = [ Σ_{x∈X : xi=yi, xj=yj, vij(π(x))=wij} νπ(x) ] / [ Σ_{x∈X : xi=yi, xj=yj} νπ(x) ]    (5)

if the denominator is positive, and pij(wij |yi, yj) = 0 if the denominator is 0. Suppose we know or have

estimates for the conditional probabilities pij(wij |yi, yj). (The estimation of pij(wij |yi, yj) is discussed

in Section 2.1.1.) Then the transition probabilities for subproblem MDPij (which are input data for

the subproblem) are given by

Pij[(Xi,t+1, Xj,t+1, Vi,j,t+1) = (yi, yj, wij) | (Xit, Xjt, Vijt) = (xi, xj, vij), Aijt = aij]

  = fij(xi + di − yi, xj + dj − yj) pij(wij | yi, yj)                              if yi > 0, yj > 0
  = Σ_{ui=xi+di}^{∞} fij(ui, xj + dj − yj) pij(wij | yi, yj)                       if yi = 0, yj > 0
  = Σ_{uj=xj+dj}^{∞} fij(xi + di − yi, uj) pij(wij | yi, yj)                       if yi > 0, yj = 0
  = Σ_{ui=xi+di}^{∞} Σ_{uj=xj+dj}^{∞} fij(ui, uj) pij(wij | yi, yj)                if yi = 0, yj = 0    (6)

where di ≡ di(aij) and dj ≡ dj(aij); a computational sketch of (6) and of the feasible sets of item 2 follows this enumeration.

4. The costs for subproblem MDPij are the same as the costs involving customers i and j in the overall

problem. As for the overall IRP, for any subproblem decision aij and arc (m,n), let kmn(aij) denote

the number of times that arc (m,n) is traversed by a vehicle while executing decision aij . Also, node

0 denotes the vendor location. Then, continuing with the example costs introduced in Section 1.1, the

expected net reward per stage for subproblem MDPij , given state (xi, xj , vij) and decision aij , is given

by

gij(xi, xj, aij) ≡ [ri(di(aij)) + rj(dj(aij))] − [c0i k0i(aij) + cij kij(aij) + cj0 kj0(aij)] − [hi(xi + di(aij)) + hj(xj + dj(aij))] − EF[pi(max{Ui − (xi + di(aij)), 0}) + pj(max{Uj − (xj + dj(aij)), 0})]    (7)

5. The objective is to maximize the expected total discounted value over an infinite horizon. Let

V*_ij(xi, xj, vij) denote the optimal expected value of subproblem MDPij, given that the initial state is (xi, xj, vij), i.e.,

V*_ij(xi, xj, vij) ≡ sup_{ {Aijt}_{t=0}^{∞} } E[ Σ_{t=0}^{∞} α^t gij(Xit, Xjt, Aijt) | (Xi0, Xj0, Vij0) = (xi, xj, vij) ]

The decisions Aijt are constrained to be feasible and nonanticipatory.
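To make the subproblem specification concrete, the following sketch (illustrative code, not the authors' implementation; integer delivery units, the truncation bound u_max, and all function names are our own) enumerates the feasible delivery pairs of item 2 and evaluates the transition probability (6) for a demand distribution with finite support:

```python
def feasible_deliveries(v, x_i, x_j, C, C_i, C_j, lam_i, lam_j, lam_ij):
    """Yield the feasible delivery pairs (d_i, d_j) for availability type v in
    {'0','a',...,'h'}; quantities are in integer units, lam_* the partial capacities."""
    if v == '0':                                   # no vehicle available
        yield (0, 0)
        return
    for d_i in range(C_i - x_i + 1):               # respect tank capacities
        for d_j in range(C_j - x_j + 1):
            ok = {'a': d_i + d_j <= C,             # one shared full vehicle
                  'b': d_i <= lam_i and d_j == 0,  # partial capacity at i only
                  'c': d_j <= lam_j and d_i == 0,
                  'd': d_i + d_j <= lam_ij,        # shared partial capacity
                  'e': d_i <= C and d_j <= C,      # one full vehicle each
                  'f': d_i <= C and d_j <= lam_j,
                  'g': d_i <= lam_i and d_j <= C,
                  'h': d_i <= lam_i and d_j <= lam_j}[v]
            if ok:
                yield (d_i, d_j)

def transition_prob(y_i, y_j, w, x_i, x_j, d_i, d_j, f_ij, p_ij, u_max=200):
    """Evaluate (6).  f_ij is the joint demand pmf, p_ij(w, y_i, y_j) the
    conditional availability probability, u_max an assumed support bound."""
    # A positive next inventory level pins the demand down exactly; a zero
    # level is consistent with any demand of at least x + d (stockout).
    dem_i = [x_i + d_i - y_i] if y_i > 0 else range(x_i + d_i, u_max + 1)
    dem_j = [x_j + d_j - y_j] if y_j > 0 else range(x_j + d_j, u_max + 1)
    mass = sum(f_ij(u_i, u_j) for u_i in dem_i for u_j in dem_j)
    return mass * p_ij(w, y_i, y_j)
```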

The subproblem MDPij for each two-customer subset is relatively easy to solve using a dynamic pro-

gramming algorithm such as modified policy iteration (Puterman, 1994). Also note that the subproblems do

not have to be solved every day—these problems are solved initially when the value function approximation

V is developed.

Two issues related to the definition of the two-customer subproblems remain to be addressed. The first

issue concerns the determination of the conditional probabilities pij(wij | yi, yj), and the second issue involves the determination of the parts λ^i_ij of the vehicle capacity that are available for delivery to customer i when the vehicle also visits another customer k ∉ {i, j}. These two issues are addressed in the next two sections.

2.1.1 Determining Subproblem Transition Probabilities

Recall that conditional probabilities pij(wij |yi, yj) were used to specify the transition probabilities for sub-

problem MDPij . Computing the conditional probabilities pij(wij |yi, yj) using (5) is hard, because stationary

probabilities νπ(x) have to be computed for all x ∈ X . The conditional probabilities can be estimated by

simulation of the overall process under policy π. Let Nijt(yi, yj) denote the number of times that the inven-

tory levels at customers i and j have been yi and yj respectively by transition t of the simulation, and let

Nijt(yi, yj , wij) denote the number of times that the inventory levels at customers i and j have been yi and

yj respectively and the delivery action for customers i and j has been vij(a) = wij by transition t of the

simulation of the overall IRP process under policy π. That is, Nijt(yi, yj) and Nijt(yi, yj , wij) are updated

as follows:

Ni,j,t+1(yi, yj) = Nijt(yi, yj) + 1   if Xit = yi and Xjt = yj
Ni,j,t+1(yi, yj) = Nijt(yi, yj)       otherwise


and

Ni,j,t+1(yi, yj, wij) = Nijt(yi, yj, wij) + 1   if Xit = yi, Xjt = yj, and vij(π(Xt)) = wij
Ni,j,t+1(yi, yj, wij) = Nijt(yi, yj, wij)       otherwise

Then

pijt(wij | yi, yj) ≡ Nijt(yi, yj, wij) / Nijt(yi, yj)

gives an estimate of pij(wij |yi, yj) after t transitions of the simulation. Also, it is often easy to obtain

good prior estimates of the probabilities pij(wij |yi, yj). One can choose initial values Nij0(yi, yj) and

Nij0(yi, yj, wij) of the counters, such that Σ_{wij} Nij0(yi, yj, wij) = Nij0(yi, yj) for all (yi, yj) ∈ {0, 1, . . . , Ci} × {0, 1, . . . , Cj}, and pij0(wij | yi, yj) ≡ Nij0(yi, yj, wij)/Nij0(yi, yj) is an initial estimate of pij(wij | yi, yj). It

follows from results for Markov chains (Meyn and Tweedie, 1993) that if the Markov chain under policy π

has a unique stationary probability distribution νπ, then, with probability 1, the estimates pijt(wij |yi, yj)

converge to pij(wij | yi, yj) as t → ∞ for all (yi, yj) such that Σ_{x∈X : xi=yi, xj=yj} νπ(x) > 0.
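In other words, the estimator maintains empirical conditional frequencies along one simulated trajectory, optionally seeded with prior pseudo-counts. A minimal sketch, assuming a hypothetical simulator simulate_step, a fixed policy, and an availability classifier classify_vij:

```python
from collections import defaultdict

def estimate_p_ij(simulate_step, policy, classify_vij, x0, T, prior=None):
    """Estimate p_ij(w | y_i, y_j) for one customer pair by simulating the
    overall IRP under a fixed policy; prior maps ((y_i, y_j), w) to pseudo-counts."""
    N_y = defaultdict(int)       # N_ijt(y_i, y_j)
    N_yw = defaultdict(int)      # N_ijt(y_i, y_j, w_ij)
    if prior:
        for (y, w), n in prior.items():
            N_y[y] += n
            N_yw[(y, w)] += n
    x = x0
    for _ in range(T):
        a = policy(x)
        y = (x.inv_i, x.inv_j)             # inventory levels at customers i, j
        w = classify_vij(a)                # availability type in {0, a, ..., h}
        N_y[y] += 1
        N_yw[(y, w)] += 1
        x = simulate_step(x, a)            # sample demands, advance one period
    return {(y, w): n / N_y[y] for (y, w), n in N_yw.items()}
```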

2.1.2 Determining Available Vehicle Capacities

As mentioned in Section 2.1, for a subproblem MDPij, we have to specify the part λ^i_ij of the vehicle's capacity C that is available for delivery at customer i whenever a vehicle visits both customer i and another customer k ∉ {i, j}, that is, whenever the vehicle availability variable vij ∈ {b, g, h}. Several ways to model

these partial vehicle capacities in the subproblems were investigated. As demonstrated in Section 5, good

results were obtained by modeling the λ parameters in the subproblems as follows.

Again, we consider a policy π ∈ Π for the overall IRP with unique stationary probability νπ(x) for each

x ∈ X . Let

λ^i_ij ≡ [ Σ_{x∈X : vij(π(x))∈{b,g,h}} νπ(x) di(π(x)) ] / [ Σ_{x∈X : vij(π(x))∈{b,g,h}} νπ(x) ]    (8)

if the denominator is positive, and λ^i_ij ≡ 0 if the denominator is 0. The λ parameters defined above can also be estimated by simulation of the overall IRP process under policy π. Let λ^i_ijt denote the estimate of λ^i_ij after t transitions of the simulation, where λ^i_ij0 denotes an initial estimate, such as C/2. Let N^i_ijt denote the number of times that the delivery action vij(π(Xs)) for customers i and j has been in {b, g, h} by transition


t of the simulation. That is, N^i_ijt is updated as follows:

N^i_i,j,t+1 = N^i_ijt + 1   if vij(π(Xt)) ∈ {b, g, h}
N^i_i,j,t+1 = N^i_ijt       otherwise

The initial value N^i_ij0 is a weight, in units of number of observations, associated with the initial estimate λ^i_ij0. Then the parameter estimates are updated as follows:

λ^i_i,j,t+1 = (N^i_ijt λ^i_ijt + di(π(Xt))) / (N^i_ijt + 1)   if vij(π(Xt)) ∈ {b, g, h}
λ^i_i,j,t+1 = λ^i_ijt                                         otherwise

As before, it can be shown that if the Markov chain under policy π is positive recurrent, then, with probability 1, the estimates λ^i_ijt converge to λ^i_ij as t → ∞ for all i and j such that Σ_{x∈X : vij(π(x))∈{b,g,h}} νπ(x) > 0. Parameters λ^j_ij and λ^ij_ij are estimated in a similar way.
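The update above is a standard running average over the transitions at which the relevant availability types occur; a compact sketch (variable names ours):

```python
def update_lambda(lam, n, d_i, relevant):
    """One running-average update of the partial-capacity estimate lam, with
    weight n in units of observations; relevant is True when the availability
    type v_ij(pi(X_t)) lies in {b, g, h}."""
    if not relevant:
        return lam, n
    return (n * lam + d_i) / (n + 1), n + 1

# Example: start from the prior C/2 = 50 with weight 1, fold in two deliveries.
lam, n = 50.0, 1
for delivery, flag in [(40, True), (0, False), (60, True)]:
    lam, n = update_lambda(lam, n, delivery, flag)
# lam is now (50 + 40 + 60) / 3 = 50.0
```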

An even simpler approach, using

λi ≡ [ Σ_{x∈X : di(π(x))>0} νπ(x) di(π(x)) ] / [ Σ_{x∈X : di(π(x))>0} νπ(x) ]    (9)

also led to good results. These quantities λi can also be estimated by simulation estimates λit.

We have covered the definition of two-customer subproblems at length. We hope that the main ideas

have been presented in sufficient detail to make it clear that the same ideas can be applied to subproblems

with one customer or with more than two customers.

2.2 Combining Subproblems

The next topic to be addressed is the calculation of the approximate value function V (x) at a given state

x, using the results from the subproblems. Recall that subproblems were formulated and solved for subsets

of customers, and that solving the subproblems produces optimal value functions for the subproblems. Let

N ≡ {1, . . . , N} denote the set of customer indices, and let 0 be the index of the vendor’s facility. Let S ⊂ 2N

denote the collection of subsets of the set N of customers for which subproblems were formulated and solved.

In particular, recall that for each customer i ∈ N, {i} ∈ S. Also, for each i ∈ N, let Si ≡ {S ∈ S : i ∈ S} denote the collection of all subsets in S that contain i. For any subset S ∈ S and any state x (vector

of inventory levels) of the overall process, let xS denote the subvector of x corresponding to S (vector of

inventory levels at the customers in S). Also, let vS denote the vehicle availability component of the state

for the subproblem MDPS for subset S, where, for example, vS = 0 denotes that no vehicle is currently


available for subset S, and vS = 1 denotes that one vehicle is currently available for subset S. Thus, solving

subproblem MDPS for S ∈ S produces an optimal value function V*_S(xS, vS).

Given a state x, the approximate value V (x) is given by the optimal objective value of the following

cardinality constrained partitioning problem.

V(x) = max_y  Σ_{i∈N} V*_i(xi, 0) yi0 + Σ_{S∈S} V*_S(xS, 1) yS1    (10)

subject to

yi0 + Σ_{S∈Si} yS1 = 1   for all i ∈ N    (11)

Σ_{S∈S} yS1 ≤ M    (12)

yi0 ∈ {0, 1}   for all i ∈ N    (13)

yS1 ∈ {0, 1}   for all S ∈ S    (14)

The cardinality constrained partitioning problem partitions the set N of customers into subsets, with each

subset S ⊂ N corresponding to a subproblem MDPS. Each subset S for which yS1 = 1 is allocated a vehicle, and contributes value V*_S(xS, 1) to the objective. Each customer i that is not in any subset that is allocated a vehicle (yi0 = 1) contributes value V*_i(xi, 0) to the objective. Constraint (11) requires that each customer either belongs to exactly one subset that is allocated a vehicle, or belongs to no such subset. Constraint (12) requires that at most M vehicles be allocated to subsets.
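For small instances the partitioning problem (10)–(14) can be solved by direct enumeration, which makes its structure explicit; the sketch below is illustrative only (exponential in |S|) and assumes the subproblem values are supplied as dictionaries:

```python
from itertools import combinations

def approximate_value(V0, V1, customers, M):
    """Solve (10)-(14) by enumeration.  V1 maps a subset (tuple of customers)
    to V*_S(x_S, 1); V0 maps a customer i to V*_i(x_i, 0); at most M subsets
    (vehicles) may be chosen, and chosen subsets must be disjoint."""
    subsets, best = list(V1), None
    for k in range(min(M, len(subsets)) + 1):
        for chosen in combinations(subsets, k):
            covered = [c for S in chosen for c in S]
            if len(covered) != len(set(covered)):
                continue                      # violates constraint (11)
            val = sum(V1[S] for S in chosen)
            val += sum(V0[i] for i in customers if i not in set(covered))
            if best is None or val > best:
                best = val
    return best
```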

The cardinality constrained partitioning problem in general is NP-hard: even if each subset has no more

than three elements, the resulting cardinality constrained partitioning problem is NP-hard, because the 3-

Partition problem reduces to such a restricted cardinality constrained partitioning problem. However, for the

special case in which each subset has no more than two elements, the cardinality constrained partitioning

problem can be solved in polynomial time, by solving a maximum weight perfect matching problem, as

described in Section 4.

3 Choosing a Decision in a State

So far, we have described the approximation of the dynamic programming value function, that is, the first

major task in the list of major computational tasks for solving the IRP given in Section 1.3. As mentioned,

the second major task was addressed as described in Kleywegt et al. (2002). The third major task is the

solution of (3) for any given state x. In this section we address this step in the development of a solution

method for the Markov decision process model of the IRP.

Recall that the formulation and solution of the subproblems used in the construction of the approximating

function V has to be performed initially only, and not at every stage of the process. In contrast, problem (3)


has to be solved at each stage of the process, but at each stage it is solved only for the given current state

x of the process. It is therefore acceptable to spend a lot of computational effort on the formulation and

solution of the subproblems, but it is desirable to be able to solve the daily problem (3) with relatively little

computational effort.

Given the current state, two types of decisions have to be made, namely which customers to visit on each

vehicle route, and how much to deliver at those customers. These decisions are related, because the value of

visiting a set of customers on the same route depends on the delivery quantities for the customers.

For instances with more than approximately four customers and two vehicles, solving the maximization

problem to optimality would require an unacceptable computational effort, and therefore the following three-

step local search heuristic was developed:

1. Construct an initial solution consisting of only direct delivery routes.

Next, the local search heuristic continues by moving to the best neighboring decision in each iteration

until no better neighboring decision can be found. A neighboring decision is formed by adding a

customer to an existing route and modifying the delivery quantities. Each iteration consists of Steps 2

and 3:

2. For each existing route, rank all the customers not on the route by an initial estimate of the value of

adding the customer to the route.

3. For each route, evaluate more accurately the value of adding to the route the customers not on the

route, starting with the most promising customers identified in Step 2 and working down the lists, and

stopping the list processing when the accurately evaluated values do not improve. Identify the one

customer and one route that lead to the maximum improvement, and add that customer to the route.

Step 2 in the heuristic outlined above can be omitted, and is introduced only for efficiency, because accurately

evaluating a decision a involves computing

V′(x, a) ≡ g(x, a) + α ∫_X V(y) Q[dy | x, a]    (15)

where x denotes the current state, which is very time consuming. A more detailed description of each of

these steps is given below, followed by a statement of the algorithm.

3.1 Step 1: Choosing Direct-Delivery Routes

It is easy to see that a greedy procedure that chooses routes one at a time to optimize the objective function

in (3) could lead to bad decisions. For example, suppose that two vehicles are available and there are two

customers that urgently need deliveries, but the transportation cost between these two customers is quite


large. A greedy procedure may combine both these customers in the route that is chosen first, because of

their urgency (and its impact on the penalty cost in g(x, a) as well as the value function V (y) at the next

state y), and then combine other customers in the second route. A better decision may be to combine one

urgent customer with some nearby customers in one route, and the other urgent customer with some other

nearby customers in the other route.

The proposed heuristic avoids the pitfall described above by using a direct delivery solution as a starting

point for a local improvement procedure. Specifically, in Step 1 customers are assigned to vehicles using the

algorithm proposed in Kleywegt et al. (2002) for the inventory routing problem with direct deliveries. After

Step 1 has been completed, each vehicle visits at most one customer.

As a route can visit more than one customer, better decisions may be obtained by modifying the direct

delivery routes obtained in Step 1. In Steps 2 and 3, the routes are grown in the local search heuristic by

including more customers in the routes, as described next.

3.2 Step 2: Ranking Customers to be Added to Routes

An improvement heuristic explores whether a local modification of the current decision leads to a neighboring

decision that is better than the current decision, and, if so, adopts the better decision and repeats.

In our case, the local modifications considered are moving a customer from one route to another, adding

a customer which has not been included in a route yet to one of the routes, and changing the delivery

quantities.

As mentioned above, given state x, a decision a can be evaluated by computing V ′(x, a) as in (15).

However, evaluating all neighboring decisions a as in (15) and then moving to the best neighboring decision

requires a prohibitively large amount of computational effort for the following reasons. For each of the M

vehicle routes, one has to choose among Θ(N) customers to be added to the route, and for each of the

resulting Θ(MN) new routes, a large number of delivery quantity combinations are possible, and thus the

number of neighboring decisions can be large. Also, computing the value V ′(x, a) of a neighboring decision

a as in (15) can be very time-consuming for instances with many customers, because of the high dimensional

integral (and thus, in the case of a discrete distribution, the large number of terms in the sum), and the

effort required to compute V (y) for each state y that can be reached from the current state x with decision

a. These considerations motivate one to find a method to first identify promising neighboring decisions with

little computational effort, and thereafter to evaluate only the most promising decisions in more detail. Such

a method is described next.

For each of the routes in the current decision, we consider each customer that can be added to the

route. The new set of customers in the modified route should be a set of customers that can be visited by

a single vehicle, and thus should correspond to an MDP subproblem such as those defined in Section 2.1.


To obtain an initial indication of the value of adding a customer to a current route, we use the optimal

delivery quantities from the subproblem for the resulting set of customers, with the state of the subproblem

given by the inventory levels and an availability of one vehicle to the set of customers. For each of the M

vehicle routes, we choose among the Θ(N) customers that can be added to the route, and thus the number

of neighboring decisions has been reduced to Θ(MN).

In the expression (15) for the objective value V ′(x, a) of a neighboring decision a, the single-stage value

g(x, a) can be computed quickly, whereas the expected future value ∫_X V(y) Q[dy | x, a] is much harder to

compute. Also, we observed in empirical studies with the IRP that the decision with the highest single-stage

value g(x, a) often also has the highest value of V ′(x, a) among all the feasible neighboring decisions. Hence,

g(x, a) seems to give a good indication of whether it is worth exploring a neighboring decision a in more

detail.

Thus, given the current state x and current decision, for each of the routes in the current decision, and

each customer that can be added to the route, a corresponding neighboring decision a and value g(x, a) have

been identified. Next, for each of the routes in the current decision, the customers that can be added to the

route are ranked according to the corresponding values g(x, a). Let j(m, i) denote the customer with the ith

largest value of g(x, a) that can be added to route m in the current decision. The output of Step 2 is this

ranking, that is, the set of indices j(m, i).
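Step 2 thus reduces to computing one cheap score per route–customer pair and sorting; a schematic rendering (the helpers unrouted_for, route.with_customer, and single_stage_value are assumed, not given in the paper):

```python
def rank_candidates(routes, unrouted_for, single_stage_value, x):
    """Return the lists j(m, 1), j(m, 2), ... of Step 2: for each route m, the
    candidate customers sorted by the single-stage value g(x, a) of the
    neighboring decision that adds them (with subproblem-optimal quantities)."""
    ranking = {}
    for m, route in enumerate(routes):
        scored = [(single_stage_value(x, route.with_customer(j)), j)
                  for j in unrouted_for(route)]
        scored.sort(reverse=True)            # largest g(x, a) first
        ranking[m] = [j for _, j in scored]
    return ranking
```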

3.3 Step 3: Forming Routes Based on Total Expected Value

In Step 3 we decide which neighboring decision to move to (if any) before returning to Step 2. An outline

of Step 3 is as follows. We compute the total expected value V ′(x, a) resulting from adding those customers

to the current routes which obtained the highest values in Step 2. Then we move to a neighboring decision

by adding the customer to the current route which leads to the best value of V ′(x, a) and return to Step 2,

if such a value is better than the value of the current decision; otherwise the procedure terminates with the

current decision. Next we describe Step 3 in more detail.

Recall that the delivery quantities used in Step 2 were optimal for the subproblems, but may not be

good for the overall problem. In Step 3, we choose the delivery quantities (which determine the decision

a) at the customers more carefully, and compute the total expected value V ′(x, a) resulting from adding a

customer to a route and using the delivery quantities. To choose the delivery quantities in Step 3, we use the

following local search method. The set of routes is given. Consider the delivery quantities for one route at

a time. For any given route and any given vector of delivery quantities for the route, the neighboring set of

delivery quantities for the route consists of all the delivery quantity vectors obtained by one of the following

four steps: (1) decrease one component of the given vector of delivery quantities by one unit (as long as the

component is positive), and increase another component by the same amount, that is, swap a unit of delivery


between two customers on the given route; (2) increase the delivery quantity at one customer on the given

route by one unit if the vehicle capacity and the customer capacity allow such an increase; (3) decrease the

delivery quantity at one customer on the given route by one unit if the delivery quantity at that customer

in the given vector is positive; (4) the given vector of delivery quantities is left unchanged (the null step).

We start the local search at two solutions: (1) the vector of delivery quantities used in Step 2 (the optimal

delivery quantities of the subproblems), and (2) the vector of delivery quantities in the current decision. At

each iteration of the local search, we consider each of the given routes, and for each route, the vector of

delivery quantities is changed to the vector of delivery quantities in its neighborhood with the best value

V ′(x, a). The local search is terminated when a local optimum is found, that is, when no change in delivery

quantities takes place during an iteration.
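A sketch of this quantity search follows (illustrative; feasible is assumed to encode the vehicle and customer capacity checks, and value to evaluate V′(x, a) with route r's delivery vector replaced):

```python
def improve_quantities(deliveries, value, feasible):
    """Local search over delivery quantities for a fixed set of routes:
    repeatedly move each route's vector to its best neighbor (unit swap, unit
    increase, unit decrease, or the null move) until no route improves."""
    def neighbors(d):
        yield tuple(d)                                           # null move
        for i in range(len(d)):
            yield tuple(q + (k == i) for k, q in enumerate(d))   # increase
            if d[i] > 0:
                yield tuple(q - (k == i) for k, q in enumerate(d))  # decrease
                for j in range(len(d)):
                    if j != i:                                   # unit swap
                        yield tuple(q - (k == i) + (k == j)
                                    for k, q in enumerate(d))
    improved = True
    while improved:
        improved = False
        for r, d in enumerate(deliveries):
            best = max((n for n in neighbors(d) if feasible(r, n)),
                       key=lambda n: value(deliveries, r, n))
            if best != tuple(d):
                deliveries[r] = list(best)
                improved = True
    return deliveries
```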

The local search method for the calculation of the delivery quantities for a given set of routes is used as a

subroutine for choosing the customer not yet included in a route to be included. The combination of chosen

routes with the additional customer and corresponding local optimal delivery quantities defines the chosen

neighboring decision moved to next. Let a(m, i) denote the neighboring solution with local optimal delivery

quantities when adding customer j(m, i) to route m.

Next we describe how the routes in the next solution are chosen. For each route m in the current

decision, we successively compute the total expected value V ′(x, a) resulting from adding the next highest

valued customer to route m. That is, we first compute V ′(x, a(m, 1)) resulting from adding customer j(m, 1)

to route m. Then we compute V ′(x, a(m, 2)) resulting from adding customer j(m, 2) (but not customer

j(m, 1)) to route m. If V ′(x, a(m, 2)) ≥ V ′(x, a(m, 1)), then we do the same for customer j(m, 3), otherwise

we continue with another route. Thus the computation for route m is stopped when we reach a customer

j(m, i) for which the total expected value V ′(x, a(m, i)) is worse than the value V ′(x, a(m, i−1)) for customer

j(m, i − 1), i.e., V ′(x, a(m, i)) < V ′(x, a(m, i − 1)). Due to the preliminary ranking in Step 2, this usually

happened in computational tests when i = 2. After these computations have been completed for all routes

m in the current decision, the neighboring decision a∗ that provides the best total expected value V ′(x, a∗)

is determined. If the obtained value V ′(x, a∗) is better than the total expected value V ′(x, a′) of the current

decision a′, then a∗ becomes the new current decision, and the procedure returns to Step 2; otherwise the

procedure stops with a′ as the chosen decision. The procedure also stops if no more customers can be added

to any routes.

As mentioned before, in the expression (15) for V′(x, a), the expected future value ∫_X V(y) Q[dy | x, a] is a

high dimensional integral if there are a large number of customers (and thus is the sum of a large number of

terms if the distribution is discrete and the demand of each customer can take on several values). As pointed

out in Kleywegt et al. (2002), if the random vector is high dimensional, then it is usually more efficient to

estimate the expected value with random sampling. Related issues to be addressed are (1) how large the


sample size should be, and (2) what performance guarantees can be obtained if random sampling is used

to choose the best decision. To address these issues, we used a ranking and selection method based on the

work of Nelson and Matejcik (1995). We also used variance reduction techniques, such as common random

numbers and experimental designs such as orthogonal arrays, to reduce the sample size needed for a specified

level of accuracy. Additional details are given in Kleywegt et al. (2002).
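A bare-bones version of the sampling estimator with common random numbers across candidate decisions (a sketch under our own naming; it omits the ranking-and-selection and orthogonal-array machinery):

```python
import random

def choose_decision(x, candidates, g, sample_next, V, alpha, n=30, seed=0):
    """Estimate V'(x, a) = g(x, a) + alpha * E[V(y)] for each candidate a by
    Monte Carlo, reusing the same random seeds across candidates (common
    random numbers) so that differences are estimated with reduced variance."""
    estimates = {}
    for a in candidates:
        total = 0.0
        for k in range(n):
            rng = random.Random(seed + k)   # identical demand stream for all a
            y = sample_next(x, a, rng)      # sample one successor state
            total += V(y)
        estimates[a] = g(x, a) + alpha * total / n
    return max(estimates, key=estimates.get), estimates
```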

Algorithm 2 gives an overview of the steps in the procedure to choose a decision for a given state. Recall

that Algorithm 1 is executed only once initially, whereas Algorithm 2 is executed at each stage of the process,

each time for the given current state of the process.
Algorithm 2 Choosing a Decision in a Given State x

Step 1:
Compute direct-delivery routes and delivery quantities using the algorithm in Kleywegt et al. (2002).
Set current decision a′ equal to the resulting decision.

Step 2:
if no more customers can be added to any routes then
    Stop with current decision a′ as the chosen decision.
end if
for each route m in the current decision a′ do
    for each customer j that can be added to route m do
        Add customer j to route m.
        Use the optimal delivery quantities from the subproblem corresponding to the resulting route to determine the neighboring decision a.
        Compute the single-stage value g(x, a).
        Remove customer j from route m.
    end for
    Sort the customers that can be added to route m, in decreasing order of the single-stage values g(x, a), to obtain a sorted list of customers j(m, 1), j(m, 2), . . . for route m.
end for

Step 3:
for each route m in the current decision a′ do
    Set i ← 1.
    if no customers can be added to route m then
        Continue with the next route m in the current decision a′.
    end if
    Add customer j(m, i) to route m.
    Choose the delivery quantities using local search to determine the decision a(m, i).
    Remove customer j(m, i) from route m.
    repeat
        if no more customers can be added to route m then
            Break out of the repeat loop.
        end if
        Increment i ← i + 1.
        Add customer j(m, i) to route m.
        Choose the delivery quantities using local search to determine the decision a(m, i).
        Remove customer j(m, i) from route m.
    until V′(x, a(m, i)) < V′(x, a(m, i − 1))
end for
Let m∗ be the route, j∗ be the added customer, and a∗ be the decision with the best value of V′(x, a(m, i)).
if V′(x, a∗) > V′(x, a′) then
    Add customer j∗ to route m∗, and set a′ ← a∗ as the new current decision.
    Go to Step 2.
else
    Stop with current decision a′ as the chosen decision.
end if

4 A Special Case—Each Subset at Most Two Customers

In Section 2.2 it was shown how the subproblem results are combined to calculate the approximate value

V (x) for any given state x, by solving a cardinality constrained partitioning problem. It was also mentioned

that, in the special case in which each subset has no more than two elements, the cardinality constrained

partitioning problem can be solved in polynomial time, by solving a maximum weight perfect matching

problem. (In the application that motivated this research, most vehicle routes visit at most two customers,

and thus each subset has no more than two elements.)

In this section, we show that, in this special case, the cardinality constrained partitioning problem

(subsequently called the partitioning problem) can be solved in polynomial time, by solving a maximum

weight perfect matching problem. Specifically, the maximum weight perfect matching problem can be solved

in O(n²m) time with Edmonds' (1965a, 1965b) algorithm, where n is the number of nodes and m is the number

of arcs in the graph, or in O(n(m + n log n)) time with Gabow’s (1990) algorithm. In our computational

work, we used the Blossom IV implementation described in Cook and Rohe (1998). In the construction

explained next, n = 4N + 2M and m = |N2| + N + M + 2N(2N + 2M).

We describe the maximum weight perfect matching problem (subsequently called the matching problem)

by describing the corresponding graph G = (V, E). Let N2 ≡ {S ∈ S : |S| = 2} denote the collection of

subsets in S of cardinality 2. There are four subsets of nodes, V ≡ V1 ∪ V2 ∪ V3 ∪ V4, and four subsets of

edges, E ≡ E1∪E2∪E3∪E4. Nodes in V1 represent customers, V1 ≡ {11, . . . , i1, . . . , N1}, and for each pair of

customers {i, j} ∈ N2, there is an edge (i1, j1) ∈ E1 with value V*_ij(xi, xj, 1). For each customer i ∈ N, there is also a node i2 ∈ V2, and an edge (i1, i2) ∈ E2 with value V*_i(xi, 1). Choosing an edge (i1, j1) ∈ E1 represents assigning a vehicle to subset {i, j} ∈ N2 (for the purpose of computing V(x)), and choosing an edge (i1, i2) ∈ E2 represents assigning a vehicle to customer i by itself. Vehicles can also be left idle. To capture that, there are 2M nodes, V3 ≡ {13, . . . , (2M)3}, and M edges, E3 ≡ {(13, 23), (33, 43), . . . , ((2M − 1)3, (2M)3)}, each with value 0. (It follows from the definitions of the subproblems that V*_i(xi, 1) ≥ V*_i(xi, 0) for all i and xi,




and thus if N ≥ 2M − 1, then there is always an optimal solution of the partitioning problem such that all the vehicles are assigned, that is, Σ_{S∈S} y*_S1 = M. In such a case, there is no need for any nodes in V3 or any edges

in E3. This was the case in the motivating application.) Figure 1 shows the vertices in V1 ∪ V2 ∪ V3 and the

edges in E1∪E2∪E3 of the matching graph G = (V, E) = (V1∪V2∪V3∪V4, E1∪E2∪E3∪E4) for an example with

N = 3 customers and M = 2 vehicles. In the example, V1 = {11, 21, 31}, E1 = {(11, 21), (11, 31), (21, 31)}, V2 = {12, 22, 32}, E2 = {(11, 12), (21, 22), (31, 32)}, V3 = {13, 23, 33, 43}, and E3 = {(13, 23), (33, 43)}. The

nonzero edge values are also shown in the figure.

Thus so far there are |V1| + |V2| + |V3| = 2N + 2M nodes. The assignment of M vehicles is to be

represented by the matching of 2M nodes. To match the remaining 2N nodes, there are 2N additional

nodes, V4 ≡ {14, . . . , (2N)4}, and (2N)(2N + 2M) edges, E4 ≡ E^1_4 ∪ E^2_4 ∪ E^3_4, where E^k_4 ≡ Vk × V4. Each edge (i1, j4) ∈ E^1_4 has value V*_ii(xi, xi, 0), and each edge in E^2_4 and E^3_4 has value 0. (The number of edges can be reduced, for example by having only edges between odd numbered nodes in V3 and odd numbered nodes in V4, and between even numbered nodes in V3 and even numbered nodes in V4.)
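As an illustration of the construction (not the Blossom IV code used in our computational work), the graph can be assembled and handed to an off-the-shelf matching routine; in the sketch, V_single0[i] stands for the no-vehicle value V*_ii(xi, xi, 0), and customers are assumed to be labeled 1, . . . , N:

```python
import networkx as nx

def solve_matching(N, M, V_pair, V_single1, V_single0):
    """Build the matching graph of this section and compute a maximum weight
    perfect matching.  V_pair[(i, j)] = V*_ij(x_i, x_j, 1) for {i, j} in N2,
    V_single1[i] = V*_i(x_i, 1), V_single0[i] = the no-vehicle value of i."""
    G = nx.Graph()
    v4 = [("v4", k) for k in range(2 * N)]
    for (i, j), w in V_pair.items():                  # E1: pair shares a vehicle
        G.add_edge(("v1", i), ("v1", j), weight=w)
    for i in range(1, N + 1):
        G.add_edge(("v1", i), ("v2", i), weight=V_single1[i])  # E2: solo vehicle
        for u in v4:                                  # E4 edges from V1 and V2
            G.add_edge(("v1", i), u, weight=V_single0[i])
            G.add_edge(("v2", i), u, weight=0.0)
    for k in range(M):                                # E3: idle-vehicle edges
        a, b = ("v3", 2 * k), ("v3", 2 * k + 1)
        G.add_edge(a, b, weight=0.0)
        for u in v4:
            G.add_edge(a, u, weight=0.0)
            G.add_edge(b, u, weight=0.0)
    return nx.max_weight_matching(G, maxcardinality=True)
```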

Figure 2 shows the vertices in V1 ∪ V2 ∪ V3 ∪ V4 and the edges in E4 of the matching graph G = (V, E) =

(V1 ∪ V2 ∪ V3 ∪ V4, E1 ∪ E2 ∪ E3 ∪ E4) for an example with N = 3 customers and M = 2 vehicles. In the

example, V4 = {14, . . . , 64}. The nonzero edge values shown in the figure are as follows: x = V*_1(x1, 0), y = V*_2(x2, 0), and z = V*_3(x3, 0). Edges without values shown have value 0.

Proposition 1. The partitioning and matching problems described above are equivalent. That is, for any

feasible solution of the partitioning problem (10)–(14), there is a feasible solution of the matching problem

on the graph G described above with the same objective value, and for any feasible solution of the matching

problem, there is a feasible solution of the partitioning problem with the same objective value.

Proof. Consider any feasible solution of the partitioning problem. We select edges one at a time, while

maintaining feasibility of the matching, until a perfect matching has been constructed. Start with no edges

in E selected. First, list the subsets S ∈ S in any sequence. We claim that edges in E can be selected

according to the following cases for each S in the list, while maintaining feasibility of the matching.

Case 1.1: If S = {i, j}, i ≠ j, and yS1 = 1, then any two unmatched nodes k4 and l4 in V4 are picked, and edges (i1, j1), (i2, k4) and (j2, l4) are selected.

Case 1.2: If S = {i, j}, i ≠ j, and yS1 = 0, then no edges are selected for this S.

Case 2.1: If S = {i} and yS1 = 1, then edge (i1, i2) is selected.

Case 2.2: If S = {i} and yi0 = 1, then any two unmatched nodes k4 and l4 in V4 are picked, and the

corresponding edges (i1, k4) and (i2, l4) are selected.

Case 2.3: If S = {i}, yS1 = 0, and yi0 = 0, then no edges are selected for this S.


Figure 1: Part (V1 ∪ V2 ∪ V3, E1 ∪ E2 ∪ E3) of the matching graph G = (V, E) = (V1 ∪ V2 ∪ V3 ∪ V4, E1 ∪ E2 ∪ E3 ∪ E4) for an example with N = 3 customers and M = 2 vehicles. The nonzero edge values shown are V*_12(x1, x2, 1), V*_13(x1, x3, 1), and V*_23(x2, x3, 1) on the edges in E1, and V*_1(x1, 1), V*_2(x2, 1), and V*_3(x3, 1) on the edges in E2.


Figure 2: Part (V1 ∪ V2 ∪ V3 ∪ V4, E4) of the matching graph G = (V, E) = (V1 ∪ V2 ∪ V3 ∪ V4, E1 ∪ E2 ∪ E3 ∪ E4) for an example with N = 3 customers and M = 2 vehicles. Nonzero edge values are as follows: x = V*_1(x1, 0), y = V*_2(x2, 0), and z = V*_3(x3, 0).


Note that (11) excludes the case with S = {i}, yS1 = 1, and yi0 = 1. Thus exactly one of the cases above

holds for each S ∈ S. It follows from the construction of G that all the edges selected in the cases above are

in E. (Recall that E4 ≡ E^1_4 ∪ E^2_4 ∪ E^3_4, where E^k_4 ≡ Vk × V4. Thus, as long as there are a sufficient number of

nodes in V4, the nodes in V4 can be matched as described in the cases above.) To justify the claim, we need

to show that for each S, there are sufficient unmatched nodes in V4, and that feasibility of the matching is

maintained. Next we show that there are a sufficient number of nodes in V4. After all subsets S ∈ S have

been processed, the number of nodes in V4 that have been picked is 2 Σ_{S∈N2} yS1 + 2 Σ_{i∈N} yi0. Note that Σ_{i∈N} Σ_{S∈Si} yS1 = 2 Σ_{S∈N2} yS1 + Σ_{i∈N} yi1, where yi1 denotes yS1 for the singleton S = {i}. Hence, by adding constraint (11) over all i ∈ N, it follows that

Σ_{i∈N} yi0 + 2 Σ_{S∈N2} yS1 + Σ_{i∈N} yi1 = N    (16)

and thus

Σ_{i∈N} yi0 + 2 Σ_{S∈N2} yS1 ≤ N    (17)

and

Σ_{i∈N} yi0 ≤ N    (18)

By adding (17) and (18) it follows that the number of nodes in V4 that have been picked is less than or

equal to 2N , which is the number of nodes in V4. Next it is shown that the matching constructed so far

is feasible. In fact, so far each node in V1 and V2 has been matched with exactly one other node, because

constraint (11) implies that for each i ∈ N , exactly one of the following holds: (1) Case 1.1 for one S ∈ Si,

or (2) Case 2.1, or (3) Case 2.2. In each case, each of nodes i1 and i2 is matched with exactly one node.

Also, so far none of the nodes in V3 has been matched, and each node in V4 has been matched with at most

one other node. Next we continue the construction of the perfect matching. The number of unassigned

vehicles is M − Σ_{S∈S} yS1. Thus any M − Σ_{S∈S} yS1 edges in E3 are selected. By the definition of E3, these edges have no nodes in common. Hence the number of unmatched nodes in V3 is 2 Σ_{S∈S} yS1. The number of unmatched nodes in V4 is

2N − 2 Σ_{S∈N2} yS1 − 2 Σ_{i∈N} yi0 = 2 Σ_{S∈N2} yS1 + 2 Σ_{i∈N} yi1 = 2 Σ_{S∈S} yS1

where the first equality follows from (16). That is, the number of unmatched nodes in V3 is equal to the


number of unmatched nodes in V4. Now each unmatched node in V3 is matched with an unmatched node

in V4 by selecting an edge in E^3_4 ≡ V3 × V4. The construction of the perfect matching is complete. It is

easily checked that the objective value of the perfect matching is the same as that of the given partitioning

solution, because only edges incident to nodes in V1 have nonzero values.

Conversely, consider any feasible solution of the matching problem. We construct a feasible solution of

the partitioning problem with the same objective value, as follows. For each node i1 ∈ V1, exactly one of

the following three cases holds.

Case a: If an edge (i1, j1) ∈ E1 (i ≠ j) is selected, then set yS1 = 1 for S = {i, j}.

Case b: If an edge (i1, i2) ∈ E2 is selected, then set yS1 = 1 for S = {i}.

Case c: If an edge (i1, k4) ∈ E^1_4 is selected, then set yi0 = 1.

All other decision variables of the partitioning problem are set to 0. It follows that constraint (11) is satisfied.

Let M ′ denote the number of edges in E3 that are selected, matching 2M ′ nodes in V3. The remaining

2M −2M ′ nodes in V3 have to be matched with nodes in V4. Thus the remaining 2N − (2M −2M ′) nodes in

V4 have to be matched with nodes in V1 ∪V2. Hence 2N − [2N − (2M −2M ′)] = 2M −2M ′ nodes in V1 ∪V2

are matched with each other, setting M − M′ variables yS1 equal to 1. Thus Σ_{S∈S} yS1 = M − M′ ≤ M, and

constraint (12) is satisfied. It is again easily checked that the objective value of the resulting partitioning

solution is the same as that of the matching.

5 Computational Results

In this section, we discuss a number of experiments conducted to assess the efficacy and study the computa-

tional behavior of the dynamic programming approximation method. More specifically, the purposes of the experiments are

1. to evaluate the quality of the policies produced by the dynamic programming approximation method,

2. to analyze the impact of various problem characteristics, such as the number of customers, the number

of vehicles, and the coefficients of variation of customer usage, on the quality of the policies produced,

and

3. to measure the computational requirements of the proposed method.

All instances used for the computational tests are given in the appendix.

First we describe the performance measures used to evaluate and present the qualities of policies.


5.1 Evaluating Policies and Comparing Value Functions

We evaluate policies by comparing their value functions with the optimal value functions for small instances

(for which the optimal value functions can be computed to within a small tolerance of optimality with

reasonable computational effort) and with the value functions of competing policies for larger instances.

However, it is difficult to present the quality of a policy π in a concise way, because it involves comparing

the value function V π(x) of policy π either with the optimal value function V ∗(x) or with the value function

V π(x) of a competing policy π over all states x.

Therefore, for small instances, we have chosen to compare the average value of policy π over all states with the average optimal value over all states. That is, V^π_avg ≡ Σ_{x∈X} V^π(x)/|X| is compared with V*_avg ≡ Σ_{x∈X} V*(x)/|X|. Also, since we realize that averaging over all states may smooth out irregularities, we augment this comparison with a comparison of the minimum and maximum values over all states. That is, V^π_min ≡ min_{x∈X} V^π(x) and V^π_max ≡ max_{x∈X} V^π(x) are compared with V*_min ≡ min_{x∈X} V*(x) and V*_max ≡ max_{x∈X} V*(x). In addition to presenting statistics of the actual values of the policies as described above, we also present the values of these policies relative to the optimal values. To eliminate the effect of negative optimal values, or values in the denominator close to zero, we shift all the values to fix the minimum value of the shifted optimal value function at 1. Specifically, let m ≡ min_{x∈X} V*(x), and for any stationary policy π, let ρ^π(x) ≡ [V^π(x) − m + 1]/[V*(x) − m + 1]. For each policy π evaluated, we present ρ^π_avg ≡ Σ_{x∈X} ρ^π(x)/|X|, ρ^π_min ≡ min_{x∈X} ρ^π(x), and ρ^π_max ≡ max_{x∈X} ρ^π(x). For small instances, the value function V^π(x) of each

policy π is computed using the Gauss-Seidel policy evaluation algorithm (see, for example, Puterman 1994).

For larger instances, the Gauss-Seidel policy evaluation algorithm is not useful for computing the value

functions of policies, because the number of states becomes too large, and hence the available computer

memory is not sufficient to store the values of all the states, and the computation time also becomes ex-

cessive. For the same reasons, the optimal value functions cannot be computed for larger instances. In the

absence of optimal values, we compare our dynamic programming approximation policy (KNS), presented

in Algorithm 2, with the following two policies. The first competing policy is a slightly modified version

(to account for additional terms in our objective) of the policy proposed by Chien et al. (1989) (CBW), as

described in Kleywegt et al. (2002). The second competing policy is a myopic policy (Myopic) that takes only

the single-stage costs into account, i.e., the policy obtained by using value function approximation V = 0 or

discount factor α = 0. The policies were evaluated by randomly choosing five initial states, and then simulat-

ing the processes under each of the different policies starting from the chosen initial states. Six sample paths

were generated for each combination of policy and initial state, for each problem instance. Each replication

produced a sample path over a relatively long but finite time horizon of 800 time periods; each resulting in

a total discounted reward. The length of the time horizon was chosen to bound the discounted truncation

error to less than 0.01 (approximately 0.1%). The sample means µ and standard deviations σ of the sample


means of the total discounted rewards over the six sample paths, as well as intervals (µ − 2σ, µ + 2σ), are

presented.
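A sketch of this evaluation protocol (function names and the discount factor value are our own assumptions; reps = 6 matches the six sample paths described above):

```python
import statistics

def evaluate_policy(policy, sample_transition, reward, initial_states,
                    alpha=0.98, T=800, reps=6):
    """Simulate a stationary policy from each initial state and report the mean
    mu of the discounted totals over reps sample paths, together with the
    interval (mu - 2*sigma, mu + 2*sigma), sigma being the standard deviation
    of the sample mean."""
    results = {}
    for x0 in initial_states:
        totals = []
        for _ in range(reps):
            x, total, disc = x0, 0.0, 1.0
            for _ in range(T):        # horizon chosen so the tail is negligible
                a = policy(x)
                total += disc * reward(x, a)
                disc *= alpha
                x = sample_transition(x, a)
            totals.append(total)
        mu = statistics.mean(totals)
        sigma = statistics.stdev(totals) / reps ** 0.5
        results[x0] = (mu, mu - 2 * sigma, mu + 2 * sigma)
    return results
```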

5.2 Policy Quality with Small Instances

In this section we describe the results of computational tests performed to evaluate the quality of our proposed

policy KNS for small instances. At the same time, we address an issue encountered during the development

of the value function approximation regarding how to capture the interactions between the customer(s) in a

subproblem and the remaining customers. One aspect of this interaction is the fact that when a vehicle visits

both a customer i in a subproblem MDPij and a customer k not in the subproblem, then less than the full

vehicle capacity is available for delivery at the customer in the subproblem. As described in Section 2.1, this

interaction is captured by the partial vehicle capacities λ^i_ij available to customer i in subproblem MDPij. Appropriate values for λ^i_ij can be estimated with (8) or (9). A relevant question is how sensitive the resulting policy is with respect to the estimates of λ^i_ij. In the first set of experiments, we compare the effect on solution quality of using a simple estimate λ^i_ij = ⌊0.5C⌋ (policy π1) with that of using an estimate obtained using (9) and simulation (policy π2). For both policies, the expected values in (4) are computed exactly, and

the decision in each state is chosen by evaluating all feasible decisions in that state.

We compare the value functions of policies π1 and π2 with the optimal value function for small but

nontrivial instances of the IRP. These comparisons are given in Tables 2 and 3.

Table 2: Comparison of the values of policies that use different estimates of the partial vehicle availabilities λ^i_ij, with the optimal values.

Instance   V*_min   V*_avg   V*_max   V^π1_min  V^π1_avg  V^π1_max  V^π2_min  V^π2_avg  V^π2_max
topt1       66.77    68.13    69.21     66.42     67.84     68.83     66.58     68.09     69.05
topt2       66.62    69.19    70.63     65.96     68.65     70.00     66.43     69.12     70.51
topt3       22.93    27.17    29.78     22.17     26.53     28.98     22.74     27.10     29.63
topt4      148.02   153.19   156.42    145.68    151.34    154.53    147.27    152.90    155.71

Table 3: Comparison of the ratios of the values of policies that use different estimates of the partial vehicle availabilities λ^i_ij, relative to the optimal values.

Instance   ρ^π1_min  ρ^π1_avg  ρ^π1_max  ρ^π2_min  ρ^π2_avg  ρ^π2_max
topt1        0.957     0.973     0.980     0.969     0.988     0.995
topt2        0.952     0.963     0.984     0.965     0.985     0.994
topt3        0.961     0.971     0.976     0.967     0.988     0.994
topt4        0.950     0.962     0.973     0.968     0.984     0.990

When we look at the results in Tables 2 and 3, we observe that the values of the two policies are very


close to the optimal values, which indicates that our overall approach provides good policies. Furthermore,

the results also reveal that using (9) and simulation to estimate λ^i_ij provides a better policy than using a crude estimate, at the cost of only a small increase in computation time. Hence, we used (9) and simulation to estimate λ^i_ij in the experiments discussed in the remainder of this section.

5.3 Policy Quality with Larger Instances

In this section we describe the results of computational tests performed to evaluate the quality of our proposed

policy KNS for larger instances. In Section 5.3.1, we first focus on the special case in which delivery routes

visit at most two customers. In that situation, it suffices to consider subproblems of at most two customers

to approximate the value function, and, as discussed in Section 4, calculating the approximate value function

at a given state can be done in polynomial time by solving a maximum weight perfect matching problem.

Thereafter, in Section 5.3.2, we compare the objective value of the KNS policy for the case in which delivery

routes visit at most two customers with the case in which delivery routes visit up to three customers.

5.3.1 At Most Two Customers Per Route

We conducted three experiments with each of the three policies KNS, CBW, and Myopic. In each of these

experiments, we varied a single instance characteristic and observed the impact on the performance of the

policies. The three instance characteristics varied are (1) the number of customers, (2) the number of vehicles,

and (3) the coefficient of variation of customer demand.

To study the impact of the number of customers on the performance of the policies, the instances were

generated so that larger instances have more customers with the same characteristics as the smaller instances.

Hence, customer characteristics as well as the ratio of delivery capacity to total expected demand were kept

the same for all instances. Table 4 shows the performance of the policies on instances with varying numbers of

customers. The results clearly demonstrate that the KNS policy consistently outperforms the other policies.

Furthermore, the difference in quality does not appear to increase or decrease with the number of customers.

Second, we studied the impact of the number of vehicles, and thus the delivery capacity available, on

the performance of the policies. The number of vehicles was chosen in such a way that we could study the

effectiveness of the policies when the available delivery capacity is smaller than the total expected demand,

as well as when there is surplus delivery capacity. The results are given in Table 5. Intuitively, it is clear

that when the delivery capacity is very restrictive, i.e., the number of vehicles is small, then it becomes

more important to use the available capacity wisely. The results show the superiority of the KNS policy

in handling situations with tight delivery capacity—the differences in quality are much larger for tightly

constrained instances than for loosely constrained instances.

Third, we studied the impact of the customer demand coefficient of variation on the performance of


Table 4: Comparison of the values of policies for instances with different numbers of customers. For each instance, the five rows correspond to the five randomly chosen initial states; N is the number of customers.

                          CBW                              Myopic                             KNS
Instance   N      µ      σ    µ−2σ    µ+2σ        µ      σ    µ−2σ    µ+2σ        µ      σ    µ−2σ    µ+2σ
tcst1     10  -12.45  0.37  -13.20  -11.71   -11.39  0.26  -11.91  -10.86    -8.60  0.27   -9.13   -8.07
              -12.21  0.27  -12.74  -11.67   -11.25  0.20  -11.64  -10.86    -8.73  0.29   -9.32   -8.14
              -11.97  0.28  -12.54  -11.40   -11.88  0.34  -12.56  -11.21    -8.53  0.11   -8.75   -8.31
              -12.19  0.40  -12.98  -11.39   -11.65  0.24  -12.13  -11.18    -8.63  0.20   -9.04   -8.22
              -13.08  0.24  -13.57  -12.60   -11.73  0.18  -12.09  -11.38    -8.92  0.27   -9.46   -8.37
tcst2     15  -17.62  0.42  -18.47  -16.78   -17.17  0.24  -17.64  -16.70   -13.10  0.13  -13.35  -12.85
              -17.76  0.28  -18.32  -17.20   -17.09  0.28  -17.66  -16.53   -13.57  0.10  -13.77  -13.38
              -18.25  0.42  -19.08  -17.41   -17.30  0.25  -17.80  -16.79   -13.34  0.21  -13.77  -12.92
              -17.37  0.39  -18.16  -16.58   -17.13  0.17  -17.48  -16.79   -13.63  0.31  -14.24  -13.02
              -18.17  0.33  -18.83  -17.52   -16.92  0.15  -17.21  -16.62   -13.45  0.16  -13.78  -13.13
tcst3     20  -20.58  0.36  -21.30  -19.86   -19.84  0.35  -20.54  -19.13   -16.68  0.28  -17.24  -16.12
              -20.81  0.29  -21.38  -20.24   -19.35  0.37  -20.10  -18.60   -16.85  0.27  -17.39  -16.30
              -20.49  0.34  -21.18  -19.81   -19.21  0.28  -19.77  -18.66   -16.43  0.18  -16.79  -16.07
              -21.25  0.33  -21.91  -20.58   -19.28  0.35  -19.97  -18.58   -16.59  0.30  -17.18  -15.99
              -20.36  0.26  -20.89  -19.84   -19.87  0.42  -20.72  -19.02   -16.21  0.27  -16.75  -15.66


Table 5: Comparison of the values of policies for instances with different numbers of vehicles. For each instance, the five rows correspond to the five randomly chosen initial states; M is the number of vehicles.

                          CBW                              Myopic                             KNS
Instance   M      µ      σ    µ−2σ    µ+2σ        µ      σ    µ−2σ    µ+2σ        µ      σ    µ−2σ    µ+2σ
tveh1      3  -65.44  0.17  -65.78  -65.10   -64.11  0.18  -64.48  -63.75   -58.58  0.19  -58.96  -58.20
              -65.85  0.25  -66.34  -65.35   -63.73  0.25  -64.23  -63.23   -59.24  0.29  -59.82  -58.65
              -65.85  0.20  -66.24  -65.45   -63.82  0.25  -64.31  -63.33   -59.05  0.23  -59.52  -58.58
              -66.03  0.19  -66.41  -65.64   -63.84  0.22  -64.29  -63.40   -58.92  0.21  -59.35  -58.50
              -65.72  0.32  -66.36  -65.07   -63.93  0.27  -64.47  -63.40   -58.73  0.18  -59.09  -58.36
tveh2      6    1.41  0.13    1.16    1.66     2.00  0.26    1.48    2.51     4.83  0.22    4.39    5.27
                1.17  0.24    0.70    1.65     2.17  0.18    1.81    2.52     5.30  0.17    4.96    5.64
                1.43  0.18    1.08    1.78     1.58  0.27    1.04    2.12     5.43  0.24    4.95    5.91
                1.30  0.16    0.99    1.62     1.96  0.36    1.24    2.68     5.14  0.26    4.61    5.67
                0.82  0.20    0.42    1.22     2.18  0.29    1.60    2.75     5.28  0.24    4.79    5.76
tveh3      9   15.01  0.32   14.37   15.65    16.10  0.21   15.69   16.52    18.34  0.18   17.97   18.71
               15.28  0.19   14.90   15.66    15.93  0.18   15.56   16.29    18.06  0.24   17.58   18.53
               15.15  0.12   14.91   15.39    15.98  0.19   15.59   16.36    17.64  0.14   17.36   17.91
               15.30  0.24   14.83   15.78    16.09  0.22   15.65   16.53    18.17  0.33   17.52   18.83
               14.87  0.19   14.48   15.26    16.23  0.29   15.64   16.82    17.84  0.24   17.35   18.33


the policies. The customer demand distributions for the three instances were selected so that the demand

distribution is the same for all customers in an instance, and the expected customer demand for each of the

instances is 5. We varied the distributions so that the customer demands have different variances, namely 1,

4 and 16. All other characteristics are exactly the same for the instances. The results are given in Table 6.

The results show that when the coefficient of variation of customer demand is large, so that it is less clear what the future is going to bring, the difference in quality between the KNS policy and the other policies tends to be smaller, although the KNS policy still does better on every instance. As expected, this

indicates that carefully taking into account the available information about the future, such as through

dynamic programming approximation methods, provides more benefit if more information is available about

the future.

Next, we compare the performance of the three policies on an instance derived from real-world data.

The data for this was obtained from one of the smaller plants of a leading producer and distributor of air

products. Before describing the results, we indicate some features of the data which are interesting and

present in most data sets obtained from this company. We also indicate some of the changes that were made

to the data to make them consistent with input requirements of our algorithm.

1. Tank sizes at the customers range from 90,000 cubic feet to 700,000 cubic feet. The tank sizes at the

customers were rounded to the nearest multiple of 25,000, and product quantities were discretized in

multiples of 25,000.

2. The company did not have estimates of the probability distributions of demands at the customers.

However, they did have estimates of the mean and standard deviation of the demand. Using the mean

and standard deviation, we created a discrete demand distribution for each customer with the given

mean and standard deviation.

3. The company did not provide exact values for the revenue earned per unit of product delivered. We

used the same value for the revenue per unit of product at all the customers, assuming that the company

charged the same price to all its customers.

The performance of the three policies is shown in Table 7. As before, the performance of policy KNS is much better than that of the Myopic policy, which in turn is better than that of the CBW policy. Overall, the computational

experiments conducted demonstrate the viability of using dynamic programming approximation methods for

the IRP.

5.3.2 Up to Three Customers Per Route

Tables 8, 9, 10, and 11 present the differences in the objective values of the KNS policy between a problem

that allows at most two customers per route and a problem that allows up to three customers per route. One


Table 6: Comparison of the values of policies for instances with different demand variances. For each instance, the five rows correspond to the five randomly chosen initial states; CV is the coefficient of variation of customer demand.

                           CBW                              Myopic                             KNS
Instance   CV      µ      σ    µ−2σ    µ+2σ        µ      σ    µ−2σ    µ+2σ        µ      σ    µ−2σ    µ+2σ
tvar1     0.2  -17.21  0.28  -17.76  -16.65   -16.69  0.28  -17.24  -16.14   -14.02  0.24  -14.50  -13.55
               -17.81  0.16  -18.14  -17.48   -16.71  0.27  -17.25  -16.16   -13.93  0.25  -14.42  -13.44
               -17.59  0.22  -18.02  -17.15   -16.79  0.18  -17.14  -16.43   -13.50  0.14  -13.77  -13.23
               -17.24  0.26  -17.76  -16.72   -16.20  0.17  -16.55  -15.86   -13.88  0.30  -14.48  -13.29
               -17.38  0.33  -18.04  -16.72   -16.41  0.15  -16.71  -16.11   -13.52  0.28  -14.09  -12.96
tvar2     0.4  -14.94  0.26  -15.46  -14.42   -14.14  0.22  -14.59  -13.69   -12.27  0.22  -12.71  -11.83
               -15.15  0.25  -15.66  -14.64   -14.21  0.25  -14.70  -13.71   -12.10  0.27  -12.64  -11.56
               -14.77  0.27  -15.31  -14.22   -13.60  0.15  -13.91  -13.29   -11.65  0.21  -12.08  -11.22
               -14.58  0.13  -14.84  -14.33   -14.04  0.29  -14.62  -13.46   -12.23  0.17  -12.58  -11.88
               -14.77  0.25  -15.28  -14.26   -14.09  0.23  -14.55  -13.62   -11.73  0.24  -12.21  -11.24
tvar3     0.8   -9.55  0.17   -9.89   -9.21    -8.17  0.18   -8.54   -7.80    -6.93  0.29   -7.52   -6.34
                -9.59  0.20  -10.00   -9.19    -8.03  0.18   -8.38   -7.67    -6.76  0.13   -7.03   -6.50
                -9.85  0.28  -10.42   -9.28    -8.18  0.24   -8.65   -7.70    -7.04  0.23   -7.50   -6.58
                -9.74  0.29  -10.32   -9.16    -8.04  0.21   -8.46   -7.62    -7.06  0.25   -7.56   -6.56
                -8.90  0.09   -9.08   -8.72    -8.15  0.17   -8.49   -7.81    -6.89  0.24   -7.37   -6.41

Table 7: Comparison of the values of policies for an instance from the motivating application. [The rotated body of this table is garbled beyond reliable recovery in this extraction; for instance tprx1 it reported the columns µ, σ, µ−, and µ+ for the CBW, Myopic, and KNS policies.]


One would expect the policy to obtain a better objective value on a problem that allows up to three customers per route than on a problem that allows at most two customers per route, for two reasons: the feasible sets of the former contain the feasible sets of the latter; and the value function approximation used when up to three customers per route are allowed is based on subsets with up to three customers, whereas the approximation used when at most two customers per route are allowed is based on subsets with at most two customers only. However, even though the results do show improvements in objective value, the improvements are relatively minor. (In the petrochemical and air products industry the number of customers per route typically is at most three; routes with more than three customers do occur, but infrequently. For example, in the application that motivated this work, approximately 95% of routes visit three or fewer customers.)

5.4 Computation Times

The computational experiments discussed above have demonstrated the quality of the policies produced by

the dynamic programming approximation method. Next, we focus on its computational requirements, i.e.,

the effort needed to construct a policy and the effort involved in executing a policy. All computational

experiments were performed on an Intel Pentium III processor running at 1.4 GHz. All times are reported in

seconds.

Recall that to approximate the optimal value function V ∗, the IRP is decomposed into subproblems,

and then, for any given state, the optimal value functions of the subproblems are combined by solving

a cardinality constrained set partitioning problem. The most computationally intensive task during the

construction of a policy is the solution of the individual subproblems, as each of these subproblems itself is

a Markov decision problem. The most computationally intensive task during the execution of a policy is the

calculation of the approximate value function at each of a number of possible future states, as each involves

the solution of a cardinality constrained set partitioning problem.
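To make the combination step concrete, the sketch below computes the approximate value of a state by brute force for a tiny instance: it searches over partitions of the customer set into subsets for which subproblem values are available, with an upper bound on the number of subsets standing in for the cardinality constraint. The inputs subset_values and max_parts are hypothetical names; in the experiments this problem is solved as an integer program rather than by enumeration.

    from itertools import combinations

    def approx_value(customers, subset_values, max_parts):
        # Best partition of `customers` into subsets (of size 1-3 here) drawn
        # from `subset_values`, using at most `max_parts` subsets, maximizing
        # the sum of the subproblem values.
        customers = frozenset(customers)
        if not customers:
            return 0.0
        if max_parts == 0:
            return float("-inf")  # customers remain but no subsets may be added
        first = min(customers)    # fix one customer to avoid duplicate partitions
        best = float("-inf")
        for size in (1, 2, 3):
            for rest in combinations(sorted(customers - {first}), size - 1):
                subset = frozenset((first,) + rest)
                if subset in subset_values:
                    tail = approx_value(customers - subset, subset_values,
                                        max_parts - 1)
                    best = max(best, subset_values[subset] + tail)
        return best

    vals = {frozenset({1}): -9.0, frozenset({2}): -8.0, frozenset({3}): -7.0,
            frozenset({1, 2}): -14.0, frozenset({1, 2, 3}): -19.0}
    print(approx_value({1, 2, 3}, vals, max_parts=2))  # -19.0: use {1, 2, 3}

With subsets of size at most two, this search computes the same value as the maximum weight perfect matching used in the two-customer case, which is why that case is so much cheaper.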

Table 12 presents the average times required to solve subproblems consisting of subsets of one, two, and three customers, respectively. Based on the times reported in Table 12, one can estimate the total time required to construct a policy. For a 50 customer problem, it takes about 0.5 seconds to solve all one customer subproblems, 318.5 seconds to solve all two customer subproblems, and 668,360 seconds (or 7.73 days) to solve all three customer subproblems. Observe, though, that these estimates are based on straightforward implementations. In practice, it typically is not necessary to solve all two or three customer subproblems. Simple rules may significantly reduce the number of subproblems to be solved, and thereby the computation times; for example, it makes sense to eliminate all subsets of two and three customers that cannot be visited by a single vehicle in one day. Regardless, the policy has to be constructed only once, and it is acceptable to spend a substantial amount of computational effort to do that.
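These totals are simply subset counts multiplied by the per-subproblem averages reported in Table 12; a quick check of the 50 customer estimates:

    from math import comb

    avg_secs = {1: 0.01, 2: 0.26, 3: 34.10}  # per-subproblem averages (Table 12)
    n = 50
    for k in (1, 2, 3):
        total = comb(n, k) * avg_secs[k]
        print(f"{k}-customer: {comb(n, k)} subproblems, {total:,.1f} seconds")
    # 1-customer: 50 subproblems, 0.5 seconds
    # 2-customer: 1225 subproblems, 318.5 seconds
    # 3-customer: 19600 subproblems, 668,360.0 seconds (about 7.73 days)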


Table 8: Comparison of the values of the KNS policy for problems allowing at most two customers per route versus problems allowing up to three customers per route, for instances with different numbers of customers. [The rotated body of this table is garbled beyond reliable recovery in this extraction; for instances tcst1, tcst2, and tcst3 it reported the columns µ, σ, µ−, and µ+ for KNS (2 stops) and KNS (3 stops).]


Table 9: Comparison of the values of the KNS policy for problems allowing at most two customers per route versus problems allowing up to three customers per route, for instances with different numbers of vehicles. [The rotated body of this table is garbled beyond reliable recovery in this extraction; for instances tveh1, tveh2, and tveh3 it reported the columns µ, σ, µ−, and µ+ for KNS (2-stop) and KNS (3-stop).]


Table 10: Comparison of the values of the KNS policy for problems allowing at most two customers per route versus problems allowing up to three customers per route, for instances with different demand variances. [The rotated body of this table is garbled beyond reliable recovery in this extraction; for instances tvar1, tvar2, and tvar3 it reported the columns µ, σ, µ−, and µ+ for KNS (2-stop) and KNS (3-stop).]

Table 11: Comparison of the values of the KNS policy for an instance from the motivating application, for the case in which at most two customers per route are allowed versus the case in which up to three customers per route are allowed. [The rotated body of this table is garbled beyond reliable recovery in this extraction; for instance tprx1 it reported the columns µ, σ, µ−, and µ+ for KNS (2-stop) and KNS (3-stop).]


Table 12: Average solution times in seconds for a set of subproblems with different numbers of customers.

Subset size        Time (secs.)
One customer       0.01
Two customers      0.26
Three customers    34.10


Once the approximating function V has been constructed, only problem (3) has to be solved each day, each time for the current value of the state x. Because the daily problem has to be solved many times, it

is important that this computational task can be performed with relatively little effort. Given the current

state, two types of decisions have to be made in the daily problem, namely which customers to visit on each

vehicle route, and how much to deliver at those customers. Table 13 presents the average time to determine

an action for a given state, both for the case in which subsets of at most two customers are used in the value

function approximation and the case in which subsets of up to three customers are used.

Table 13: Average solution times in seconds for the daily problems of the instances used in the computational experiments, for the case in which at most two customers per route are allowed and the case in which up to three customers per route are allowed.

Instance    size at most 2    size at most 3
tcst1       22.1              107.9
tcst2       52.4              355.5
tcst3       134.1             1080.2
tvar1       54.7              356.4
tvar2       53.4              360.1
tvar3       55                362.5
tveh1       39.8              271.4
tveh2       56.1              354.8
tveh3       64.2              368.9
tprx1       18.3              100.3

Recall that when the approximate value function combines subproblems based on subsets with at most two customers, a maximum weight perfect matching problem is solved to compute the approximate value of a given state, but when the approximate value function combines subproblems based on subsets with up to three customers, a cardinality constrained set partitioning problem has to be solved. In our computational experiments, the set partitioning problems were solved with CPLEX 7.0, a commercially available integer programming solver. All set partitioning problems were solved to proven optimality using the default settings. No attempts were made to speed up the solution process. In practice, it may not be necessary to solve the set partitioning problems to proven optimality. It is well known that in the solution of difficult integer programs most of the time is spent on proving optimality, and not on finding the optimal solution. Also recall that


the purpose of solving the cardinality constrained set partitioning problem is just to obtain an approximate value corresponding to a particular state, and not to obtain an optimal or even feasible solution for the partitioning problem; the solution of the partitioning problem is not used at all. Thus, in an application, it is reasonable to stop the solution of the partitioning problem as soon as sufficiently tight bounds on its optimal value have been obtained.

The results in Table 13 show that the times required to execute a policy, that is, to determine the action for a given state, are acceptable: about one minute for the case with subsets of at most two customers and about five minutes for the case with subsets of at most three customers. On the other hand, the results do demonstrate the value of being able to solve a maximum weight perfect matching problem, as opposed to a cardinality constrained set partitioning problem, in the two-customer subset case.
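The two-customer case can be reduced to matching as follows: pair each customer either with another customer (earning the two-customer subproblem value) or with a private dummy node (earning its one-customer subproblem value), with zero-weight edges between dummy nodes so that unused dummies can pair up; a maximum weight matching of maximum cardinality then corresponds to a partition into subsets of size one or two. A minimal sketch of this reduction using networkx, with hypothetical inputs single_vals and pair_vals (specialized matching codes, such as those of Cook and Rohe (1998) and Gabow (1990) in the reference list, handle much larger graphs; the cardinality constraint on the partition is ignored here):

    from itertools import combinations

    import networkx as nx

    def matching_value(single_vals, pair_vals):
        # Matching a customer with its dummy selects the singleton subset;
        # matching two customers selects the two-customer subset.
        G = nx.Graph()
        for i, v in single_vals.items():
            G.add_edge(i, ("dummy", i), weight=v)
        for (i, j), v in pair_vals.items():
            G.add_edge(i, j, weight=v)
        for a, b in combinations(single_vals, 2):
            G.add_edge(("dummy", a), ("dummy", b), weight=0.0)
        M = nx.max_weight_matching(G, maxcardinality=True)
        return sum(G[u][v]["weight"] for u, v in M)

    # The pair {1, 2} is chosen because -14.0 beats (-9.0) + (-8.0) = -17.0.
    print(matching_value({1: -9.0, 2: -8.0}, {(1, 2): -14.0}))  # -14.0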

6 Conclusion

In this paper we formulated a Markov decision process model of a stochastic inventory routing problem. The

Markov decision problem can be solved with conventional algorithms, but only very small problems can be

solved with reasonable computational resources. This motivated us to develop an approximation method.

An important part of the method was the construction of an approximation V to the optimal value function

V ∗. The approximation V was based on a decomposition of the overall problem into subproblems. Of

course, this is a very natural idea and is not new. However, the way in which the decomposition was done,

the subproblems were formulated, and the results of the subproblems were combined to construct V seems to be novel, and the results were promising. Subproblems were defined for specific subsets of customers, and the subsets overlapped to quite a large extent and covered the set of customers. The values V(x) of the approximating function V were calculated by solving an optimization problem that chooses a collection of subsets of customers that partitions the set of customers. Effort was put into formulating each subproblem so that the combined subproblems give an accurate representation of the overall process. The process is then controlled by solving a single-stage problem for the current state at each stage.

The approach described in this paper shows promise of being applicable to many other stochastic control

problems besides the stochastic inventory routing problem. Many stochastic control problems are hard

because the Markov decision process formulation of the problem has a high-dimensional state space with

a huge number of states. This often comes about because the problem addresses the coordinated control

of many interdependent resources; sometimes these resources have fairly similar characteristics. A natural

extension of the approach described in this paper would be to decompose the overall control problem into

subproblems involving subsets of resources. The approximate value function V is computed by combining

the results of the subproblems in an associated partitioning-type optimization problem. The combined


subproblems should give an accurate representation of the overall process, and the subproblems should be

tractable. The overall process can then be controlled by solving, at each stage of the process, a single-stage problem for the current state.

We give a brief example of another application of the approach described above. Recently we worked on

a dynamic bin covering problem that was motivated by the following application. Pieces of fish move along

a conveyor belt. At the end of the line the pieces are weighed and then packed into one of several open

bins. The pieces have different weights that are unknown until the weights are measured. A fairly good

probability distribution of the weights can be estimated from historical data. Each bin is closed as soon as

the total weight of the fish in the bin exceeds the minimum weight specified for the bin. After each piece

of fish has been weighed, a decision has to be made regarding which open bin to place the piece in. The

objective is to fill (and close) as many bins over the long run as possible, or equivalently, to minimize the

average overweight per bin over the long run. It is easy to formulate a Markov decision process model of

the problem. If only a small number (say up to 3 or 4) of bins can be open at a time, then the problem

can be solved with reasonable computational resources. However, when many bins can be open at a time,

approximation methods are needed. (Some industrial packers can have 10 or even more bins open at a time.)

An approximation method along the lines of the approach described above was developed. Computational

results have been promising.
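To fix ideas, a minimal simulation of the bin covering dynamics under a naive one-step rule (this is not the approximation method just mentioned, and the piece-weight distribution and parameters are hypothetical):

    import random

    def simulate_bin_covering(min_weight=10.0, open_bins=4,
                              pieces=100_000, seed=1):
        # A bin closes once its content reaches min_weight, and a fresh bin is
        # opened in its place. The rule below closes a bin with minimal
        # overweight when possible, and otherwise tops up the fullest bin.
        rng = random.Random(seed)
        bins = [0.0] * open_bins
        closed, overweight = 0, 0.0
        for _ in range(pieces):
            w = rng.uniform(0.5, 2.0)  # hypothetical piece-weight distribution
            fits = [i for i, b in enumerate(bins) if b + w >= min_weight]
            if fits:
                i = min(fits, key=lambda i: bins[i] + w - min_weight)
                overweight += bins[i] + w - min_weight
                closed += 1
                bins[i] = 0.0
            else:
                i = max(range(open_bins), key=lambda i: bins[i])
                bins[i] += w
        return closed, overweight / max(closed, 1)

    print(simulate_bin_covering())  # (bins closed, average overweight per bin)

A dynamic programming approximation along the lines described above would replace the rule inside the loop with the action that maximizes the immediate reward plus the approximate value of the resulting configuration of open bins.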

Acknowledgement

We thank Warren Powell for many constructive discussions.

References

S. Anily and A. Federgruen, “One Warehouse Multiple Retailer Systems with Vehicle Routing Costs,” Management Science 36, 92–114 (1990).

S. Anily and A. Federgruen, “Rejoinder to “Comments on One Warehouse Multiple Retailer Systems with Vehicle Routing Costs”,” Management Science 37, 1497–1499 (1991).

S. Anily and A. Federgruen, “Two-Echelon Distribution Systems with Vehicle Routing Costs and Central Inventories,” Operations Research 41, 37–47 (1993).

J. F. Bard, L. Huang, P. Jaillet, and M. Dror, “A Decomposition Approach to the Inventory Routing Problem with Satellite Facilities,” Transportation Science 32, 189–203 (1998).

D. Barnes-Schuster and Y. Bassok, “Direct Shipping and the Dynamic Single-depot/Multi-retailer Inventory System,” European Journal of Operational Research 101, 509–518 (1997).

Y. Bassok and R. Ernst, “Dynamic Allocations for Multi-Product Distribution,” Transportation Science 29, 256–266 (1995).


W. Bell, L. Dalberto, M. Fisher, A. Greenfield, R. Jaikumar, P. Kedia, R. Mack, and P. Prutzman, “Improving the Distribution of Industrial Gases with an On-Line Computerized Routing and Scheduling Optimizer,” Interfaces 13, 4–23 (1983).

R. Bellman and S. Dreyfus, “Functional Approximations and Dynamic Programming,” Mathematical Tables and Other Aids to Computation 13, 247–251 (1959).

R. E. Bellman, R. Kalaba, and B. Kotkin, “Polynomial Approximation—A New Computational Technique in Dynamic Programming: Allocation Processes,” Mathematics of Computation 17, 155–161 (1963).

L. Bertazzi, G. Paletta, and M. G. Speranza, “Deterministic Order-Up-To Level Policies in an Inventory Routing Problem,” Transportation Science 36, 119–132 (2002).

D. P. Bertsekas, “Convergence of Discretization Procedures in Dynamic Programming,” IEEE Transactions on Automatic Control AC-20, 415–419 (1975).

D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Belmont, MA (1995).

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, NY (1978).

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, New York, NY (1996).

J. Bramel and D. Simchi-Levi, “A Location Based Heuristic for General Routing Problems,” Operations Research 43, 649–660 (1995).

L. D. Burns, R. W. Hall, D. E. Blumenfeld, and C. F. Daganzo, “Distribution Strategies that Minimize Transportation and Inventory Costs,” Operations Research 33, 469–490 (1985).

S. Cetinkaya and C. Y. Lee, “Stock Replenishment and Shipment Scheduling for Vendor Managed Inventory Systems,” Management Science 46, 217–232 (2000).

L. M. A. Chan, A. Federgruen, and D. Simchi-Levi, “Probabilistic Analysis and Practical Algorithms for Inventory-Routing Models,” Operations Research 46, 96–106 (1998).

C. S. Chang, “Discrete-Sample Curve Fitting Using Chebyshev Polynomials and the Approximate Determination of Optimal Trajectories via Dynamic Programming,” IEEE Transactions on Automatic Control AC-11, 116–118 (1966).

V. C. P. Chen, D. Ruppert, and C. A. Shoemaker, “Applying Experimental Design and Regression Splines to High-Dimensional Continuous-State Stochastic Dynamic Programming,” Operations Research 47, 38–53 (1999).

T. W. Chien, A. Balakrishnan, and R. T. Wong, “An Integrated Inventory Allocation and Vehicle Routing Problem,” Transportation Science 23, 67–76 (1989).

C. S. Chow and J. N. Tsitsiklis, “An Optimal One-Way Multigrid Algorithm for Discrete-Time Stochastic Control,” IEEE Transactions on Automatic Control AC-36, 898–914 (1991).

M. Christiansen, “Decomposition of a Combined Inventory and Time Constrained Ship Routing Problem,” Transportation Science 33, 3–16 (1999).

M. Christiansen and B. Nygreen, “A Method for Solving Ship Routing Problems with Inventory Constraints,” Annals of Operations Research 81, 357–378 (1998a).

M. Christiansen and B. Nygreen, “Modelling Path Flows for a Combined Ship Routing and Inventory Management Problem,” Annals of Operations Research 82, 391–412 (1998b).


D. C. Collins, “Reduction of Dimensionality in Dynamic Programming via the Method of Diagonal Decomposition,” Journal of Mathematical Analysis and Applications 31, 223–234 (1970).

D. C. Collins and E. S. Angel, “The Diagonal Decomposition Technique Applied to the Dynamic Programming Solution of Elliptic Partial Differential Equations,” Journal of Mathematical Analysis and Applications 33, 467–481 (1971).

D. C. Collins and A. Lew, “A Dimensional Approximation in Dynamic Programming by Structural Decomposition,” Journal of Mathematical Analysis and Applications 30, 375–384 (1970).

W. Cook and A. Rohe, “Computing Minimum-Weight Perfect Matchings,” (1998), preprint.

P. J. Courtois, Decomposability: Queueing and Computer System Applications, Academic Press, New York, NY (1977).

P. J. Courtois and P. Semal, “Error Bounds for the Analysis by Decomposition of Non-Negative Matrices,” in Mathematical Computer Performance and Reliability, G. Iazeolla, P. J. Courtois, and A. Hordijk (eds), chapter 2.2, 209–224, Elsevier Science Publishers B.V., Amsterdam, Netherlands (1984).

J. W. Daniel, “Splines and Efficiency in Dynamic Programming,” Journal of Mathematical Analysis and Applications 54, 402–407 (1976).

D. P. De Farias and B. Van Roy, “On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning,” Journal of Optimization Theory and Applications 105, 589–608 (2000).

M. Dror and M. Ball, “Inventory/Routing: Reduction from an Annual to a Short Period Problem,” Naval Research Logistics Quarterly 34, 891–905 (1987).

M. Dror, M. Ball, and B. Golden, “A Computational Comparison of Algorithms for the Inventory Routing Problem,” Annals of Operations Research 4, 3–23 (1985).

M. Dror and L. Levy, “Vehicle Routing Improvement Algorithms: Comparison of a “Greedy” and a Matching Implementation for Inventory Routing,” Computers and Operations Research 13, 33–45 (1986).

J. Edmonds, “Maximum Matching and a Polyhedron with 0,1-Vertices,” Journal of Research of the National Bureau of Standards 69B, 125–130 (1965a).

J. Edmonds, “Paths, Trees and Flowers,” Canadian Journal of Mathematics 17, 449–467 (1965b).

A. Federgruen and P. Zipkin, “A Combined Vehicle Routing and Inventory Allocation Problem,” Operations Research 32, 1019–1037 (1984).

B. L. Fox, “Discretizing Dynamic Programs,” Journal of Optimization Theory and Applications 11, 228–234 (1973).

H. N. Gabow, “Data Structures for Weighted Matching and Nearest Common Ancestors with Linking,” in Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, 434–443, New York, NY, 1990.

G. Gallego and D. Simchi-Levi, “On the Effectiveness of Direct Shipping Strategy for the One-Warehouse Multi-Retailer R-Systems,” Management Science 36, 240–243 (1990).

V. Gaur and M. L. Fisher, “An Optimization Algorithm for the Joint Vehicle Routing and Inventory Control Problem and Its Implementation at a Large Supermarket Chain,” (2002), preprint.

B. Golden, A. Assad, and R. Dahl, “Analysis of a Large Scale Vehicle Routing Problem with an Inventory Component,” Large Scale Systems 7, 181–190 (1984).


A. Haurie and P. L’Ecuyer, “Approximation and Bounds in Discrete Event Dynamic Programming,” IEEE Transactions on Automatic Control AC-31, 227–235 (1986).

Y. Herer and R. Roundy, “Heuristics for a One-Warehouse Multiretailer Distribution Problem with Performance Bounds,” Operations Research 45, 102–115 (1997).

K. Hinderer, “Estimates for Finite-Stage Dynamic Programs,” Journal of Mathematical Analysis and Applications 55, 207–238 (1976).

K. Hinderer, “On Approximate Solutions of Finite-Stage Dynamic Programs,” in Dynamic Programming and its Applications, M. L. Puterman (ed), 289–317, Academic Press, New York, NY (1978).

K. Hinderer and G. Hubner, “On Exact and Approximate Solutions of Unstructured Finite-Stage Dynamic Programs,” in Markov Decision Theory: Proceedings of the Advanced Seminar on Markov Decision Theory held at Amsterdam, The Netherlands, September 13–17, 1976, H. C. Tijms and J. Wessels (eds), 57–76, Mathematisch Centrum, Amsterdam, The Netherlands (1977).

A. J. Kleywegt, V. S. Nori, and M. W. P. Savelsbergh, “The Stochastic Inventory Routing Problem with Direct Deliveries,” Transportation Science 36, 94–118 (2002).

H. J. Kushner, “Numerical Methods for Continuous Control Problems in Continuous Time,” SIAM Journal on Control and Optimization 28, 999–1048 (1990).

H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, Springer-Verlag, New York, NY (1992).

R. Larson, “Transporting Sludge to the 106 Mile Site: An Inventory/Routing Model for Fleet Sizing and Logistics System Design,” Transportation Science 22, 186–198 (1988).

S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, Great Britain (1993).

A. S. Minkoff, “A Markov Decision Model and Decomposition Heuristic for Dynamic Vehicle Dispatching,” Operations Research 41, 77–90 (1993).

T. Morin, “Computational Advances in Dynamic Programming,” in Dynamic Programming and its Applications, M. L. Puterman (ed), 53–90, Academic Press, New York, NY (1978).

B. L. Nelson and F. J. Matejcik, “Using Common Random Numbers for Indifference-zone Selection and Multiple Comparisons in Simulation,” Management Science 41, 1935–1945 (1995).

W. B. Powell and T. A. Carvalho, “Dynamic Control of Logistics Queueing Networks for Large-Scale Fleet Management,” Transportation Science 32, 90–109 (1998).

M. L. Puterman, Markov Decision Processes, John Wiley & Sons, Inc., New York, NY (1994).

M. I. Reiman, R. Rubio, and L. M. Wein, “Heavy Traffic Analysis of the Dynamic Stochastic Inventory-Routing Problem,” Transportation Science 33, 361–380 (1999).

D. F. Rogers, R. D. Plante, R. T. Wong, and J. R. Evans, “Aggregation and Disaggregation Techniques and Methodology in Optimization,” Operations Research 39, 553–582 (1991).

P. J. Schweitzer and A. Seidman, “Generalized Polynomial Approximations in Markovian Decision Processes,” Journal of Mathematical Analysis and Applications 110, 568–582 (1985).

N. Secomandi, “Comparing Neuro-Dynamic Programming Algorithms for the Vehicle Routing Problem with Stochastic Demands,” Computers and Operations Research 27, 1201–1225 (2000).


G. W. Stewart, “On the Structure of Nearly Uncoupled Markov Chains,” in Mathematical Computer Performance and Reliability, G. Iazeolla, P. J. Courtois, and A. Hordijk (eds), chapter 2.7, 287–302, Elsevier Science Publishers B.V., Amsterdam, Netherlands (1984).

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA (1998).

D. M. Topkis, “Optimal Ordering and Rationing Policies in a Nonstationary Dynamic Inventory Model with n Demand Classes,” Management Science 15, 160–176 (1968).

P. Trudeau and M. Dror, “Stochastic Inventory Routing: Route Design with Stockouts and Route Failures,” Transportation Science 26, 171–184 (1992).

J. N. Tsitsiklis and B. Van Roy, “Feature-Based Methods for Large-Scale Dynamic Programming,” Machine Learning 22, 59–94 (1996).

J. N. Tsitsiklis and B. Van Roy, “Average Cost Temporal-Difference Learning,” Automatica 35, 1799–1808 (1999a).

J. N. Tsitsiklis and B. Van Roy, “Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Derivatives,” IEEE Transactions on Automatic Control 44, 1840–1851 (1999b).

B. Van Roy, D. P. Bertsekas, Y. Lee, and J. N. Tsitsiklis, “A Neuro-Dynamic Programming Approach to Retailer Inventory Management,” in Proceedings of the IEEE Conference on Decision and Control, IEEE, 1997.

B. Van Roy and J. N. Tsitsiklis, “Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions,” in Advances in Neural Information Processing Systems 8, 1045–1051, MIT Press, Cambridge, MA (1996).

S. Viswanathan and K. Mathur, “Integrating Routing and Inventory Decisions in One-Warehouse Multiretailer Multiproduct Distribution Systems,” Management Science 43, 294–312 (1997).

R. Webb and R. Larson, “Period and Phase of Customer Replenishment: A New Approach to the Strategic Inventory/Routing Problem,” European Journal of Operational Research 85, 132–148 (1995).

W. Whitt, “Approximations of Dynamic Programs, I,” Mathematics of Operations Research 3, 231–243 (1978).

W. Whitt, “A-Priori Bounds for Approximations of Markov Programs,” Journal of Mathematical Analysis and Applications 71, 297–302 (1979a).

W. Whitt, “Approximations of Dynamic Programs, II,” Mathematics of Operations Research 4, 179–185 (1979b).

P. J. Wong, “An Approach to Reducing the Computing Time for Dynamic Programming,” Operations Research 18, 181–185 (1970a).

P. J. Wong, “A New Decomposition Procedure for Dynamic Programming,” Operations Research 18, 119–131 (1970b).


Appendix

Instances Used in Computational Results

Table 14: Instance topt1.

i   xi      yi      Ci   fi(0)  fi(1)  fi(2)   ri    pi   hi
1   0.0     10.0    2    0.0    0.5    0.5     100   40   1
2   -10.0   0.0     2    0.0    0.7    0.3     100   40   1
3   0.0     -10.0   2    0.0    0.3    0.7     100   40   1
4   10.0    0.0     2    0.0    0.2    0.8     100   40   1

Vendor (0, 0), N = 4, M = 1, CV = 4

Table 15: Instance topt2.

i   xi      yi      Ci   fi(0)  fi(1)  fi(2)  fi(3)  fi(4)   ri    pi   hi
1   0.0     10.0    4    0.0    0.2    0.2    0.4    0.2     100   40   1
2   -10.0   0.0     4    0.0    0.1    0.5    0.2    0.2     100   40   1
3   0.0     -10.0   4    0.0    0.3    0.3    0.3    0.3     100   40   1
4   10.0    0.0     4    0.0    0.2    0.3    0.5    0.0     100   40   1

Vendor (0, 0), N = 4, M = 1, CV = 5

Table 16: Instance topt3.

i   xi      yi      Ci   fi(0)  fi(1)  fi(2)  fi(3)  fi(4)  fi(5)  fi(6)   ri    pi   hi
1   0.0     10.0    6    0.0    0.2    0.2    0.1    0.2    0.2    0.1     100   40   1
2   -10.0   0.0     6    0.0    0.1    0.2    0.2    0.2    0.2    0.1     100   40   1
3   0.0     -10.0   6    0.0    0.0    0.0    0.5    0.5    0.0    0.0     100   40   1
4   10.0    0.0     6    0.0    0.0    0.3    0.0    0.6    0.0    0.1     100   40   1

Vendor (0, 0), N = 4, M = 1, CV = 5


Table 17: Instance topt4.

i   xi      yi      Ci   fi(0)  fi(1)  fi(2)  fi(3)  fi(4)  fi(5)  fi(6)  fi(7)  fi(8)   ri    pi   hi
1   0.0     10.0    8    0.0    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1     100   40   1
2   -10.0   0.0     8    0.0    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1     100   40   1
3   0.0     -10.0   8    0.0    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1     100   40   1
4   10.0    0.0     8    0.0    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1     100   40   1

Vendor (0, 0), N = 4, M = 1, CV = 8

Table 18: Instances tcst1, tcst2 and tcst3. The values of (N, M) are (10, 4), (15, 6) and (20, 8).

i    xi      yi      Ci   fi(0..10)                                           ri    pi   hi
1    16.2    -22.2   10   0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0         598   310  1
2    -23.2   -18.7   10   0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0         504   294  2
3    9.1     9.8     10   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.8 0.0 0.0         571   307  1
4    19.5    -9.5    10   0.0 0.0 0.0 0.3 0.4 0.3 0.0 0.0 0.0 0.0 0.0         569   304  2
5    -20.0   23.5    10   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0         581   262  1
6    -4.9    -22.1   10   0.0 0.4 0.5 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0         551   347  2
7    -0.8    -14.0   10   0.0 0.0 0.0 0.0 0.0 0.4 0.6 0.0 0.0 0.0 0.0         585   266  1
8    4.3     14.8    10   0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0         518   257  2
9    -6.9    -4.2    10   0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.3 0.4 0.1 0.0         571   305  1
10   21.9    -22.2   10   0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0         557   281  2
11   -17.8   29.7    10   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.5 0.2         550   315  1
12   7.4     11.2    10   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.2 0.4         551   259  2
13   9.1     -0.4    10   0.0 0.0 0.0 0.4 0.5 0.1 0.0 0.0 0.0 0.0 0.0         581   346  1
14   -0.4    23.7    10   0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0         518   340  2
15   14.7    22.0    10   0.0 0.0 0.0 0.0 0.1 0.4 0.3 0.2 0.0 0.0 0.0         575   264  1
16   29.8    12.2    10   0.0 0.0 0.0 0.0 0.0 0.6 0.0 0.4 0.0 0.0 0.0         511   327  2
17   -16.4   -26.9   10   0.0 0.1 0.0 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0         521   282  1
18   -5.5    -25.0   10   0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0         523   287  2
19   -8.7    -27.1   10   0.0 0.0 0.0 0.0 0.4 0.1 0.1 0.4 0.0 0.0 0.0         562   271  1
20   25.3    17.5    10   0.0 0.0 0.0 0.3 0.3 0.4 0.0 0.0 0.0 0.0 0.0         598   335  2

Vendor (0, 0), CV = 9


Table 19: Instance tvar1. Every customer has demand distribution fi(4) = fi(6) = 0.5 (all other fi(d) = 0).

i    xi      yi      Ci   ri    pi   hi
1    -11.4   -11.8   10   541   315  2
2    8.0     5.2     10   515   238  1
3    18.7    -28.3   10   587   328  2
4    14.6    -19.2   10   415   211  1
5    3.5     11.0    10   507   237  2
6    10.4    18.2    10   485   279  1
7    -6.1    4.4     10   442   397  2
8    12.1    21.6    10   515   287  1
9    13.9    6.8     10   598   305  2
10   -14.0   -12.6   10   586   389  1
11   21.8    -4.6    10   492   295  2
12   9.6     -5.5    10   448   270  1
13   -12.3   -4.5    10   510   330  2
14   11.8    12.2    10   476   244  1
15   6.5     8.0     10   432   212  2

Vendor (0, 0), N = 15, M = 5, CV = 12

Table 20: Instance tvar2. Identical to tvar1 (Table 19) except that every customer has demand distribution fi(3) = fi(7) = 0.5 (all other fi(d) = 0).

Vendor (0, 0), N = 15, M = 5, CV = 12

Table 21: Instance tvar3. Identical to tvar1 (Table 19) except that every customer has demand distribution fi(1) = fi(9) = 0.5 (all other fi(d) = 0).

Vendor (0, 0), N = 15, M = 5, CV = 12


Table 22: Instances tveh1, tveh2 and tveh3. Each customer's demand distribution places probability 0.5 on each of the two levels listed (all other fi(d) = 0).

i    xi      yi      Ci   fi = 0.5 at   ri    pi   hi
1    24.8    13.8    10   1, 2          599   256  0
2    -3.3    18.8    10   1, 2          502   328  0
3    -24.6   -14.6   10   3, 4          644   268  0
4    25.2    5.9     10   3, 4          533   347  0
5    4.3     26.7    10   5, 6          467   255  0
6    24.9    -1.4    10   5, 6          479   324  0
7    -29.3   20.6    10   7, 8          588   260  0
8    24.3    -6.6    10   7, 8          629   340  0
9    5.7     -11.8   10   9, 10         647   301  0
10   5.9     -2.4    10   9, 10         639   303  0
11   4.5     -1.1    10   6, 7          480   324  0
12   22.0    -1.9    10   6, 7          593   266  0
13   -3.8    -28.3   10   2, 3          497   278  0
14   -22.6   -9.7    10   2, 3          647   327  0
15   28.5    26.0    10   7, 8          562   284  0

Vendor (0, 0), N = 15, M = 3, CV = 12

Table 23: Instance tprx1. The column fi lists the demand levels d (from 0 to 10) receiving positive probability (all other fi(d) = 0).

i   Long    Lat    Ci   fi                     ri    pi   hi
1   -86.8   33.6   10   0.5 at 6, 0.5 at 10    550   250  0
2   -85.3   35.0   4    0.5 at 1, 0.5 at 3     550   250  0
3   -81.0   35.2   8    0.5 at 1, 0.5 at 3     550   210  0
4   -96.4   32.5   24   1.0 at 10              550   260  0
5   -95.4   29.8   28   1.0 at 7               550   260  0
6   -85.8   38.2   4    1.0 at 3               550   210  0
7   -90.0   35.2   11   0.5 at 2, 0.5 at 6     550   210  0
8   -90.1   30.0   4    1.0 at 1               550   190  0
9   -98.1   29.3   18   1.0 at 5               550   260  0

Vendor (−84.2, 33.8), N = 9, M = 4, CV = 20
