The Stochastic Inventory Routing Problem with Direct Deliveries

Anton J. Kleywegt ∗

Vijay S. Nori
Martin W. P. Savelsbergh

School of Industrial and Systems Engineering
Georgia Institute of Technology

Atlanta, GA 30332-0205

November 20, 2000

Abstract

Vendor managed inventory replenishment is a business practice in which vendors monitor their customers’ inventories, and decide when and how much inventory should be replenished. The inventory

routing problem addresses the coordination of inventory management and transportation. The ability to

solve the inventory routing problem contributes to the realization of the potential savings in inventory

and transportation costs brought about by vendor managed inventory replenishment. The inventory

routing problem is hard, especially if a large number of customers is involved. We formulate the inventory routing problem as a Markov decision process, and we propose approximation methods to find good

solutions with reasonable computational effort. Computational results are presented for the inventory

routing problem with direct deliveries.

∗Supported by the National Science Foundation under grant DMI-9875400.


The inventory routing problem (IRP) is one of the core problems that has to be solved when implementing

the emerging business practice called vendor managed inventory replenishment (VMI). VMI refers to the

situation where the replenishment of inventory at a number of locations is controlled by a central decision

maker (vendor). The central decision maker can be the supplier, and the inventory can be kept at independent

customers, or the central decision maker can be a manager responsible for inventory replenishment at a

number of warehouses or retail outlets of the same company. Often the central decision maker manages a

fleet of vehicles that make the deliveries. In this paper the central decision maker is called the supplier, and

the inventory locations are referred to as the customers.

VMI differs from conventional inventory management in the following way. In conventional inventory

management, the customers monitor their own inventory levels, and when a customer thinks that it is time

to reorder, an order for a quantity of the product is placed at the supplier. The supplier receives these orders

from the customers, prepares the product for delivery, and makes deliveries using the fleet of vehicles.

Conventional inventory management has several disadvantages. It is typical for orders not to arrive

uniformly over time. For example, one of the suppliers we worked with used to be flooded with orders on

Mondays. The conjecture was that many customers tend to check their inventory levels on Mondays, and

then place orders. The result of this nonuniform order arrival pattern is that the supplier’s resources, such as

the production and storage facilities, as well as transportation resources, cannot be utilized well over time.

For example, the supplier’s resources would be stretched to the limit on Mondays and Tuesdays, after a large

number of orders have arrived, and would be relatively idle during the rest of the week. Another related

phenomenon causes a disadvantage for the customers. Some customers place apparently urgent orders when

other customers place orders that are really urgent. Since the supplier does not know the inventory levels at

the customers, the information needed to compare the real urgency of different orders is not available. Also,

the supplier is only responsible for delivering product on order to the customer, and not for maintaining a

desirable inventory level at the customer, and hence, even if the supplier were provided with the inventory

level data, there would not be a strong incentive for the supplier to find the optimal trade-off between the

inventory needs of the different customers. Consequently really urgent orders may be delayed because of a

lack of information and incentive, and a high demand on the supplier’s resources.

In VMI, the supplier monitors the inventory at the customers. This is made possible with modern

equipment that can both measure the inventory at the customers and communicate with the supplier’s

computer. The rapidly decreasing cost of this technology has probably made a significant contribution

to the increasing popularity and success of VMI. The supplier is responsible for maintaining a desirable

inventory level at each customer, and decides which customers should be replenished at which times, and

with how much product.

To make these decisions, the supplier has the benefit of access to a lot of relevant information, such as

the current (and past) inventory levels at all the customers, the customers’ demand behavior, the customers’

locations relative to the supplier and relative to each other and the resulting transportation costs, and the

capacity and availability of vehicles and drivers for delivery.

It is thus not surprising that VMI has several advantages for the supplier over conventional inventory


management. First, VMI may lead to reduced production and inventory costs. By implementing VMI, the

supplier can usually obtain a more uniform utilization of resources. This reduces the amounts of resources

required and increases the productivity of the resources. It also reduces the amount of inventory the supplier

has to keep to achieve a desirable level of customer service. Second, VMI may reduce transportation costs

beyond the reduction achieved by a more uniform utilization of transportation capacity. By proactive

planning based on the additional available information instead of reactive response to customers’ orders as

they arrive, it may be possible to increase the frequency of low-cost full truckload shipments and decrease the

frequency of high-cost less-than-truckload shipments. Furthermore, it may be possible to use more efficient

routes by coordination of the replenishment at different customers close to each other. Third, VMI may

increase service levels, measured in terms of reliability of product availability, which is also an important

benefit for the customers. As discussed, under conventional inventory management the supplier does not

have the information to prioritize urgent orders from different customers. With VMI, the supplier does have

the information to determine which nonurgent deliveries can be postponed to accommodate urgent deliveries.

Similarly, the supplier does have the information to know which customers may receive smaller-than-usual

replenishments to enable larger-than-usual replenishments at other customers in dire need. Also, the supplier

has an incentive to find a good trade-off between the inventory needs of the different customers. Thus two

advantages of VMI for customers are more reliable product availability, and the fact that customers have

to devote fewer resources to monitoring their inventory levels and placing orders than under conventional

inventory management.

There are several requirements to obtain the potential benefits of VMI. Two important requirements are

(1) the availability of relevant, accurate, and timely data for the decision maker, and (2) the ability of the

central decision maker to use the increased amount of information to make good decisions. There have been

several successful, but also failed, implementations of VMI. Many of the failures are due to one or both of

the above requirements not being met.

Using the large amount of data obtained with VMI to make good decisions is a very complex task, as the

resulting decision problems turn out to be extremely hard. In this paper we study a core decision problem

that often has to be addressed when implementing VMI, namely the inventory routing problem, and we

propose methods for obtaining good decisions.

The inventory routing problem (IRP) addresses the coordination of inventory replenishment and transportation. Specifically, we study the problem of determining optimal policies for the distribution of a single

product from a single supplier to multiple customers. For this purpose, the supplier controls a fleet of vehicles.

The demands at the customers are assumed to have probability distributions that are known to the supplier.

The objective is to maximize the expected discounted value, incorporating sales revenues, production costs,

transportation costs, inventory holding costs, and shortage penalties, over an infinite horizon.

Our work on this problem is motivated by our collaboration with a producer and distributor of air

products. The company operates several plants and produces a variety of products, such as liquid nitrogen

and oxygen. The company’s bulk customers have their own storage tanks at their sites, which are replenished

by tanker trucks under the company’s control. Most of the bulk customers participate in the company’s VMI

program. The inventory levels at the bulk customers are measured by remote telemetry units. Such a device


measures the quantity of the product in the storage tank, and is connected through a modem and the

telephone network to the company’s computer. A telemetry unit can be set to periodically measure the

inventory level and send the information to the company’s computer, and the computer can also query the

telemetry unit at any time, so that the decision maker can obtain inventory information whenever needed.

For the most part each customer and each vehicle is allocated to a specific plant, so that the overall problem

decomposes according to individual plants. Also, to improve safety and reduce contamination, each vehicle

and each storage tank at a customer is dedicated to a particular type of product. Hence the problem also

decomposes according to type of product. It seems that the most questionable assumptions are that vehicles

and drivers are available at the beginning of each day, mostly because of the unpredictability of driver

availability, and that the probability distributions of the customers’ demands are known to the supplier and

do not change over time. In practice, these probability distributions have to be estimated from data, and

the probability distributions change over time. Fortunately, in this particular case, a large amount of data is

available, and the demand characteristics of consumers do not seem to change rapidly over time. (However,

there are significant differences between demand on weekdays and weekends.)

A definition of the IRP is given in Section 1. In Section 2 research related to the IRP is reviewed.

Section 3 discusses the major computational tasks involved in solving the IRP. Section 4 presents a special

case of the IRP, namely the IRP with Direct Deliveries. In Sections 5 and 6 an approximation method for

this problem is developed. Computational results are presented in Section 7, in which the solution values of

the proposed method are compared with the optimal values for small problems, as well as with the values

of a heuristic proposed in the literature for small and medium sized problems. Further research in this area

is briefly discussed in Section 8.

1 Problem Definition

A more general description of the IRP is given in Section 1.1, after which a Markov decision process formulation is given in Section 1.2.

1.1 Problem Description

A product is distributed from a supplier’s plant to N customers, using a fleet of M homogeneous vehicles,

each with known capacity CV . Each customer n has a known storage capacity Cn. The process is modeled

in discrete time t = 0, 1, . . . , and the discrete time periods are called days. Customers’ demands on different

days are independent random vectors with a joint probability distribution F that does not change with time.

The probability distribution F is known to the supplier. The supplier can measure the inventory level Xnt of each customer n at any time t. The supplier makes decisions regarding which customers’ inventories to

replenish, how much to deliver at each customer, how to combine customers into vehicle routes, and which

vehicle routes to assign to each of the M vehicles. The set of feasible decisions is determined by constraints on

the travel times and work hours of vehicles and drivers, delivery time windows at the customers, the storage

capacities and current inventory levels of customers, and other constraints dictated by the application. It

may be feasible for a vehicle to perform more than one route per day. For ease of presentation we assume


that the duration of a vehicle route is less than the length of a day, so that all vehicles and drivers are

available at the beginning of each day, when the tasks for that day are assigned.

The cost of each decision is known to the supplier. This includes the travel costs cij on the arcs (i, j) of

the distribution network, which may also depend on the amount of product transported along the arc. The

cost of a decision may include the costs incurred at customers’ sites, for example due to product losses during

delivery. If quantity dn is delivered at customer n, the supplier earns a revenue of rn(dn). Because demand is

uncertain, there is often a positive probability that a customer runs out of stock, and thus shortages cannot

always be prevented. Shortages are discouraged with a penalty pn(sn) if the unsatisfied demand at customer

n is sn. Unsatisfied demand is treated as lost demand, and is not backlogged. If the inventory at customer n

is xn at the beginning of the day, and quantity dn is delivered at customer n, then an inventory holding cost

of hn(xn + dn) is incurred. The inventory holding cost can also be modeled as a function of some average

amount of inventory at each customer during the time period. The objective is to choose a distribution

policy that maximizes the expected discounted value (revenues minus costs) over an infinite time horizon.

1.2 Problem Formulation

We formulate the IRP as a discrete time Markov decision process with the following components:

1. The state x is the current inventory at each customer. Thus the state space X is $[0, C_1] \times [0, C_2] \times \cdots \times [0, C_N]$. Let Xnt ∈ [0, Cn] denote the inventory level at customer n at time t. Let Xt = (X1t, . . . , XNt) ∈ X denote the state at time t.

2. The action space A(x) for each state x is the set of all decisions that satisfy the work load constraints,

such that the vehicles’ capacities are not exceeded, and the customers’ storage capacities are not

exceeded after deliveries. Let At ∈ A(Xt) denote the decision chosen at time t. For any decision

a and arc (i, j), let kij(a) denote the number of times that arc (i, j) is traversed by a vehicle while

executing decision a. Also, for any customer n, let dn(a) denote the quantity of product that is delivered

to customer n while executing decision a. The constraint that customers’ storage capacities not be

exceeded after deliveries can be expressed as Xnt + dn(At) ≤ Cn for all n and t, if it is assumed that

no product is used between the time that the inventory level Xnt is measured and the time that the

delivery of dn(At) takes place. If product is used during this time period, it may be possible to deliver

more. The exact way in which the constraint is applied does not affect the rest of the development.

We applied the constraint as stated above.

3. Let Unt denote the demand of customer n at time t. Then the amount of product used by customer

n at time t is given by min{Xnt + dn(At), Unt}. Thus the shortage at customer n at time t is

given by Snt = max{Unt − (Xnt + dn(At)), 0}, and the next inventory level at customer n at time

t + 1 is given by Xn,t+1 = max{Xnt + dn(At) − Unt, 0}. The known joint probability distribution

F of customer demands gives a known Markov transition function Q, according to which transitions

occur. For any state x ∈ X , any decision a ∈ A(x), and any (measurable) subset B ⊆ X , let

$$\mathcal{U}(x, a, B) \equiv \left\{ u \in \mathbb{R}_+^N : \left( \max\{x_1 + d_1(a) - u_1, 0\}, \ldots, \max\{x_N + d_N(a) - u_N, 0\} \right) \in B \right\}$$

Then $Q[B \mid x, a] \equiv F[\mathcal{U}(x, a, B)]$. In other words, for any state x ∈ X and any decision a ∈ A(x),

$$P[X_{t+1} \in B \mid X_t = x, A_t = a] = Q[B \mid x, a] \equiv F[\mathcal{U}(x, a, B)]$$

For discrete demand distributions, let fn(un) denote the probability that the demand of customer n is

un.

4. Let g(x, a) denote the expected single stage net reward if the process is in state x at time t, and decision

a ∈ A(x) is implemented. Then, in terms of the notation introduced above,

$$g(x, a) \equiv \sum_n r_n(d_n(a)) - \sum_{(i,j)} c_{ij} k_{ij}(a) - \sum_n h_n(x_n + d_n(a)) - \sum_n E_{F_n}\!\left[ p_n\!\left( \max\{U_n - (x_n + d_n(a)), 0\} \right) \right]$$

where EFn denotes expected value with respect to the marginal probability distribution Fn of Un.

5. The objective is to maximize the expected total discounted value over an infinite horizon. Let α ∈ [0, 1)

denote the discount factor. Let V ∗(x) denote the optimal expected value given that the initial state is

x, i.e.,

$$V^*(x) \equiv \sup_{\{A_t\}_{t=0}^{\infty}} E\left[ \sum_{t=0}^{\infty} \alpha^t g(X_t, A_t) \,\middle|\, X_0 = x \right] \qquad (1)$$

The decisions At are restricted such that At ∈ A(Xt) for each t, and At has to depend only on the

history (X0, A0, X1, . . . , Xt) of the process up to time t, i.e., when the decision maker chooses an action

at time t, the decision maker does not know what is going to happen in the future.

A stationary deterministic policy π prescribes a decision a ∈ A(x) based on the information contained in

the current state x of the process only. For any stationary deterministic policy π, and any state x ∈ X , the

expected value V π(x) is given by

$$V^\pi(x) \equiv E^\pi\left[ \sum_{t=0}^{\infty} \alpha^t g(X_t, \pi(X_t)) \,\middle|\, X_0 = x \right] = g(x, \pi(x)) + \alpha \int_{\mathcal{X}} V^\pi(y)\, Q[dy \mid x, \pi(x)]$$

From the results in Bertsekas and Shreve (1978) it follows that under conditions that are not very restrictive

(e.g., g bounded and α < 1), to determine the optimal expected value in (1), it is sufficient to restrict

attention to the class Π of stationary deterministic policies. It follows that for any state x ∈ X ,

$$V^*(x) = \sup_{\pi \in \Pi} V^\pi(x) = \sup_{a \in A(x)} \left\{ g(x, a) + \alpha \int_{\mathcal{X}} V^*(y)\, Q[dy \mid x, a] \right\} \qquad (2)$$


A policy π∗ is called optimal if $V^{\pi^*} = V^*$.
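To make the optimality equation (2) concrete, here is a minimal value-iteration sketch for a discretized instance with a single customer and direct deliveries, so that a decision is just a delivery quantity d. All parameter values (capacities, costs, demand distribution) are illustrative assumptions, not data from the paper.

```python
# A minimal value-iteration sketch for the optimality equation (2), on a tiny
# discretized instance: a single customer served by direct deliveries.
C = 5                                     # customer storage capacity
CV = 3                                    # vehicle capacity
alpha = 0.9                               # discount factor
demand_pmf = {0: 0.3, 1: 0.4, 2: 0.3}     # discrete demand distribution

def g(x, d):
    """Expected single-stage net reward: revenue - transport - holding - penalty."""
    trip_cost = 1.5 if d > 0 else 0.0
    expected_penalty = sum(pr * 5.0 * max(u - (x + d), 0)
                           for u, pr in demand_pmf.items())
    return 2.0 * d - trip_cost - 0.1 * (x + d) - expected_penalty

V = {x: 0.0 for x in range(C + 1)}        # V(x) over discretized inventory levels
for _ in range(500):                      # successive approximations of (2)
    V = {x: max(g(x, d) + alpha * sum(pr * V[max(x + d - u, 0)]
                                      for u, pr in demand_pmf.items())
                for d in range(min(CV, C - x) + 1))
         for x in V}

print({x: round(v, 2) for x, v in V.items()})
```

Even this toy instance makes the scaling problem visible: with N customers the state dictionary would have $(C+1)^N$ entries, which is the difficulty taken up in Section 3.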

2 Review of Related Research

The long-term dynamic and stochastic control problem presented above is extremely difficult to solve. As

a result, all of the proposed approaches found in the literature have simplified the problem in one way or

another. Table 1 is an attempt to categorize the variants of the inventory routing problem that have been

studied by different researchers and the contributions that they have made. A survey of some of this work can

be found in Federgruen and Simchi-Levi (1995). Thomas and Griffin (1996) review related work addressing

the coordination of various operations in the supply chain, such as production, inventory, and distribution.

The column headings in the table represent some key problem characteristics, which we briefly describe

here. Customer demands, which in most applications are not known to the decision maker before the usage

takes place (or, in conventional inventory management, before the orders are received), have been modeled

as being either deterministic or stochastic. Fleet size, i.e., the number of available vehicles, which is limited

in practice, is sometimes assumed to be unlimited to facilitate the analysis of a proposed policy. Another

key issue is the length of the planning horizon. In applications, the objective is to maximize profit over

a long period of time, and some researchers explicitly model this objective. Other researchers consider a

short horizon problem where they do not take into account what happens after the short horizon over which

they optimize the objective. Some researchers develop a reduced horizon approach in which a short horizon

problem is formulated where the costs are heuristically modified to capture what happens after the short

horizon. Another issue is the number of customers visited on a vehicle trip. In many situations vehicles can

visit multiple customers on a single route. Several researchers have also studied variants in which a single

customer is visited on each route, which is called the direct delivery case. Finally, a distinguishing feature of

research contributions is whether policies or solution methods are presented that specify when to deliver to

each customer, how much to deliver to each customer, and how to deliver to customers, or whether bounds

on the profits (or costs) are presented.

3 Solving the Markov Decision Process

To determine the optimal value function V ∗, and an optimal policy π∗, if such a policy exists, the optimality

equation (2) has to be solved. This requires the following major computational tasks to be performed.

1. Estimation of the optimal value function V ∗. Because V ∗ appears in the left hand side and right hand

side of (2), most algorithms for computing V ∗ involve the computation of successive approximations

to V ∗(x) for every x ∈ X . Clearly, this is practical only if the number of states is small. For the IRP

as formulated in Section 1.2, X may be uncountable. One can discretize X by discretizing the demand

distributions. Conditions under which the solutions obtained with the discretization of X converge to

the solution of (2) have been studied by Bertsekas (1975), Chow and Tsitsiklis (1991), and Kushner

and Dupuis (1992). Even if the demand distributions are discretized, the number of states grows

exponentially in the number of customers. For example, if Z denotes the number of inventory levels

at each customer, then the number of states $|\mathcal{X}| = Z^N$. Thus, even with discrete inventory levels, the state space X is far too large to compute V ∗(x) for every x ∈ X in reasonable time if there are more than about four customers.

Table 1: Characteristics of inventory routing problems considered by various researchers.

Reference                                      Demands        Vehicles   Horizon  Delivery          Contribution
Bell et al. (1983)                             Deterministic  Limited    Long     Multiple          Policy
Federgruen and Zipkin (1984)                   Stochastic     Limited    Short    Multiple          Policy
Golden, Assad and Dahl (1984)                  Stochastic     Limited    Short    Multiple          Policy
Blumenfeld et al. (1985, 1991)                 Deterministic  Unlimited  Long     Direct            Policy
Burns et al. (1985)                            Deterministic  Unlimited  Long     Direct, Multiple  Policy
Dror, Ball and Golden (1985)                   Deterministic  Limited    Short    Multiple          Policy
Dror and Ball (1987)                           Stochastic     Limited    Reduced  Multiple          Policy
Cohen and Lee (1988)                           Stochastic     Unlimited  Long     Direct            Policy
Benjamin (1989)                                Deterministic  Unlimited  Long     Direct            Policy
Chien, Balakrishnan and Wong (1989)            Deterministic  Limited    Reduced  Multiple          Policy
Anily and Federgruen (1990)                    Deterministic  Unlimited  Long     Multiple          Bound, Policy
Gallego and Simchi-Levi (1990)                 Deterministic  Unlimited  Long     Direct            Bound
Trudeau and Dror (1992)                        Stochastic     Limited    Reduced  Multiple          Policy
Anily and Federgruen (1993)                    Stochastic     Unlimited  Long     Multiple          Bound, Policy
Chien (1993)                                   Stochastic     Unlimited  Long     Direct            Policy
Minkoff (1993)                                 Stochastic     Unlimited  Long     Multiple          Policy
Pyke and Cohen (1993, 1993)                    Stochastic     Unlimited  Long     Direct            Policy
Chandra and Fisher (1994)                      Deterministic  Unlimited  Long     Multiple          Policy
Bassok and Ernst (1995)                        Stochastic     Unlimited  Short    Multiple          Policy
Dror and Trudeau (1996)                        Stochastic     Unlimited  Long     Direct            Policy
Bard et al. (1997)                             Stochastic     Limited    Reduced  Multiple          Policy
Barnes-Schuster and Bassok (1997)              Stochastic     Unlimited  Long     Direct            Bound, Policy
Jaillet et al. (1997)                          Stochastic     Limited    Reduced  Multiple          Policy
Campbell et al. (1998)                         Deterministic  Limited    Short    Multiple          Policy
Chan, Federgruen and Simchi-Levi (1998)        Deterministic  Unlimited  Long     Multiple          Bound, Policy
Christiansen and Nygreen (1998a, 1998b, 1999)  Deterministic  Limited    Long     Multiple          Policy
Berman and Larson (1999)                       Stochastic     Unlimited  Short    Multiple          Policy
Fumero and Vercellis (1999)                    Deterministic  Limited    Long     Multiple          Policy
Reiman, Rubio and Wein (1999)                  Stochastic     Limited    Long     Direct, Multiple  Policy
Cetinkaya and Lee (2000)                       Stochastic     Unlimited  Long     Multiple          Policy
Kleywegt, Nori and Savelsbergh (2000)          Stochastic     Limited    Long     Direct, Multiple  Policy

2. Estimation of the expected value (integral) in (2). For many applications, this is a high dimensional

integral, which requires a lot of computational effort to compute accurately. In the case of the IRP,

the number of dimensions is equal to the number of customers, which can be as many as several

hundred. Conventional numerical integration methods are not practical for the computation of such

high dimensional integrals.

3. The maximization problem on the right hand side of (2) has to be solved to determine an optimal

decision for each state. This maximization problem may be easy or hard, depending on the application.

In the case of the IRP, the optimization problem on the right hand side of (2) is very hard, because

the vehicle routing problem, which is NP-hard, is a special case.

There are several conventional algorithms for solving Markov decision processes; see for example Bertsekas (1995) and Puterman (1994). These algorithms are practical only if the computational tasks discussed

above are easy to perform. As mentioned, these requirements are not satisfied by practical inventory routing

problems, as the state space X is usually extremely large, the expected value is hard to compute, and the

optimization problem on the right hand side of (2) is hard to solve.

Our approach is to develop efficient dynamic programming based approximation methods to perform these

computations. The first motivation for using approximation methods is the computational complexity of the

IRP outlined above. A motivation for using specifically dynamic programming based approximation methods

is as follows. Suppose V ∗ is approximated by $\hat{V}$ such that $\|V^* - \hat{V}\|_\infty \le \varepsilon$, that is, $|V^*(x) - \hat{V}(x)| \le \varepsilon$ for all x ∈ X . Choose policy π ∈ Π such that

$$g(x, \pi(x)) + \alpha \int_{\mathcal{X}} \hat{V}(y)\, Q[dy \mid x, \pi(x)] \;\ge\; \sup_{a \in A(x)} \left\{ g(x, a) + \alpha \int_{\mathcal{X}} \hat{V}(y)\, Q[dy \mid x, a] \right\} - \delta$$

for all x ∈ X , that is, the objective value of decision π(x) is within δ of the optimal objective value using the approximating function $\hat{V}$ on the right hand side of the optimality equation (2). Then

$$V^\pi(x) \ge V^*(x) - \frac{2\alpha\varepsilon + \delta}{1 - \alpha}$$

for all x ∈ X , that is, the value function V π of policy π is close to the optimal value function V ∗.

The application of our proposed method to the IRP with Direct Deliveries is discussed in the next section.

4 The IRP with Direct Deliveries

In the remainder of the paper we consider the special case of the IRP in which only one customer is visited

on each vehicle route. This special case of the IRP is called the IRP with Direct Deliveries (IRPDD). The

reasons why the IRPDD is of interest are discussed next.


If the storage capacities and demands of the customers are sufficiently large relative to the vehicle capacity,

and the inventory holding cost is low relative to the transportation cost, then it is often optimal to deliver

full vehicle loads or nearly full vehicle loads to customers. Gallego and Simchi-Levi (1990) analyzed a

single-depot/multi-customer distribution system with constant (deterministic) demand rates, in which no

shortages or backlogs were allowed. Customer storage capacities were not constrained. Transportation cost

proportional to the total distance traveled, a linear inventory holding cost, and ordering costs were taken

into account. They assumed availability of an unlimited number of vehicles with limited capacity. They

studied conditions under which direct delivery is an efficient policy. A lower bound on the long-run average

cost over all policies was derived, by adding a lower bound on the average inventory holding and ordering

costs, using a traditional economic order quantity model, and a lower bound on the long-run transportation

costs, obtained from the model of Haimovich and Rinnooy Kan (1985). An upper bound was derived on

the average cost of a particular direct delivery policy as a function of the economic order quantities (EOQ)

of the customers. It was concluded that the effectiveness (the ratio of the infimum of long-run average cost

over all policies to the long-run average cost of the direct delivery policy) is large (e.g., at least 94%) when

the EOQ of all customers is large relative to the vehicle capacity (e.g., at least 71%).

Barnes-Schuster and Bassok (1997) studied a single-depot/multi-customer distribution system with random demands over an infinite horizon. Customer storage capacities were constrained. Linear inventory

holding costs and transportation costs between the depot and the retailers were incorporated. The fleet

size was assumed to be unlimited, but vehicle capacities were limited. The objective was to study the cost

effectiveness of using a particular direct delivery policy. The policy delivers as many full truck loads at a

customer as the remaining capacity at the customer can accommodate. A lower bound was obtained on

the expected long-run average cost per period as a sum of the expected inventory holding cost, using an

infinite horizon newsvendor problem, and the expected transportation cost, extending the bound developed

by Haimovich and Rinnooy Kan (1985) for one retailer and a single period. The policy of direct delivery

with full truck loads was simulated and compared with the lower bound. The results indicate that the policy

performs well in situations in which truck sizes are close to the means of the customer demand distributions.

The formulation of the IRPDD is the same as the formulation of the IRP in Section 1.2 except for the

following.

1. The action space A(x) for each state x is the set of all decisions consisting of routes that visit only one

customer on a route, and that satisfy the work load, time window, and capacity constraints as before.

Each decision a consists of individual customer itineraries an, n = 1, . . . , N . Itinerary an denotes the

number of visits to customer n by each vehicle and the amount of product delivered at customer n by

each vehicle. Let tn denote the amount of time required per vehicle route from the supplier to customer

n and back.

2. The transportation costs can now be associated with the individual customers, instead of with the

routes on the network. For example, if cn denotes the transportation cost for traveling from the

supplier to customer n and back, and vn(an) denotes the number of times that customer n is visited


by a vehicle while executing itinerary an, then

$$g(x, a) \equiv \sum_{n=1}^{N} \left\{ r_n(d_n(a_n)) - c_n v_n(a_n) - h_n(x_n + d_n(a_n)) - E_{F_n}\!\left[ p_n\!\left( \max\{U_n - (x_n + d_n(a_n)), 0\} \right) \right] \right\} \qquad (3)$$

Although the hard routing and delivery quantity decisions of the IRP become much easier if only one

customer is visited on each vehicle route, the IRPDD is still a hard problem to solve if there are more than

about four customers and a limited number of vehicles, due to the number of states growing exponentially in

the number of customers. To illustrate the effect of this rapid growth, a number of instances of the IRPDD

were solved to optimality using the modified policy iteration algorithm. All instances had Cn = 10 for all

customers n, fn(u) = 1/10 for all customers n and u = 1, . . . , 10, CV = 5, and α = 0.98. Table 2 shows the

rapid growth in computation times on a 166MHz Pentium PC as the number of customers increases.

Table 2: Computation time to find the optimal solution for some instances of the IRPDD.

Customers  Vehicles  Time (s)
2          1             3
3          2           900
4          3         86400

Because direct deliveries are important in practice, and because they allow us to study approximation methods for the first two computational tasks discussed in Section 3 without being hampered by hard routing problems, we investigated the IRPDD before moving on to the more general IRP.

5 Approximating the Value Function

5.1 A Decomposition Approximation

The first major task is the construction of an approximation V to the optimal value function V ∗. Our

approximation is based on a decomposition of the IRPDD into individual customer subproblems, motivated

as follows. From (3) it follows that

$$g(x, a) = \sum_{n=1}^{N} g_n(x_n, a_n)$$

where

$$g_n(x_n, a_n) \equiv r_n(d_n(a_n)) - c_n v_n(a_n) - h_n(x_n + d_n(a_n)) - E_{F_n}\!\left[ p_n\!\left( \max\{U_n - (x_n + d_n(a_n)), 0\} \right) \right] \qquad (4)$$

The only consideration that prevents the exact decomposition of the IRPDD into individual customer subproblems is the limited number of vehicles that have to be assigned to customers each time period. The

11

Page 12: The Stochastic Inventory Routing Problem with Direct Deliveries

challenge is to incorporate this dependence between customers in a computationally tractable way.

Consider any policy π ∈ Π. In general, the chosen decision under policy π depends on the state x, and

thus the inventory levels at all the customers. Let πn(x) denote the itinerary associated with customer n

chosen under policy π when the state is x. Assume that the demand distribution and thus the state space X are discrete. Let νπ(x) denote the stationary probability of state x under policy π, assuming the existence

of unique stationary probabilities under policy π. Then, given the current inventory level xn and delivery

quantity en at customer n, the probability qn(yn,mn|xn, en) that under policy π, at the beginning of the

next day the inventory level at customer n is yn, and customer n is visited mn times by a vehicle, is given

by

$$q_n(y_n, m_n \mid x_n, e_n) = \frac{\displaystyle\sum_{\{s \in \mathcal{X} \,:\, s_n = x_n,\, d_n(\pi_n(s)) = e_n\}} \nu^\pi(s) \sum_{\{z \in \mathcal{X} \,:\, z_n = y_n,\, v_n(\pi_n(z)) = m_n\}} Q[z \mid s, \pi(s)]}{\displaystyle\sum_{\{s \in \mathcal{X} \,:\, s_n = x_n,\, d_n(\pi_n(s)) = e_n\}} \nu^\pi(s)} \qquad (5)$$

if the denominator is positive, and qn(yn,mn|xn, en) = 0 if the denominator is 0. The choice of policy π and

the estimation of qn(yn,mn|xn, en) are discussed later. With these probabilities qn(yn,mn|xn, en) we define

the following MDP for each customer n.

1. State (xn,mn) denotes that the inventory level at customer n is xn and customer n can be visited up

to mn times by a vehicle. Let (Xnt,Mnt) denote the state at time t, and let Xn denote the state space

of the MDP associated with customer n.

2. The set An(xn,mn) of admissible actions an when the state is (xn,mn) is the dispatching of up to mn vehicle trips to customer n, and the delivery of amounts of product constrained by the vehicle capacity

CV and the customer storage capacity Cn. Let Ant denote the decision at time t.

3. The transition probabilities are as follows.

$$P\left[ (X_{n,t+1}, M_{n,t+1}) = (y_n, k_n) \mid (X_{nt}, M_{nt}) = (x_n, m_n),\, A_{nt} = a_n \right] = q_n(y_n, k_n \mid x_n, d_n(a_n))$$

4. The expected net reward per stage, given state (xn,mn) and action an, is gn(xn, an), as in (4).

5. The objective is to maximize the expected total discounted value over an infinite horizon. Let

V ∗n (xn,mn) denote the optimal expected value given that the initial state is (xn,mn), i.e.,

$$V_n^*(x_n, m_n) \equiv \sup_{\{A_{nt}\}_{t=0}^{\infty}} E\left[ \sum_{t=0}^{\infty} \alpha^t g_n(X_{nt}, A_{nt}) \,\middle|\, (X_{n0}, M_{n0}) = (x_n, m_n) \right]$$

The actions Ant are again constrained to be feasible and nonanticipatory.

The optimal values V ∗n (xn,mn) of the individual customer MDPs are easily computed, because the state

spaces of the individual customer MDPs are much smaller than the state space of the IRPDD.
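As a concrete illustration, the following sketch solves one such customer MDP by value iteration. The estimated transition probabilities q_n(y, k | x, e) are assumed to be given as an array q[y, k, x, e] normalized over (y, k), and g_n(x, e, trips) is assumed to evaluate the single-stage reward (4) for a delivery of e using the given number of vehicle trips; the function and variable names are ours, not the paper's.

```python
import numpy as np

# A minimal value-iteration sketch for one individual-customer MDP.
def solve_customer_mdp(q, g_n, Cn, CV, M, alpha=0.98, iters=500):
    V = np.zeros((Cn + 1, M + 1))                     # V_n(x, m)
    for _ in range(iters):
        V_new = np.zeros_like(V)
        for x in range(Cn + 1):
            for m in range(M + 1):
                best = float("-inf")
                for trips in range(m + 1):            # up to m vehicle trips
                    # deliveries capped by trip capacity and storage space
                    for e in range(min(trips * CV, Cn - x) + 1):
                        future = float((q[:, :, x, e] * V).sum())
                        best = max(best, g_n(x, e, trips) + alpha * future)
                V_new[x, m] = best
        V = V_new
    return V
```

The state space here has only (Cn + 1)(M + 1) elements, which is what makes the decomposition computationally attractive.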

The next issue to be addressed is, given a state x = (x1, . . . , xN ) ∈ X of the IRPDD, how to combine the

optimal values V ∗n (xn,mn) of the individual customer MDPs to find a good approximation V (x) to V ∗(x). To do that, appropriate values of mn have to be chosen for each n, that is, the fleet capacity has to be assigned

12

Page 13: The Stochastic Inventory Routing Problem with Direct Deliveries

to the individual customers. The approximate value V (x) is calculated by assigning the available work time

of the M vehicles to the N customers to maximize the total value given by the resulting individual customer

MDPs. That is, the approximate value V (x) is given by the optimal value of the following nonlinear knapsack

problem.

$$\hat{V}(x) \equiv \max_{w = (w_1, \ldots, w_N) \in \mathbb{Z}_+^N} \sum_{n=1}^{N} V_n^*(x_n, w_n) \quad \text{s.t.} \quad \sum_{n=1}^{N} t_n w_n \le MT \qquad (6)$$

Recall that tn denotes the amount of time required per vehicle route from the supplier to customer n and

back, M denotes the number of vehicles in the fleet, and T denotes the maximum amount of work time per

vehicle per time period. The nonlinear knapsack problem is easily solved using dynamic programming.
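The dynamic program runs over customers and the remaining vehicle-time budget. The sketch below assumes integer route times t_n and that the customer value tables V*_n have already been computed; the function name and the example numbers are ours, not the paper's.

```python
# A dynamic-programming sketch for the nonlinear knapsack (6). Vn[n][w] holds
# the customer values V*_n(x_n, w), t[n] the (integer) round-trip time t_n,
# and MT the total vehicle-time budget M*T.
def solve_knapsack(Vn, t, MT):
    N = len(Vn)
    best = [0.0] * (MT + 1)               # best value using time budget <= b
    choice = [[0] * (MT + 1) for _ in range(N)]
    for n in range(N):
        new_best = [float("-inf")] * (MT + 1)
        for b in range(MT + 1):
            for w in range(len(Vn[n])):   # w trips assigned to customer n
                if w * t[n] <= b:
                    val = best[b - w * t[n]] + Vn[n][w]
                    if val > new_best[b]:
                        new_best[b], choice[n][b] = val, w
        best = new_best
    b, w_opt = MT, [0] * N                # recover an optimal assignment
    for n in reversed(range(N)):
        w_opt[n] = choice[n][b]
        b -= w_opt[n] * t[n]
    return best[MT], w_opt

# Example: two customers, value tables for w = 0, 1, 2 trips, route times 3
# and 4, budget 8; the optimum assigns one trip to each customer.
print(solve_knapsack([[0.0, 4.0, 6.5], [1.0, 5.0, 7.0]], [3, 4], 8))
# -> (9.0, [1, 1])
```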

Although the resulting vehicle assignment may constitute a good decision, the knapsack problem (6) is

primarily solved to obtain the approximate values V (y), and the decision π(x) is given by a maximizer in

the optimality equation, using V to approximate the values of future states, as follows.

$$\pi(x) \in \arg\max_{a \in A(x)} \left\{ g(x, a) + \alpha \sum_{y \in \mathcal{X}} Q[y \mid x, a]\, \hat{V}(y) \right\} \qquad (7)$$

This method can also be interpreted as a multistage lookahead method, whereby the knapsack problem

is solved to determine the tentative decision at the second stage, and the optimal value functions of the

individual customer MDPs give the objective function for the knapsack problem to take into account the

expected net reward from the second stage onwards.

5.2 An Algorithm

The development in Section 5.1 assumed that the conditional probabilities qn(yn,mn|xn, en) are known.

Computing the probabilities qn(yn,mn|xn, en) exactly using (5) is almost as hard as solving the IRPDD,

because the stationary probabilities νπ(x) have to be computed for all x ∈ X . Since qn(yn,mn|xn, en) is a

five dimensional parameter (with dimensions corresponding to n, yn, mn, xn, and en), if there are more than

about five customers, then the number of probabilities qn(yn,mn|xn, en) is usually less than the number

|X | of states. Thus one may attempt to estimate the probabilities qn(yn,mn|xn, en) without computing the

stationary probabilities νπ(x). One straightforward method to do this is to simulate the IRPDD process

under policy π. Let qnt(yn,mn|xn, en) denote the estimate of qn(yn,mn|xn, en) after t transitions of the

simulation. One method for updating the estimates qnt(yn,mn|xn, en) is as follows. Let Nnt(xn, en) denote

the number of times that customer n has been in state xn and quantity en has been delivered at customer

n by transition t of the simulation. Then

$$q_{n,t+1}(y_n, m_n \mid x_n, e_n) = \begin{cases}
\dfrac{\left( N_{n0}(y_n, m_n \mid x_n, e_n) + N_{nt}(x_n, e_n) \right) q_{nt}(y_n, m_n \mid x_n, e_n) + 1}{N_{n0}(y_n, m_n \mid x_n, e_n) + N_{nt}(x_n, e_n) + 1} & \text{if } X_{nt} = x_n,\ d_n(\pi_n(X_t)) = e_n,\ X_{n,t+1} = y_n,\ \text{and } v_n(\pi_n(X_{t+1})) = m_n \\[2ex]
\dfrac{\left( N_{n0}(y_n, m_n \mid x_n, e_n) + N_{nt}(x_n, e_n) \right) q_{nt}(y_n, m_n \mid x_n, e_n)}{N_{n0}(y_n, m_n \mid x_n, e_n) + N_{nt}(x_n, e_n) + 1} & \text{if } X_{nt} = x_n,\ d_n(\pi_n(X_t)) = e_n,\ \text{and } (X_{n,t+1} \ne y_n \text{ or } v_n(\pi_n(X_{t+1})) \ne m_n) \\[2ex]
q_{nt}(y_n, m_n \mid x_n, e_n) & \text{if } X_{nt} \ne x_n \text{ or } d_n(\pi_n(X_t)) \ne e_n
\end{cases} \qquad (8)$$

where Nn0(yn,mn|xn, en) represents a weight, equivalent to Nn0(yn,mn|xn, en) observations, assigned to

the initial estimate qn0(yn,mn|xn, en). It follows from results for Markov chains (Meyn and Tweedie

1993) that if the Markov chain under policy π has a unique stationary probability distribution νπ , then

qnt(yn,mn|xn, en)→ qn(yn,mn|xn, en) as t → ∞ with probability 1 for all inventory levels xn and delivery

quantities en that occur infinitely often, i.e., for all inventory levels xn and delivery quantities en such that

Nnt(xn, en) → ∞ as t → ∞. Convergence with probability 1 can also be established for other update

methods.

However, for most applications the number of probabilities qn(yn,mn|xn, en) is far too large to estimate

accurately in reasonable time using simulation. To resolve this dilemma, we use the following approach.

The conditional probability pn(mn|yn) that customer n is visited by mn vehicles under policy π, given

that the inventory level at customer n is yn, is given by

$$p_n(m_n \mid y_n) = \frac{\sum_{\{x \in \mathcal{X} \,:\, x_n = y_n,\, v_n(\pi_n(x)) = m_n\}} \nu^\pi(x)}{\sum_{\{x \in \mathcal{X} \,:\, x_n = y_n\}} \nu^\pi(x)}$$

if the denominator is positive, and pn(mn|yn) = 0 if the denominator is 0. The number of probabilities

pn(mn|yn) is much smaller than the number of probabilities qn(yn,mn|xn, en). The probabilities pn(mn|yn)

can be estimated by simulating the IRPDD process under policy π. Let pnt(mn|yn) denote the estimate of

pn(mn|yn) after t transitions of the simulation. The estimates pnt(mn|yn) can be updated similarly to the

estimates qnt(yn,mn|xn, en) in (8), as follows. Let Nnt(yn) denote the number of times that customer n has

been in state yn by transition t of the simulation. Then

$$p_{n,t+1}(m_n \mid y_n) = \begin{cases}
\dfrac{\left( N_{n0}(m_n \mid y_n) + N_{nt}(y_n) \right) p_{nt}(m_n \mid y_n) + 1}{N_{n0}(m_n \mid y_n) + N_{nt}(y_n) + 1} & \text{if } X_{nt} = y_n \text{ and } v_n(\pi_n(X_t)) = m_n \\[2ex]
\dfrac{\left( N_{n0}(m_n \mid y_n) + N_{nt}(y_n) \right) p_{nt}(m_n \mid y_n)}{N_{n0}(m_n \mid y_n) + N_{nt}(y_n) + 1} & \text{if } X_{nt} = y_n \text{ and } v_n(\pi_n(X_t)) \ne m_n \\[2ex]
p_{nt}(m_n \mid y_n) & \text{if } X_{nt} \ne y_n
\end{cases} \qquad (9)$$

Convergence results similar to those for qnt(yn,mn|xn, en) in (8) hold for pnt(mn|yn) in (9).

Then an estimate qnt(yn,mn|xn, en) for qn(yn,mn|xn, en) is obtained as follows.

$$q_{nt}(y_n, m_n \mid x_n, e_n) = \begin{cases}
f_n(x_n + e_n - y_n)\, p_{nt}(m_n \mid y_n) & \text{if } y_n > 0 \\[1ex]
\sum_{u_n = x_n + e_n}^{\infty} f_n(u_n)\, p_{nt}(m_n \mid y_n) & \text{if } y_n = 0
\end{cases} \qquad (10)$$


In general, the estimates in (10) are not the same as those given in (8). However, the estimates in (10) are

much easier to compute than those in (8), and in numerical tests the estimates were very close to each other.
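The update (9) is easy to implement as a streaming estimate during the simulation. The sketch below is a minimal version, assuming a single scalar prior weight in place of the more general N_{n0}(m_n|y_n) of the text; the class and method names are ours. One instance is kept per customer n.

```python
from collections import defaultdict

# A minimal streaming implementation of the update (9).
class VisitProbEstimate:
    def __init__(self, max_visits, prior_weight=1.0):
        self.M = max_visits
        # p[y][m]: running estimate of p_n(m | y), initialized uniform
        self.p = defaultdict(lambda: [1.0 / (self.M + 1)] * (self.M + 1))
        self.count = defaultdict(float)   # N_{nt}(y): times level y was observed
        self.w0 = prior_weight            # weight given to the initial estimate

    def update(self, y, m_observed):
        """Record one simulated day: inventory level y, m_observed vehicle visits."""
        w = self.w0 + self.count[y]
        row = self.p[y]
        for m in range(self.M + 1):
            hit = 1.0 if m == m_observed else 0.0
            row[m] = (w * row[m] + hit) / (w + 1.0)
        self.count[y] += 1.0
```

During a simulation run of the IRPDD under policy π, update(y, m) is called once per day for each customer; the estimates converge under the conditions stated above.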

Now the building blocks are in place to state the first approximation procedure for the IRPDD, given in

Algorithm 1.

Algorithm 1 Approximation Algorithm for IRPDD.

1. Start with an initial policy π0. Set i ← 0.

2. Repeat steps 3 through 6 for a chosen number of iterations, or until a convergence test is satisfied.

3. Simulate the IRPDD under policy πi to estimate the probabilities pn(mn|yn).

4. With the updated estimates of the probabilities pn(mn|yn), formulate and solve the updated individual customer MDPs.

5. Policy πi+1 is defined by (7), where V is given by (6) with the updated individual customer values V ∗n (xn,mn).

6. Increment i ← i + 1.

5.3 Parametric Value Function Approximations

One may attempt to improve the approximation described in Section 5.1 by introducing parameters β

into the value function approximation V (x, β). One type of parametric value function approximation with

computational advantages is a function

$$\hat{V}(x, \beta) = \beta_1 \phi_1(x) + \cdots + \beta_K \phi_K(x) \qquad (11)$$

that is linear in the parameters β, where the φks are chosen basis functions. Van Roy et al. (1997) used

a similar approach to develop an approximation method for a retailer inventory management problem that

was introduced by Nahmias and Smith (1994). Parametric value function approximations are discussed in

detail in Bertsekas and Tsitsiklis (1996).

When using this approach, the parameters β have to be chosen as well. We discuss two approaches for

obtaining parameters β. The first approach is as follows. Consider any policy π ∈ Π with unique stationary

probabilities νπ(x). An appealing idea is to choose the parameters β in such a way that V approximates V π

“as well as possible”. One way to do this is to choose β to solve the following optimization problem.

$$\min_\beta \sum_{x \in \mathcal{X}} \nu^\pi(x) \left[ V^\pi(x) - \hat{V}(x, \beta) \right]^2 \qquad (12)$$

This problem looks like a weighted least squares regression problem, except that νπ and V π are unknown.

Tsitsiklis and Van Roy (1997) showed that if V (x, β) is linear in the parameters β, and other conditions

(given later) hold, then the following stochastic approximation method can be used to compute the optimal

solution β∗ of (12). Suppose the IRPDD process under policy π is simulated. Let βt denote the estimate of


the parameters after transition t of the simulation. Then the parameter estimates βt are updated as follows.

$$\beta_{t+1} = \beta_t + \gamma_t d_t z_t$$

where γt is the step size at iteration t,

$$d_t = g(X_t, \pi(X_t)) + \alpha \hat{V}(X_{t+1}, \beta_t) - \hat{V}(X_t, \beta_t)$$

is the so-called temporal difference, or

$$d_t = g(X_t, \pi(X_t)) + \alpha \sum_{y \in \mathcal{X}} \hat{V}(y, \beta_t)\, Q[y \mid X_t, \pi(X_t)] - \hat{V}(X_t, \beta_t)$$

is the expected temporal difference,

$$z_t = \alpha \lambda z_{t-1} + \nabla_\beta \hat{V}(X_t, \beta_t)$$

is the so-called eligibility vector, λ ∈ [0, 1] is a memory parameter, and ∇β V (Xt, βt) is the gradient of V

with respect to β evaluated at (Xt, βt). If V is linear in β, as in (11), then ∇β V (Xt, βt) has components

∂V (Xt, βt)/∂βk = φk(Xt) for k = 1, . . . ,K. If (1) |X | < ∞, (2) the Markov chain under policy π is

aperiodic with one recurrent class, (3) V (x, β) is linear in the parameters β, (4) the basis functions φk

restricted to the set of recurrent states are linearly independent, (5) the step sizes γt satisfy $\sum_{t=0}^{\infty} \gamma_t = \infty$ and $\sum_{t=0}^{\infty} \gamma_t^2 < \infty$, and (6) λ = 1, then the parameters βt converge to the optimal solution β∗ of (12) as

t → ∞ with probability 1. A disadvantage of stochastic approximation methods is that the convergence of

the parameters βt is notoriously slow.
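A compact sketch of this stochastic approximation scheme for the linear architecture (11) follows; phi, reward, and step stand in for the basis functions, the observed single-stage reward g(x, π(x)), and one simulated transition of the IRPDD under policy π, and are assumptions of the sketch rather than names from the paper.

```python
import numpy as np

# A sketch of the temporal-difference scheme above for the linear
# architecture (11).
def td_lambda(phi, reward, step, x0, K, alpha=0.98, lam=1.0, T=100_000):
    beta = np.zeros(K)
    z = np.zeros(K)                       # eligibility vector z_t
    x = x0
    for t in range(1, T + 1):
        gamma = 1.0 / t                   # steps satisfy sum = inf, sum sq < inf
        y = step(x)                       # next state under policy pi
        d = reward(x) + alpha * phi(y) @ beta - phi(x) @ beta   # temporal difference
        z = alpha * lam * z + phi(x)      # z_t = alpha*lambda*z_{t-1} + grad_beta V
        beta = beta + gamma * d * z
        x = y
    return beta
```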

Another approach for obtaining parameters β is as follows. The value function V π of a policy π ∈ Π

satisfies

$$V^\pi(x) = g(x, \pi(x)) + \alpha \sum_{y \in \mathcal{X}} V^\pi(y)\, Q[y \mid x, \pi(x)] \qquad (13)$$

Again assume that π has unique stationary probabilities νπ(x). Then it seems appealing to choose the

parameters β to minimize the weighted discrepancy between the left hand side and right hand side of (13).

Thus, the parameters are chosen to be an optimal solution β∗ of

$$\min_\beta \sum_{x \in \mathcal{X}} \nu^\pi(x) \left[ \hat{V}(x, \beta) - \left( g(x, \pi(x)) + \alpha \sum_{y \in \mathcal{X}} \hat{V}(y, \beta)\, Q[y \mid x, \pi(x)] \right) \right]^2 \qquad (14)$$

This approach is called the Bellman error method.

If V (x, β) is linear in the parameters β, then the corresponding parameter estimates βt can be computed as follows. Let $\phi(x) \equiv (\phi_1(x), \ldots, \phi_K(x))^T$, and let $\psi(x) \equiv \phi(x) - \alpha \sum_{y \in \mathcal{X}} \phi(y)\, Q[y \mid x, \pi(x)]$. Then the optimization problem (14) can be written

$$\min_\beta \sum_{x \in \mathcal{X}} \nu^\pi(x) \left[ \psi(x)^T \beta - g(x, \pi(x)) \right]^2 \qquad (15)$$

This problem also looks like a weighted least squares regression problem, except that νπ is unknown. Let Ψ denote the |X| × K matrix with rows given by ψ(x)T, let Y denote the |X| × 1 matrix with elements given by g(x, π(x)), and let ∆π denote the |X| × |X| diagonal matrix with diagonal elements given by νπ(x). Then any solution β of ΨT∆πΨβ = ΨT∆πY is an optimal solution of (15). If the columns of ∆πΨ are linearly independent (which should be the case if the basis functions φk are well chosen), then ΨT∆πΨ is positive definite, and the optimal solution β∗ of (15) is unique.

To overcome the obstacle that νπ, and thus also ∆π, are unknown, one can simulate the IRPDD process

under policy π, and use the following result for Markov chains (Meyn and Tweedie 1993). If the Markov

chain has a single positive recurrent class with stationary probability distribution ν, then for any function

$f : \mathcal{X} \to \mathbb{R}$ such that $\int_{\mathcal{X}} |f(x)|\, d\nu(x) < \infty$, it holds with probability 1 that $\sum_{\tau=1}^{t} f(X_\tau)/t \to \int_{\mathcal{X}} f(x)\, d\nu(x)$ as $t \to \infty$. To apply this result to (15), define $K(K+1)/2$ functions $f_{ij}(x) \equiv \psi_i(x)\psi_j(x)$ and K functions $g_i(x) \equiv \psi_i(x)\, g(x, \pi(x))$. Then ΨT∆πΨ is the matrix with element $(i,j)$ equal to $\int_{\mathcal{X}} f_{ij}(x)\, d\nu^\pi(x)$, and ΨT∆πY is the vector with element i equal to $\int_{\mathcal{X}} g_i(x)\, d\nu^\pi(x)$. The sample averages $\sum_{\tau=1}^{t} f_{ij}(X_\tau)/t$ and $\sum_{\tau=1}^{t} g_i(X_\tau)/t$ are easily computed, as follows. Let $F_0 \equiv 0$ be a K × K matrix, and let $F_{t+1} \equiv F_t + \psi(X_t)\psi(X_t)^T$. Then element $(i,j)$ of $F_t$ is equal to $\sum_{\tau=1}^{t} f_{ij}(X_\tau)$. Also, let $G_0 \equiv 0$ be a K × 1 matrix, and let $G_{t+1} \equiv G_t + \psi(X_t)\, g(X_t, \pi(X_t))$. Then element i of $G_t$ is equal to $\sum_{\tau=1}^{t} g_i(X_\tau)$. It follows from the above result for Markov chains that $F_t/t \to \Psi^T \Delta^\pi \Psi$ and that $G_t/t \to \Psi^T \Delta^\pi Y$ as $t \to \infty$. Let βt be any

solution of the system of linear equations Ftβt = Gt. Then the distance between βt and the set of optimal

solutions of (14) converges to zero as t → ∞. Furthermore, if the columns of ∆πΨ are linearly independent,

then for sufficiently large t, Ft is positive definite, and the unique solution βt of Ftβt = Gt converges to the

unique optimal solution β∗ of (14).
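The accumulation of F_t and G_t translates directly into code. The sketch below assumes psi(x) returns the K-vector ψ(x) (so the one-step expectation over Q has already been folded in), reward(x) returns g(x, π(x)), and F_t is nonsingular for the trajectory given; names are ours.

```python
import numpy as np

# A sketch of the simulation-based Bellman error fit: accumulate
# F_t = sum_tau psi(X_tau) psi(X_tau)^T and
# G_t = sum_tau psi(X_tau) g(X_tau, pi(X_tau)), then solve F_t beta = G_t.
def bellman_error_fit(psi, reward, trajectory):
    K = psi(trajectory[0]).shape[0]
    F = np.zeros((K, K))
    G = np.zeros(K)
    for x in trajectory:                  # states visited under policy pi
        v = psi(x)
        F += np.outer(v, v)
        G += v * reward(x)
    return np.linalg.solve(F, G)          # beta_t solving F_t beta_t = G_t
```

Solving one K × K linear system per fit, rather than taking many small stochastic-approximation steps, is the source of the faster convergence noted below.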

The optimal solutions of (12) and (14) are not the same in general. One can also formulate multistage

Bellman error objective functions, the optimal solutions of which can be shown to be close to the optimal

solutions of (12). However, from our computational experience for the IRPDD, the optimal solutions of (14)

are close to the optimal solutions of (12), and the optimal solutions of (14) combined with the proposed

basis functions provide good policies, as illustrated in Section 7.2. The Bellman error method has the

advantage that the parameter estimates βt converge much faster to β∗ than with stochastic approximation.

The objective of (14) may not seem quite as appealing as the objective of (12). Rewriting the objective

function of (14) as $\sum_{x \in \mathcal{X}} \nu^\pi(x) \left[ \left( \hat{V}(x, \beta) - \alpha \sum_{y \in \mathcal{X}} \hat{V}(y, \beta)\, Q[y \mid x, \pi(x)] \right) - g(x, \pi(x)) \right]^2$, it follows that this objective chooses β in such a way that $\hat{V}(x, \beta) - \alpha \sum_{y \in \mathcal{X}} \hat{V}(y, \beta)\, Q[y \mid x, \pi(x)]$ is close to the expected single stage net reward g(x, π(x)). In contrast, (12) chooses β such that V (x, β) is close to V π(x), which

seems more appealing, especially in the light of the approximation results in Section 3.

Van Roy et al. (1997) proposed an approximation with basis functions φk chosen as first and second degree

polynomials of “features” of x, for their inventory management problem. The resulting policies performed

better than an order-up-to heuristic. We tested such an approximation for the IRPDD with the φks chosen


as first and second degree polynomials of x. The performance of the resulting policies was quite poor.

One can combine the decomposition approximation and the parametric approximation to obtain an

approximate value V (x, β) for any given state x ∈ X , where

$$\hat{V}(x, \beta) \equiv \beta_0 + \sum_{n=1}^{N} \beta_n V_n^*(x_n, w_n^*(x)) \qquad (16)$$

where $w^*(x) = (w_1^*(x), \ldots, w_N^*(x))$ is an optimal solution of the nonlinear knapsack problem (6). It is shown

in Section 7 that the policies π based on using the approximation V (x, β) in (16) in the right hand side of (7)

gave excellent numerical results. A procedure that can be used to compute V (x, β) and the resulting policies

π is given in Algorithm 2.

Algorithm 2 Procedure for computing V (x, β) and π.

1. Start with an initial policy π0. Set i ← 0.

2. Simulate the IRPDD under policy π0 to estimate the probabilities pn(mn|yn).

3. Formulate and solve the individual customer MDPs.

4. Policy π1 is defined by (7), where V is given by (6).

5. Repeat steps 6 through 9 for a chosen number of iterations, or until a convergence test is satisfied.

6. Increment i ← i + 1.

7. Simulate the IRPDD under policy πi to update the estimates of the probabilities pn(mn|yn) and the parameters β.

8. With the updated estimates of the probabilities pn(mn|yn), formulate and solve the updated individual customer MDPs.

9. Policy πi+1 is given by (7), where V is given by (16) with the updated parameters β and individual customer values V ∗n (xn,mn).

6 Estimation of the Expected Value and Optimal Action

The second major computational task discussed in Section 3 is the estimation of the expected value on

the right hand side of (2) or (7). In the case of the IRPDD, the expected value is a multidimensional

integral with the number of dimensions equal to the number of customers. Conventional deterministic

numerical integration methods can be used to estimate the expected value. A popular approach is to use the

Newton-Cotes formulas; specific examples of these include Euler’s rule, the trapezoid rule, and Simpson’s

rules. Computing the expected value of a multidimensional discrete distribution corresponds to Euler’s rule.

Randomized (Monte Carlo) methods can also be used to estimate the expected value.

The computational efficiency of these methods is a relevant issue. Many deterministic numerical integration methods construct a grid on the space to be integrated over, and compute the integrand values at

the grid points. Let Z denote the number of grid points per dimension, and let d (= N) denote the number


of dimensions. Then the total number of integrand values computed is given by $n = Z^d$. The error of many of these methods is $O(Z^{-c}) = O(n^{-c/d})$, where c is a constant that depends on the specific method

(Stroud 1971). For example, for the trapezoid rule c = 2, and for Simpson’s 1/3 rule c = 4 (Mustard,

Lyness and Blatt 1963). One measure for the comparison of the accuracy of different methods is the mean

square error (MSE) as a function of the number of integrand evaluations n. For the deterministic methods

discussed above, $\mathrm{MSE} = \mathrm{error}^2 = O(n^{-2c/d})$. For randomized methods using simple random sampling, $\mathrm{MSE} = \mathrm{Variance} = O(n^{-1})$. It follows that simple random sampling tends to give better performance than the

deterministic methods (at least for large values of n), if d > 2c. For example, simple random sampling is

more efficient than the trapezoid rule if d > 4, and simple random sampling is more efficient than Simpson’s

1/3 rule if d > 8. Also, for large values of d (d > 20, say), the number of grid points $n = Z^d$ becomes too

large to evaluate the integrand at, even with Z = 2, so that conventional deterministic methods are not

practical at all. Thus randomized methods are preferred for estimating the expected value on the right hand

side of (7) for instances of the IRP with a large number of customers.
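A minimal sketch of the simple-random-sampling estimator of the expected value in (7) follows; sample_demand and V_hat are placeholders for a draw from the demand distribution F and the value function approximation, and are assumptions of this sketch.

```python
import numpy as np

# A sketch of the simple-random-sampling estimate of the expected value
# sum_y Q[y | x, a] V(y) in (7).
def mc_expected_value(x_plus_d, sample_demand, V_hat, n=1000, seed=None):
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n):
        U = sample_demand(rng)                     # demand vector U drawn from F
        y = np.maximum(x_plus_d - U, 0.0)          # next state, per Section 1.2
        total += V_hat(y)
    return total / n                               # MSE = O(1/n) in any dimension
```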

The use of randomized methods raises a number of related questions.

1. What sample size n should be used?

2. Since the objective function on the right hand side of (7) is estimated with a random estimator with

error, how should the action be chosen?

3. What performance guarantees can be given for the action chosen after the objective value has been

randomly estimated?

These questions have been widely studied in the statistics and stochastic optimization areas.

For the IRPDD, we followed the approach proposed by Nelson and Matejcik (1995). Suppose the current

state is x. Let the actions in A(x) be numbered a = 1, . . . , k, where k = |A(x)|. Let Yaj denote random

observation j of the right hand side of (7) under action a ∈ A(x). That is,

$$Y_{aj} = g(x, a) + \alpha \hat{V}(X_j) \qquad (17)$$

where state Xj is randomly generated from distribution Q[ · | x, a]. Let Yj ≡ (Y1j , . . . , Ykj). It is assumed

that Y1, Y2, . . . are i.i.d. normally distributed with unknown mean µ and unknown covariance matrix Σ. Thus,

given the current state x, µa is the value of action a on the right hand side of (7), $\mu_a = g(x, a) + \alpha \sum_{y \in \mathcal{X}} Q[y \mid x, a]\, \hat{V}(y)$. To get observations Yj that are approximately normally distributed, we used batch means as

observations and relied on the central limit theorem. It is also assumed that Σ has the sphericity property.

Sphericity implies that Var[Yaj − Ybj] is the same for all actions a, b ∈ A(x), a ≠ b. Nelson and Matejcik (1995)

presented evidence that their method (given in Algorithm 3) is robust with respect to deviations from

sphericity as long as the covariances σab between Yaj and Ybj are nonnegative, which one would expect to hold

when using common random numbers for computing Y1j , . . . , Ykj .

If the assumptions stated above are satisfied, then whenever µ_b ≥ µ_a + δ for all a ∈ A(x)\{b}, it holds that

P[ Ȳ_b· > Ȳ_a· for all a ∈ A(x)\{b} ] ≥ 1 − α

Algorithm 3 Procedure Nelson-Matejcik

1. Choose confidence coefficient α (not the discount factor), tolerance δ, and initial sample size n_0. Let g = T^{(α)}_{k−1,(k−1)(n_0−1),1/2}, an equicoordinate critical point of the equicorrelated multivariate central t-distribution.

2. Generate an i.i.d. sample Y_1, Y_2, . . . , Y_{n_0}.

3. Compute Ȳ_a· = Σ_{j=1}^{n_0} Y_aj / n_0, Ȳ_·j = Σ_{a=1}^{k} Y_aj / k, and Ȳ_·· = Σ_{a=1}^{k} Σ_{j=1}^{n_0} Y_aj / (k n_0). Compute the sample variance S^2 of Y_aj − Y_bj (assuming sphericity), given by

S^2 = 2 Σ_{a=1}^{k} Σ_{j=1}^{n_0} ( Y_aj − Ȳ_a· − Ȳ_·j + Ȳ_·· )^2 / [ (k − 1)(n_0 − 1) ]

4. Update the required sample size to n_1 = max{n_0, ⌈(gS/δ)^2⌉}.

5. Generate n_1 − n_0 additional i.i.d. observations Y_{n_0+1}, . . . , Y_{n_1}.

6. Compute the overall sample means Ȳ_a· = Σ_{j=1}^{n_1} Y_aj / n_1 for each a ∈ A(x).

7. Select the action a with the largest value of Ȳ_a·.

In other words, the probability is at least 1 − α that an action a is selected whose value µ_a is within tolerance δ of the best value µ_b.
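A hedged Python sketch of Algorithm 3 follows. The function sample_Y is a placeholder that must return an (n, k) array of observations Y_j generated with common random numbers, and the critical point g is passed in as a constant (the paper obtains it from tables of the multivariate t-distribution):

    import math
    import numpy as np

    def nelson_matejcik(sample_Y, k, g, delta, n0=30):
        Y = sample_Y(n0)                       # stage 1: (n0, k) array of Y_aj
        Ya = Y.mean(axis=0)                    # action means, \bar{Y}_{a.}
        Yj = Y.mean(axis=1, keepdims=True)     # observation means, \bar{Y}_{.j}
        resid = Y - Ya - Yj + Y.mean()         # Y_aj - Ya - Yj + \bar{Y}_{..}
        S2 = 2.0 * (resid ** 2).sum() / ((k - 1) * (n0 - 1))   # sphericity estimate
        n1 = max(n0, math.ceil((g * math.sqrt(S2) / delta) ** 2))
        if n1 > n0:
            Y = np.vstack([Y, sample_Y(n1 - n0)])   # stage 2: top up the sample
        means = Y.mean(axis=0)                 # overall sample means \bar{Y}_{a.}
        return int(np.argmax(means)), n1       # best-looking action and sample size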

Note that if variance reduction techniques are used to reduce E[S^2] = Var[Y_aj − Y_bj], then the required sample size n_1 = max{n_0, ⌈(gS/δ)^2⌉} will on average be smaller for fixed values of α and δ, or, conversely, the confidence level 1 − α can be increased and/or the tolerance δ can be decreased with a fixed sample size n_1. The following variance reduction techniques reduced the required sample size for the IRPDD with fixed values of α and δ:

1. common random numbers,

2. stratified sampling,

3. Latin hypercubes,

4. orthogonal arrays.

An example of the reduction in required sample size is given in Section 7.
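The effect of the first of these techniques is easy to reproduce. In the illustrative sketch below (a toy shortage cost, not the paper's model), driving both actions with the same uniform random numbers sharply reduces Var[Y_a − Y_b]:

    import numpy as np

    rng = np.random.default_rng(1)

    def cost(u, delivery):
        demand = np.floor(u * 11)                  # demand level in {0, ..., 10}
        return -np.maximum(demand - (4.0 + delivery), 0.0)   # shortage penalty only

    n = 100_000
    u = rng.random(n)
    crn = cost(u, 2.0) - cost(u, 0.0)              # common random numbers
    ind = cost(u, 2.0) - cost(rng.random(n), 0.0)  # independent sampling
    print("Var with CRN:   ", crn.var(ddof=1))
    print("Var independent:", ind.var(ddof=1))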

The next computational issue that has to be addressed is the fact that there are a large number of actions in A(x) to choose from if there are many vehicles and customers. If each vehicle can visit at most one customer per day, then the number of actions is k = |A(x)| = \binom{M+N}{M}. Comparing all \binom{M+N}{M} actions requires too much computational effort for large values of M and N, and thus there is a need for a more computationally efficient method. The greedy method given in Algorithm 4 produced optimal actions for all the states of all the instances tested, as discussed in Section 7.

For ease of presentation, Procedure Greedy is stated here for the case where each vehicle can visit at most

one customer per day. The extension of Procedure Greedy to the case where vehicles can visit more than

one customer per day is straightforward.


Algorithm 4 Procedure Greedy

1. Let x^1 denote the current state (inventory level at each customer). Set m ← 1.

2. Repeat steps 3 through 4 for each vehicle m = 1, . . . , M.

3. Dispatch vehicle m to maximize the right hand side of the optimality equation (7). That is, choose the customer n_m to send vehicle m to, and the quantity d_m to deliver at customer n_m, as follows:

(n_m, d_m) ∈ arg max_{(n,d) ∈ A(x^m)} { g(x^m, (n, d)) + α Σ_{y∈X} Q[y | x^m, (n, d)] V(y) }    (18)

where A(x^m) ≡ {(n, d) : n ∈ {0, 1, . . . , N}, d ∈ {0, 1, . . . , C_n − x^m_n}}. If n_m = 0, vehicle m is not dispatched to any customer (and C_0 = 0). Algorithm 3 can be used to select the decision (n_m, d_m).

4. Update:

x^{m+1}_n = x^m_n + d_m if n = n_m;  x^{m+1}_n = x^m_n if n ≠ n_m    (19)

Set m ← m + 1.
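A minimal Python sketch of Procedure Greedy for this case is given below; action_value(x, n, d) is a placeholder for the bracketed quantity in (18), which in practice is estimated with Algorithm 3:

    def greedy_dispatch(x, C, M, action_value):
        # x and C are indexed 1..N, with a dummy entry at index 0 (the depot, C_0 = 0)
        x = list(x)
        decisions = []
        for m in range(M):                             # step 2: loop over vehicles
            best, best_val = (0, 0), action_value(x, 0, 0)   # n = 0: not dispatched
            for n in range(1, len(C)):
                for d in range(C[n] - x[n] + 1):       # feasible delivery quantities
                    val = action_value(x, n, d)
                    if val > best_val:
                        best, best_val = (n, d), val
            n, d = best
            if n > 0:
                x[n] += d                              # step 4: interim state x^{m+1}
            decisions.append(best)
        return decisions, x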

7 Computational Results

To test the viability of our proposed dynamic programming approximation method for the IRPDD and to

fine-tune and improve its efficiency, we have conducted a variety of computational experiments.

7.1 Algorithm Efficiency

In the previous sections, we have proposed several algorithms to approximate the optimal value function V*, the expected value, and an optimal action, in the right hand side of the optimality equation (2). In this section, we test the efficiency of these algorithms and the quality of the solutions produced.

One type of approximation for the optimal value function V* involves a parametric value function V(x, β) ≡ β_0 + Σ_{n=1}^N β_n V*_n(x_n, w*_n(x)), where V*_n is the optimal value function of the single customer MDP for customer n and w*(x) = (w*_1(x), . . . , w*_N(x)) is an optimal solution of the nonlinear knapsack problem (6). We have outlined two approaches for obtaining the parameters β: the stochastic approximation method and the Bellman error method.

The convergence rates of the stochastic approximation parameter estimates are affected significantly by the rule for choosing the step sizes γ_t. We experimented with two different step size rules for stochastic approximation. Rule 1 is γ_t = c_1/(c_2 + t), where c_1 and c_2 are chosen (typically large) constants. This is a slight modification of γ_t = 1/t, the step size rule frequently given in the literature. Rule 2 is a variant of the step size rule analyzed by Ruszczynski and Syski (1986), and is given by

γ_t = min { γ, γ_{t−1} exp[ min{η, −α u_t} ] }

where γ_0 > 0 and α, η, γ are chosen positive (typically small) constants. The quantity u_t ≡ ⟨ξ_t, Δβ_t⟩, where ⟨·, ·⟩ denotes an inner product, Δβ_t ≡ β_t − β_{t−1}, and ξ_t is a stochastic subgradient estimate of the convex function that is to be minimized. For (12), ξ_t is given by the negative of the product of the temporal difference and the eligibility vector, ξ_t = −d_t z_t. The convergence of the parameter estimates under each of the two step size rules is shown in Figure 1. The figure shows that the parameter estimates converge much faster with step size rule 2 than with step size rule 1. Observe too that in both cases the parameter estimates first move away from their optimal values before converging towards them.

[Figure 1 plot: parameter values versus time in hours, for Rule 1 and Rule 2.]

Figure 1: Convergence behavior of the parameter estimates β_{t,1} for stochastic approximation using two different step size rules for instance opt12. For Rule 1, c_1 = 10^9 and c_2 = 10^14. For Rule 2, α = 10^{-2}, η = 10^{-2} and γ = 10^{-5}.
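For reference, the two rules can be written compactly as follows (a sketch; the defaults are the Figure 1 constants, and the cap γ is written gamma_cap, with u_t supplied by the caller):

    import math

    def rule1(t, c1=1e9, c2=1e14):
        return c1 / (c2 + t)        # slight modification of the classical 1/t rule

    def rule2(gamma_prev, u_t, alpha=1e-2, eta=1e-2, gamma_cap=1e-5):
        # The step grows while successive updates keep pointing the same way
        # (u_t < 0) and shrinks after an overshoot (u_t > 0), capped at gamma_cap.
        return min(gamma_cap, gamma_prev * math.exp(min(eta, -alpha * u_t)))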

An alternative to the stochastic approximation method is the Bellman error method. We have experimented with both methods and found that the parameter estimates β_t converged much faster with the Bellman error method than with the stochastic approximation method, even when using step size rule 2 discussed above. An example of this behavior is presented in Figure 2. Note that although the parameter estimates were initialized close to their optimal values for the stochastic approximation method, the parameter estimates still converged much faster with the Bellman error method, for which the parameter estimates are not initialized.

Also, to improve numerical behavior, the data in all the instances used in this computational study were scaled appropriately so that the components of the gradient ∇_β V(x, β) had the same order of magnitude. Specifically, after the optimal value function V*_n has been computed for each individual customer MDP, the average values θ_n ≡ Σ_{(x_n,m_n)∈X_n} V*_n(x_n, m_n)/|X_n| are determined. The value function is rewritten as V(x, β) = β_0 + Σ_{n=1}^N β_n θ_n V*_n(x_n, w*_n(x))/θ_n = Σ_{n=0}^N β̃_n φ_n(x) ≡ Ṽ(x, β̃), where θ_0 ≡ 1, β̃_n ≡ β_n θ_n, φ_0(x) ≡ 1, and φ_n(x) ≡ V*_n(x_n, w*_n(x))/θ_n for n = 1, . . . , N. Thus the components of the gradient ∇_β̃ Ṽ(x, β̃) have the same order of magnitude, because ∂Ṽ(x, β̃)/∂β̃_k = φ_k(x) ≈ 1 ≈ φ_l(x) = ∂Ṽ(x, β̃)/∂β̃_l. The scaling causes the changes β̃_{t+1} − β̃_t = γ_t d_t z_t in the parameter estimates from one iteration to the next to be of the same order of magnitude for the different parameters.

[Figure 2 plot: parameter values versus time in hours, for the Bellman error method and for stochastic approximation.]

Figure 2: Convergence behavior of the parameter estimates β_{t,1} for the Bellman error method and for the stochastic approximation method for instance opt12. For the stochastic approximation method, Rule 2 was used with α = 10^{-2}, η = 10^{-2} and γ = 10^{-5}.
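A sketch of this scaling (hypothetical helpers, not the authors' code) is given below; V_single[n] is assumed to map each state (x_n, m_n) of customer n's single customer MDP to V*_n(x_n, m_n), and w_star(x) returns the knapsack allocation (w*_1(x), . . . , w*_N(x)):

    import numpy as np

    def make_scaled_basis(V_single, w_star):
        N = len(V_single)
        # theta_n = average of V*_n over the state space X_n of customer n
        theta = {n: sum(v.values()) / len(v) for n, v in V_single.items()}

        def phi(x):
            w = w_star(x)
            out = np.empty(N + 1)
            out[0] = 1.0                               # phi_0(x) = 1
            for n in range(1, N + 1):
                out[n] = V_single[n][(x[n - 1], w[n - 1])] / theta[n]
            return out                                 # every component is O(1)

        return phi        # approximation: V~(x, beta~) = beta~ . phi(x)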

Another important aspect of our approach is the use of random sampling to estimate the expected value in the optimality equation (2). Variance reduction techniques played a significant role in improving the efficiency of these estimates. Variance reduction techniques can be used either to improve, for a given sample size, the accuracy of the random estimators Ȳ_a· of the value µ_a of action a on the right hand side of (7) and of the estimators (Ȳ_a· − Ȳ_b·) of µ_a − µ_b, or to decrease the sample size needed to obtain a specified level of accuracy. After experimentation with common random numbers, stratified sampling, Latin hypercubes, and orthogonal arrays, we chose to use a combination of common random numbers and orthogonal arrays, as it gave the best performance. The combination of common random numbers and orthogonal arrays gave almost a ten-fold reduction in the sample size required for the specified accuracy compared with simple random sampling with common random numbers. Even when one takes into account that a combination of common random numbers and orthogonal arrays requires approximately 1.4 times as much computation time as simple random sampling with common random numbers for the same sample size, it still provides a significant reduction in computational effort. The performance improvement is illustrated in Figure 3, which shows the sample sizes required for the specified accuracy of choosing an action with objective value within δ = 0.05 (approximately 0.1%) of the optimal objective value with probability at least 1 − α = 0.99, for each of 1000 transitions of a simulation of the IRPDD process, for both simple random sampling with common random numbers and for random sampling with a combination of common random numbers and the Bose-Bush orthogonal array design (Bose 1938 and Bose and Bush 1952) with level 9 and frequency 3.

[Figure 3 plot: number of observations required versus simulation steps.]

Figure 3: Sample sizes required for the specified accuracy with δ = 0.05 and α = 0.01, for each of 1000 transitions of a simulation of instance cst2, for simple random sampling and orthogonal array sampling. The number of observations required by simple random sampling is indicated by the thin line, and the number of observations required by orthogonal array sampling is indicated by the thick line.

7.2 Solution Quality

In this section, we discuss a number of experiments to test the quality of the policies produced by the

dynamic programming approximation method.

First, we compare the value functions of the approximation policies with the optimal value functions for

small instances of the IRPDD, for which the optimal value function can be computed in reasonable time. The

ten instances used (given in Appendix B) have two, three, four or five customers, and demand distributions

which are either bimodal, or randomly generated, or uniform over all demand levels.

A concise presentation of the quality of a policy π is difficult because it involves a comparison of its value function V^π with the optimal value function V* over all states x. We have chosen to present the quality of the various value functions in several ways. For any value function V : X → ℝ, let V_avg ≡ Σ_{x∈X} V(x)/|X| denote the average value of the value function over all states. Because V_avg does not reveal the values at good or bad states, we also present the minimum and maximum values of the value functions over all states, that is, V_min ≡ min_{x∈X} V(x) and V_max ≡ max_{x∈X} V(x).

In Table 3, we compare V^π_avg, V^π_min, and V^π_max for several policies π with V*_avg, V*_min, and V*_max. Policies π′_i result from Algorithm 2, where both the maximization and the expected value on the right hand side of (7) are computed using enumeration, and where the sequence of policies π′_0, π′_1, π′_2 results from successive iterations of Algorithm 2. Policies π_i also result from Algorithm 2, but the maximization and the expected value on the right hand side of (7) are computed using a combination of Algorithm 4 and Algorithm 3, and the sequence of policies π_0, π_1, π_2 also results from successive iterations of Algorithm 2. The initial policies π′_0 and π_0 are myopic policies that use value function approximation V = 0 (or equivalently discount factor α = 0) on the right hand side of (7). The difference between π′_0 and π_0 is that π′_0 is based on computing the expected shortage penalty in g(x, a) using enumeration and on comparing the values g(x, a) of all actions a ∈ A(x) and then choosing the best action, whereas π_0 is based on a combination of Algorithm 3 and Algorithm 4 for computing the expected shortage penalty and choosing an action. The Gauss-Seidel policy evaluation algorithm was used to compute the value function of each policy.

The results show that the values of the policies produced by the dynamic programming approximation method are very close to the optimal values. Furthermore, they show that the policies obtained after successive iterations of Algorithm 2 are slightly better than the preceding policies.

In Table 4, we present the same results in a different way. Instead of presenting summary information of the value functions, we present summary information of the value functions of the dynamic programming approximation policies relative to the optimal value function. However, there are several problems with interpreting ratios such as V^π(x)/V*(x) or [V*(x) − V^π(x)]/V*(x). For the IRP, V*(x) can be positive or negative, which could make the ratios above not very meaningful. A ratio such as [V*(x) − V^π(x)]/|V*(x)| is also not without problems, since the denominator can be arbitrarily close to 0. Also, all the ratios above can be made to appear arbitrarily good by adding a sufficiently large constant revenue in each time period, independent of the state or decision. In an attempt to overcome some of these shortcomings, we shift the values to fix the minimum value of the shifted optimal value function at 1. Specifically, let m ≡ min_{x∈X} V*(x), and for any stationary policy π, let ρ^π(x) ≡ [V^π(x) − m + 1]/[V*(x) − m + 1]. Then, let ρ^π_avg ≡ Σ_{x∈X} ρ^π(x)/|X|, ρ^π_min ≡ min_{x∈X} ρ^π(x), and ρ^π_max ≡ max_{x∈X} ρ^π(x) denote the average, minimum, and maximum, over all states, of the performance ratio ρ^π(x) for policy π. Table 4 also shows that the values of the policies produced by the dynamic programming approximation method are very close to the optimal values.
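The ratios in Table 4 are cheap to compute once the value functions are available; a minimal sketch, with value functions stored as dictionaries over states:

    def performance_ratios(V_pi, V_opt):
        m = min(V_opt.values())                          # m = min_x V*(x)
        rho = [(V_pi[x] - m + 1.0) / (V_opt[x] - m + 1.0) for x in V_opt]
        return min(rho), sum(rho) / len(rho), max(rho)   # rho_min, rho_avg, rho_max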

Another way to present the value function of a policy π is to graph the value functions V^π(x) and V*(x) for a subset of the states x. Figure 4 shows the value function V^{π_1}(x) of the approximation policy π_1 that is obtained after 10^4 iterations of the Bellman error method for parameter estimation, as well as the optimal value function V*(x), for instance opt11 with 3 customers, with the inventory level at customer 1 fixed at x_1 = 10, as a function of the inventory level x_2 at customer 2, for three levels of inventory at customer 3 (x_3 = 0, x_3 = 5, and x_3 = 10). The figure shows the quality of the resulting policy, as its value function is close to the optimal value function.

Since computing the optimal value functions for large instances of the IRPDD is too time consuming, we

compare the quality of the approximation policies with the quality of two other policies for large instances.

The objective is to evaluate the quality of the approximation policies for larger instances, and to evaluate the

improvements obtained by using parameterized value function approximations. The first of these policies

is based on the method proposed by Chien, Balakrishnan and Wong (1989) (denoted by CBW). They

formulated an integer programming based single-day model, in which problem parameters are adjusted from

one day to the next. We slightly modified the CBW method to take the revenues and costs of our model


Table 3: Comparison of the optimal values with the values of the approximation policies. For each policy, the three columns give V_min, V_avg, and V_max.

Instance N | π* | π′_0 | π′_1 | π′_2 | π_0 | π_1 | π_2
opt1  2 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95
opt2  2 | 48.11 49.11 49.56 | 48.00 49.10 49.56 | 48.11 49.11 49.56 | 48.11 49.11 49.56 | 48.00 49.10 49.56 | 48.11 49.11 49.56 | 48.11 49.11 49.56
opt3  3 | 37.26 37.88 38.38 | 37.16 37.87 38.25 | 37.18 37.88 38.29 | 37.22 37.88 38.34 | 37.14 37.87 38.21 | 37.17 37.88 38.26 | 37.19 37.88 38.32
opt4  3 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58
opt5  3 | 87.52 89.49 90.95 | 87.39 89.49 90.80 | 87.41 89.49 90.84 | 87.46 89.49 90.89 | 87.36 89.49 90.78 | 87.39 89.49 90.82 | 87.45 89.49 90.87
opt6  4 | 18.28 18.63 18.99 | 18.20 18.61 18.90 | 18.24 18.63 18.92 | 18.27 18.63 18.96 | 18.17 18.61 18.87 | 18.18 18.63 18.89 | 18.23 18.63 18.93
opt7  4 | 54.92 55.78 56.78 | 54.82 55.76 56.64 | 54.88 55.78 56.69 | 54.90 55.78 56.74 | 54.81 55.76 56.62 | 54.83 55.78 56.69 | 54.87 55.78 56.73
opt8  4 | 41.82 42.71 43.52 | 41.76 42.71 43.49 | 41.76 42.71 43.51 | 41.79 42.71 43.52 | 41.73 42.71 43.46 | 41.74 42.71 43.47 | 41.77 42.71 43.50
opt9  5 | 25.70 26.16 26.43 | 25.65 26.15 26.38 | 25.66 26.16 26.40 | 25.66 26.16 26.43 | 25.62 26.15 26.32 | 25.64 26.15 26.35 | 25.65 26.16 26.43
opt10 5 | 37.82 38.56 39.41 | 37.82 38.56 39.34 | 37.82 38.56 39.41 | 37.82 38.56 39.41 | 37.82 38.56 39.33 | 37.82 38.56 39.41 | 37.82 38.56 39.41

Table 4: Comparison of the values of the approximation policies relative to the optimal values. For each policy, the three columns give ρ_min, ρ_avg, and ρ_max.

Instance N | π′_0 | π′_1 | π′_2 | π_0 | π_1 | π_2
opt1  2 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000
opt2  2 | 0.997 0.999 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 0.995 0.998 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000
opt3  3 | 0.991 0.994 0.996 | 0.994 0.997 0.999 | 0.999 0.999 1.000 | 0.991 0.994 0.996 | 0.993 0.996 0.999 | 0.999 0.999 1.000
opt4  3 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000
opt5  3 | 0.983 0.990 0.995 | 0.995 0.997 0.999 | 0.999 0.999 1.000 | 0.983 0.990 0.995 | 0.995 0.997 0.999 | 0.999 0.999 1.000
opt6  4 | 0.986 0.992 0.996 | 0.996 0.997 0.999 | 0.999 0.999 1.000 | 0.986 0.992 0.996 | 0.996 0.997 0.999 | 0.999 0.999 1.000
opt7  4 | 0.981 0.990 0.996 | 0.991 0.993 0.998 | 0.992 0.994 0.999 | 0.980 0.988 0.996 | 0.991 0.993 0.998 | 0.992 0.994 0.999
opt8  4 | 0.992 0.994 0.998 | 0.998 0.999 1.000 | 0.998 0.999 1.000 | 0.992 0.994 0.998 | 0.998 0.999 1.000 | 0.998 0.999 1.000
opt9  5 | 0.985 0.992 0.996 | 0.995 0.997 0.998 | 0.999 0.999 1.000 | 0.980 0.985 0.990 | 0.981 0.987 0.992 | 0.985 0.990 0.994
opt10 5 | 0.994 0.996 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 0.994 0.996 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000

[Figure 4 plot: value function versus inventory at customer 2 (x_2), for x_3 = 0, 5, 10, with x_1 = 10.]

Figure 4: Comparison of the values V*(x) and V^{π_1}(x) at different states x for instance opt11. The optimal values V*(x) are shown by the solid lines, and the values V^{π_1}(x) of the KNS policy π_1 are shown by the dashed lines.

into account. An integer program is formulated that maximizes the daily profit, which consists of revenue

per unit delivered, transportation costs, inventory holding costs, and shortage costs. The integer program

determines an assignment of vehicles to customers for each day. As the process evolves, data are collected

and are used to modify the rewards and costs for the next day. Unsatisfied demand at a customer in one

day causes an increased reward per unit delivered for that customer the next day. The integer program

that forms the basis of the CBW approach is given in Appendix A. The second policy used for comparison is the myopic policy π_0 described at the beginning of Section 7.2, which is based on a combination of Algorithm 3 and Algorithm 4 to compute the expected shortage penalty and to choose an action.

We also compare two variants of our policy. The first variant is the policy introduced in Section 5.1 and specifically given in (7), which uses the decomposition approximation given in (6). The second variant is the policy introduced in Section 5.3, which uses a combination of the decomposition approximation and a parametric approximation given in (16). The first variant can be considered a special case of the second variant with parameters β_0 = 0 and β_n = 1 for all other n (we denote the first variant by KNS (before simulation)). The second variant was obtained after two policy improvement iterations. During each policy evaluation phase, parameters were estimated by simulating the IRPDD process for 10^4 steps using the Bellman error method, followed by 10^8 steps of stochastic approximation (we denote the second variant by KNS (after simulation)). In both variants, a combination of random sampling with common random numbers and the Bose-Bush orthogonal array design with level 9 and frequency 3, and a combination of Algorithm 3 and Algorithm 4, were used to choose the decision in each time period.


The Gauss-Seidel policy evaluation algorithm used to compute the value functions of policies for smaller

instances cannot be used for larger instances. The main reason for this is that the number of states becomes

too large, and hence the available computer memory is not sufficient to store the values of all the states,

and the computation time becomes excessive. For larger instances, the policies were evaluated by randomly

choosing five initial states, and then simulating the IRPDD process under the different policies starting from

the chosen initial states. Each replication produced a sample path over a relatively long but finite time

horizon of 800 time periods. The length of the time horizon was chosen to bound the discounted truncation

error to less than 0.01 (less than 0.1%). Six sample paths were generated for each combination of policy

and initial state, for each problem instance. The sample means µ and the standard deviations σ of the sample means over the six sample paths, as well as the intervals (µ − 2σ, µ + 2σ), were computed.
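As a back-of-the-envelope check (an assumption about how the bound was obtained, not a computation from the paper), the reward discarded after period T is at most α^T g_max/(1 − α) in absolute value:

    alpha, T = 0.98, 800
    g_max = 1.0        # illustrative bound on the per-period reward magnitude
    tail = alpha ** T * g_max / (1 - alpha)
    print(f"discounted tail bound: {tail:.1e}")   # about 4.8e-06, well under 0.01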

We conducted three experiments to evaluate the quality of the four policies on larger instances. In each of

these experiments, we varied a single instance characteristic and observed the impact on the performance of

the policies. The three instance characteristics are (1) the number of customers, (2) the number of vehicles,

and (3) the coefficient of variation of customer demand.

To study the impact of the number of customers on the performance of the policies, the instances were

generated so that larger instances have more customers with the same characteristics as the smaller instances.

Hence, customer characteristics as well as the ratio of delivery capacity to total expected demand were kept

the same for all instances. Table 5 shows the performance of the policies on instances with different numbers

of customers.

The results clearly demonstrate that the KNS policies consistently outperform the other policies. Furthermore, the difference in quality appears to increase with the number of customers. Apparently, when the number of customers becomes larger, the KNS policies are better at coordinating deliveries than the other policies. Also, observe that while KNS (before simulation) gives good results, the results from KNS (after simulation) are better.

Next, we studied the impact of the number of vehicles, and thus the delivery capacity available, on the

performance of the policies. The numbers of vehicles were chosen in such a way that we could study the

effectiveness of the policies when the available delivery capacity is smaller than the total expected demand,

as well as when there is surplus delivery capacity. The results are given in Table 6.

Intuitively, it is clear that when the delivery capacity is very restrictive, i.e., when the number of vehicles is small, it is more important to use the available capacity wisely. The results show the superiority of the KNS policies in handling these situations. The differences in quality are much larger for tightly constrained instances than for loosely constrained instances.

Finally, we studied the impact of the customer demand coefficient of variation on the performance of

the policies. The customer demand distributions for the six instances were selected so that the demand

distribution was the same for all customers in an instance, and the expected customer demand for each of

the instances was 4.5. We varied the distributions so that the customer demands had different variances,

namely 0.25, 4.65, 8.85, 12.85, 15.25 and 17.05. All other characteristics were exactly the same for the

instances. The results are given in Table 7.

The results show that when the coefficients of variation of customer demands are large and it becomes


Table 5: Performance of policies on instances with different numbers of customers. Each instance has five rows, one for each of the five randomly chosen initial states; for each policy the columns give µ, σ, µ − 2σ, and µ + 2σ.

Instance N | CBW (1989) | Myopic | KNS (before simulation) | KNS (after simulation)
cst1 10 | -6.92 0.11 -7.13 -6.70 | -6.17 0.04 -6.24 -6.09 | -4.31 0.07 -4.45 -4.18 | -3.53 0.16 -3.84 -3.21
        | -7.16 0.12 -7.39 -6.92 | -6.36 0.05 -6.47 -6.26 | -4.48 0.08 -4.65 -4.32 | -3.49 0.18 -3.85 -3.12
        | -6.97 0.11 -7.20 -6.74 | -6.24 0.05 -6.34 -6.13 | -4.54 0.08 -4.69 -4.38 | -3.42 0.12 -3.67 -3.18
        | -6.98 0.11 -7.20 -6.76 | -6.11 0.05 -6.22 -6.00 | -4.24 0.08 -4.41 -4.07 | -3.50 0.09 -3.68 -3.32
        | -7.19 0.09 -7.37 -7.01 | -6.37 0.05 -6.46 -6.27 | -4.55 0.06 -4.66 -4.43 | -3.70 0.17 -4.04 -3.36
cst2 20 | -13.35 0.10 -13.55 -13.15 | -12.30 0.10 -12.50 -12.11 | -9.10 0.24 -9.57 -8.63 | -7.38 0.16 -7.69 -7.06
        | -12.99 0.09 -13.17 -12.81 | -12.41 0.14 -12.69 -12.13 | -9.09 0.16 -9.41 -8.77 | -7.37 0.16 -7.70 -7.05
        | -13.46 0.16 -13.79 -13.14 | -13.00 0.10 -13.20 -12.79 | -9.64 0.07 -9.78 -9.49 | -7.49 0.05 -7.58 -7.39
        | -13.20 0.06 -13.33 -13.07 | -12.55 0.08 -12.70 -12.39 | -9.16 0.12 -9.40 -8.93 | -7.65 0.11 -7.86 -7.43
        | -13.23 0.12 -13.48 -12.98 | -12.87 0.16 -13.19 -12.56 | -9.40 0.16 -9.72 -9.08 | -7.33 0.14 -7.62 -7.04
cst3 30 | -19.45 0.09 -19.63 -19.26 | -18.91 0.15 -19.21 -18.61 | -13.91 0.15 -14.20 -13.62 | -11.20 0.14 -11.48 -10.92
        | -19.43 0.06 -19.56 -19.31 | -18.64 0.13 -18.89 -18.38 | -13.99 0.16 -14.30 -13.67 | -10.99 0.31 -11.62 -10.36
        | -19.73 0.23 -20.20 -19.26 | -19.25 0.29 -19.83 -18.67 | -13.80 0.26 -14.31 -13.29 | -11.31 0.17 -11.65 -10.96
        | -19.32 0.08 -19.47 -19.17 | -18.47 0.12 -18.70 -18.23 | -13.63 0.20 -14.02 -13.23 | -11.28 0.30 -11.88 -10.67
        | -19.63 0.05 -19.73 -19.52 | -19.37 0.20 -19.77 -18.97 | -14.37 0.14 -14.65 -14.09 | -11.34 0.21 -11.76 -10.92
cst4 40 | -25.30 0.08 -25.47 -25.14 | -24.18 0.23 -24.65 -23.71 | -17.34 0.20 -17.74 -16.94 | -12.77 0.26 -13.30 -12.24
        | -25.54 0.12 -25.79 -25.29 | -24.52 0.26 -25.04 -24.00 | -16.75 0.25 -17.25 -16.26 | -12.43 0.11 -12.64 -12.22
        | -25.60 0.07 -25.74 -25.46 | -24.69 0.22 -25.12 -24.25 | -17.09 0.30 -17.68 -16.50 | -13.07 0.15 -13.36 -12.77
        | -25.34 0.06 -25.47 -25.22 | -24.73 0.16 -25.04 -24.41 | -17.42 0.20 -17.82 -17.02 | -12.95 0.18 -13.31 -12.58
        | -25.35 0.07 -25.49 -25.21 | -24.31 0.21 -24.73 -23.90 | -16.91 0.23 -17.36 -16.45 | -12.77 0.20 -13.17 -12.38
cst5 50 | -31.55 0.22 -31.99 -31.12 | -31.38 0.33 -32.05 -30.71 | -23.12 0.42 -23.95 -22.28 | -18.59 0.33 -19.26 -17.92
        | -31.70 0.11 -31.92 -31.48 | -30.64 0.24 -31.11 -30.16 | -23.65 0.28 -24.22 -23.08 | -18.83 0.41 -19.65 -18.00
        | -31.65 0.13 -31.91 -31.38 | -30.77 0.28 -31.33 -30.20 | -23.23 0.34 -23.92 -22.54 | -18.22 0.26 -18.75 -17.69
        | -31.60 0.21 -32.02 -31.18 | -31.20 0.35 -31.91 -30.49 | -23.19 0.40 -23.99 -22.39 | -18.31 0.27 -18.85 -17.77
        | -31.78 0.12 -32.01 -31.55 | -30.78 0.21 -31.20 -30.37 | -23.84 0.47 -24.77 -22.91 | -19.11 0.33 -19.77 -18.45
cst6 60 | -37.21 0.19 -37.58 -36.83 | -35.92 0.29 -36.50 -35.35 | -26.79 0.26 -27.31 -26.28 | -21.98 0.33 -22.63 -21.32
        | -37.08 0.26 -37.59 -36.56 | -35.72 0.26 -36.25 -35.20 | -26.59 0.22 -27.04 -26.14 | -21.34 0.33 -22.01 -20.67
        | -37.84 0.19 -38.21 -37.47 | -37.40 0.58 -38.55 -36.25 | -26.91 0.18 -27.26 -26.55 | -21.69 0.33 -22.35 -21.02
        | -37.47 0.32 -38.10 -36.84 | -36.03 0.38 -36.79 -35.26 | -27.05 0.19 -27.42 -26.68 | -21.49 0.49 -22.47 -20.50
        | -37.24 0.20 -37.63 -36.84 | -35.87 0.29 -36.45 -35.29 | -27.29 0.26 -27.82 -26.76 | -21.24 0.45 -22.15 -20.33

Table 6: Performance of policies on instances with different numbers of vehicles. Each instance has five rows, one for each of the five randomly chosen initial states; for each policy the columns give µ, σ, µ − 2σ, and µ + 2σ.

Instance M | CBW (1989) | Myopic | KNS (before simulation) | KNS (after simulation)
veh1 6 | -91.69 0.27 -92.22 -91.16 | -89.20 0.26 -89.73 -88.67 | -74.17 0.21 -74.58 -73.76 | -68.83 0.28 -69.38 -68.28
       | -92.08 0.23 -92.54 -91.62 | -90.15 0.22 -90.58 -89.71 | -74.15 0.38 -74.92 -73.39 | -68.97 0.31 -69.59 -68.35
       | -91.43 0.29 -92.01 -90.85 | -90.49 0.20 -90.90 -90.08 | -74.32 0.33 -74.99 -73.65 | -68.72 0.31 -69.33 -68.10
       | -90.89 0.23 -91.36 -90.43 | -89.58 0.12 -89.83 -89.34 | -74.53 0.35 -75.24 -73.83 | -68.54 0.46 -69.47 -67.62
       | -91.16 0.32 -91.79 -90.52 | -90.36 0.14 -90.64 -90.08 | -74.32 0.40 -75.12 -73.52 | -68.52 0.35 -69.23 -67.82
veh2 8 | -57.70 0.22 -58.13 -57.27 | -56.00 0.16 -56.32 -55.69 | -44.38 0.22 -44.82 -43.94 | -40.70 0.20 -41.11 -40.29
       | -58.32 0.21 -58.74 -57.89 | -56.17 0.12 -56.41 -55.92 | -44.82 0.24 -45.31 -44.34 | -40.61 0.45 -41.52 -39.71
       | -57.96 0.29 -58.55 -57.37 | -56.18 0.08 -56.35 -56.01 | -44.23 0.30 -44.83 -43.63 | -40.64 0.34 -41.32 -39.96
       | -57.96 0.21 -58.39 -57.53 | -56.28 0.17 -56.61 -55.94 | -43.91 0.30 -44.50 -43.31 | -40.36 0.40 -41.15 -39.56
       | -57.50 0.20 -57.90 -57.10 | -56.13 0.14 -56.41 -55.86 | -44.36 0.20 -44.75 -43.96 | -41.43 0.24 -41.91 -40.94
veh3 10 | -43.78 0.34 -44.45 -43.11 | -42.29 0.09 -42.47 -42.10 | -32.79 0.25 -33.30 -32.29 | -28.94 0.35 -29.64 -28.24
        | -43.50 0.21 -43.92 -43.08 | -42.53 0.07 -42.68 -42.38 | -32.94 0.36 -33.66 -32.23 | -28.23 0.47 -29.16 -27.29
        | -44.22 0.23 -44.67 -43.77 | -42.42 0.17 -42.77 -42.08 | -32.60 0.25 -33.10 -32.10 | -28.45 0.33 -29.10 -27.80
        | -43.80 0.38 -44.56 -43.04 | -42.51 0.10 -42.70 -42.32 | -32.34 0.29 -32.92 -31.76 | -28.77 0.28 -29.33 -28.21
        | -43.88 0.28 -44.43 -43.32 | -41.63 0.07 -41.76 -41.50 | -32.44 0.27 -32.98 -31.91 | -28.31 0.29 -28.88 -27.73
veh4 12 | -23.99 0.34 -24.67 -23.32 | -22.69 0.12 -22.92 -22.45 | -12.40 0.44 -13.27 -11.52 | -7.77 0.23 -8.22 -7.32
        | -24.03 0.22 -24.46 -23.59 | -23.37 0.09 -23.55 -23.20 | -12.48 0.26 -12.99 -11.96 | -7.87 0.41 -8.68 -7.06
        | -23.84 0.31 -24.45 -23.23 | -22.62 0.11 -22.85 -22.40 | -11.94 0.27 -12.49 -11.40 | -7.55 0.38 -8.30 -6.80
        | -23.95 0.25 -24.44 -23.46 | -23.21 0.15 -23.51 -22.92 | -12.74 0.16 -13.05 -12.42 | -7.71 0.42 -8.55 -6.86
        | -23.66 0.17 -24.00 -23.32 | -22.62 0.08 -22.78 -22.46 | -11.98 0.38 -12.75 -11.22 | -6.92 0.28 -7.47 -6.37
veh5 14 | -3.98 0.15 -4.27 -3.68 | -3.33 0.08 -3.49 -3.17 | -1.80 0.10 -1.99 -1.61 | 0.65 0.10 0.45 0.85
        | -3.51 0.34 -4.19 -2.83 | -3.41 0.11 -3.63 -3.18 | -1.90 0.22 -2.35 -1.46 | 0.83 0.08 0.68 0.98
        | -3.41 0.22 -3.85 -2.98 | -3.44 0.09 -3.61 -3.27 | -1.54 0.25 -2.05 -1.03 | 0.63 0.11 0.41 0.84
        | -3.71 0.28 -4.27 -3.16 | -3.69 0.07 -3.84 -3.54 | -2.39 0.20 -2.80 -1.98 | 0.40 0.07 0.26 0.54
        | -4.03 0.22 -4.47 -3.60 | -3.18 0.13 -3.44 -2.93 | -2.11 0.15 -2.40 -1.81 | 0.60 0.10 0.40 0.79
veh6 16 | -0.95 0.34 -1.63 -0.28 | -0.81 0.05 -0.92 -0.71 | 2.42 0.27 1.88 2.96 | 3.44 0.18 3.08 3.79
        | -1.13 0.15 -1.44 -0.83 | -1.00 0.04 -1.09 -0.91 | 2.40 0.19 2.02 2.78 | 3.48 0.07 3.34 3.61
        | -1.28 0.17 -1.61 -0.95 | -0.72 0.04 -0.80 -0.65 | 2.50 0.35 1.80 3.20 | 3.45 0.09 3.27 3.63
        | -1.30 0.22 -1.74 -0.86 | -1.16 0.07 -1.31 -1.02 | 2.65 0.24 2.18 3.13 | 3.56 0.13 3.29 3.83
        | -1.26 0.24 -1.74 -0.78 | -0.69 0.07 -0.83 -0.56 | 2.27 0.31 1.64 2.89 | 3.46 0.11 3.24 3.67

Table 7: Performance of policies on instances with different coefficients of variation. Each instance has five rows, one for each of the five randomly chosen initial states; for each policy the columns give µ, σ, µ − 2σ, and µ + 2σ.

Instance CV | CBW (1989) | Myopic | KNS (before simulation) | KNS (after simulation)
var1 0.11 | -11.71 0.06 -11.84 -11.59 | -11.03 0.05 -11.13 -10.92 | -9.00 0.03 -9.06 -8.95 | -8.24 0.09 -8.41 -8.07
          | -11.68 0.05 -11.77 -11.58 | -10.97 0.05 -11.08 -10.87 | -8.96 0.03 -9.02 -8.89 | -8.12 0.10 -8.32 -7.91
          | -11.76 0.06 -11.88 -11.63 | -10.83 0.07 -10.97 -10.70 | -8.81 0.01 -8.84 -8.78 | -8.24 0.09 -8.41 -8.06
          | -11.56 0.08 -11.71 -11.40 | -10.78 0.05 -10.88 -10.67 | -8.70 0.03 -8.77 -8.64 | -8.26 0.06 -8.39 -8.13
          | -11.43 0.04 -11.51 -11.35 | -10.63 0.07 -10.77 -10.48 | -8.61 0.03 -8.67 -8.55 | -8.45 0.05 -8.56 -8.35
var2 0.48 | -18.78 0.14 -19.06 -18.50 | -17.88 0.11 -18.09 -17.67 | -15.99 0.24 -16.47 -15.50 | -15.13 0.02 -15.17 -15.09
          | -18.59 0.17 -18.92 -18.26 | -17.77 0.14 -18.06 -17.49 | -15.90 0.27 -16.45 -15.35 | -15.08 0.04 -15.16 -15.00
          | -18.74 0.20 -19.13 -18.35 | -17.88 0.16 -18.20 -17.57 | -16.03 0.29 -16.60 -15.45 | -15.10 0.06 -15.21 -14.98
          | -18.70 0.12 -18.95 -18.45 | -17.75 0.13 -18.01 -17.50 | -15.88 0.36 -16.61 -15.16 | -15.11 0.05 -15.22 -15.00
          | -18.37 0.22 -18.81 -17.93 | -17.71 0.13 -17.96 -17.45 | -15.85 0.24 -16.33 -15.37 | -15.13 0.03 -15.19 -15.07
var3 0.66 | -23.11 0.27 -23.65 -22.57 | -22.91 0.27 -23.45 -22.37 | -21.23 0.14 -21.51 -20.96 | -20.55 0.05 -20.65 -20.44
          | -23.23 0.19 -23.60 -22.85 | -23.16 0.17 -23.50 -22.81 | -21.50 0.26 -22.02 -20.97 | -20.41 0.07 -20.55 -20.27
          | -23.25 0.26 -23.76 -22.73 | -23.16 0.17 -23.50 -22.82 | -21.45 0.21 -21.86 -21.03 | -20.37 0.06 -20.49 -20.25
          | -22.94 0.29 -23.51 -22.36 | -22.80 0.17 -23.15 -22.45 | -21.12 0.27 -21.65 -20.59 | -20.35 0.05 -20.44 -20.26
          | -22.95 0.23 -23.42 -22.48 | -22.78 0.17 -23.11 -22.44 | -21.11 0.26 -21.62 -20.59 | -20.51 0.06 -20.63 -20.39
var4 0.80 | -22.57 0.41 -23.38 -21.75 | -22.47 0.35 -23.18 -21.77 | -21.17 0.18 -21.53 -20.82 | -20.58 0.06 -20.69 -20.46
          | -22.94 0.45 -23.83 -22.05 | -22.63 0.34 -23.31 -21.94 | -21.32 0.31 -21.95 -20.69 | -20.72 0.05 -20.82 -20.62
          | -23.03 0.43 -23.89 -22.17 | -22.77 0.37 -23.51 -22.03 | -21.41 0.12 -21.64 -21.17 | -20.70 0.03 -20.77 -20.63
          | -22.32 0.38 -23.08 -21.57 | -22.19 0.33 -22.85 -21.53 | -20.81 0.37 -21.56 -20.06 | -20.59 0.06 -20.70 -20.47
          | -22.44 0.41 -23.25 -21.63 | -22.17 0.35 -22.88 -21.47 | -20.85 0.17 -21.19 -20.52 | -20.48 0.01 -20.51 -20.46
var5 0.87 | -22.75 0.33 -23.41 -22.08 | -22.63 0.28 -23.20 -22.07 | -21.65 0.26 -22.16 -21.13 | -21.28 0.05 -21.38 -21.18
          | -22.61 0.25 -23.11 -22.10 | -22.51 0.14 -22.80 -22.23 | -21.53 0.28 -22.09 -20.98 | -21.28 0.05 -21.39 -21.17
          | -22.57 0.31 -23.20 -21.95 | -22.55 0.20 -22.96 -22.15 | -21.57 0.21 -21.98 -21.16 | -21.38 0.04 -21.47 -21.30
          | -22.72 0.30 -23.33 -22.12 | -22.24 0.16 -22.56 -21.93 | -21.22 0.18 -21.57 -20.87 | -21.26 0.04 -21.35 -21.17
          | -22.39 0.39 -23.16 -21.62 | -22.13 0.19 -22.51 -21.76 | -21.15 0.19 -21.53 -20.76 | -21.41 0.05 -21.52 -21.30
var6 0.92 | -21.50 0.27 -22.03 -20.97 | -21.13 0.22 -21.57 -20.68 | -20.40 0.31 -21.03 -19.77 | -20.03 0.04 -20.10 -19.95
          | -21.94 0.25 -22.44 -21.44 | -21.35 0.19 -21.72 -20.98 | -20.60 0.24 -21.08 -20.13 | -20.01 0.06 -20.13 -19.89
          | -21.89 0.33 -22.56 -21.23 | -21.53 0.23 -22.00 -21.06 | -20.75 0.31 -21.38 -20.12 | -19.97 0.04 -20.05 -19.89
          | -21.24 0.35 -21.95 -20.53 | -20.92 0.24 -21.40 -20.45 | -20.20 0.32 -20.85 -19.56 | -19.97 0.04 -20.06 -19.89
          | -21.27 0.32 -21.92 -20.62 | -20.97 0.23 -21.43 -20.51 | -20.27 0.34 -20.96 -19.58 | -19.94 0.03 -20.01 -19.87

less clear what the future is going to bring, the differences in quality between the KNS policies and the other policies tend to be smaller, although the KNS policies still do better on every instance. As expected, this indicates that carefully taking the available information about the future into account, such as through dynamic programming approximation methods, provides more benefit when that information is more accurate.

Overall, the computational experiments conducted demonstrate the viability of using dynamic programming approximation methods for the IRPDD.

8 Further Work

An important extension of our work involves routing vehicles to more than one customer on a delivery route. This version of the IRP is much harder than the IRPDD, since the optimization problem on the right hand side of (2) is much harder for the IRP than for the IRPDD. In the case of the IRP, this optimization problem involves both solving a vehicle routing problem, which is NP-hard, and determining the optimal quantities to be delivered to each customer on a delivery route, which involves solving an optimization problem with a nonunimodal objective function, as shown in Campbell et al. (1998). An approach for the stochastic IRP in which more than one customer can be visited on a delivery route was proposed in Kleywegt, Nori and Savelsbergh (2000).

Other issues that have to be addressed before IRPs can be solved in practice include the estimation of the problem parameters from data. These include the revenues and costs, as well as the demand distributions. Estimating these parameters from noisy data leads to hard statistical and optimization problems. It is surprising how little work has been done in this area, since it is clear that the estimation of problem parameters from data is an essential activity for the formulation and solution of practical optimization problems such as the IRP.

References

Anily, S. and Federgruen, A. 1990. One Warehouse Multiple Retailer Systems with Vehicle Routing Costs. Management Science, 36, 92–114.

Anily, S. and Federgruen, A. 1993. Two-Echelon Distribution Systems with Vehicle Routing Costs and Central Inventories. Operations Research, 41, 37–47.

Bard, J., Huang, L., Dror, M. and Jaillet, P. 1997. A Branch and Cut Algorithm for the VRP with Satellite Facilities. Unpublished manuscript.

Barnes-Schuster, D. and Bassok, Y. 1997. Direct Shipping and the Dynamic Single-depot/Multi-retailer Inventory System. European Journal of Operational Research, 101, 509–518.

Bassok, Y. and Ernst, R. 1995. Dynamic Allocations for Multi-Product Distribution. Transportation Science, 29, 256–266.

Bell, W., Dalberto, L., Fisher, M., Greenfield, A., Jaikumar, R., Kedia, P., Mack, R. and Prutzman, P. 1983. Improving the Distribution of Industrial Gases with an On-Line Computerized Routing and Scheduling Optimizer. Interfaces, 13, 4–23.

Benjamin, J. 1989. An Analysis of Inventory and Transportation Costs in a Constrained Network. Transportation Science, 23, 177–183.

Berman, O. and Larson, R. C. 1999. Deliveries in an Inventory/Routing Problem Using Stochastic Dynamic Programming. Technical report, Massachusetts Institute of Technology, Cambridge, MA.

Bertsekas, D. P. 1975. Convergence of Discretization Procedures in Dynamic Programming. IEEE Transactions on Automatic Control, 20, 415–419.

Bertsekas, D. P. 1995. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA.

Bertsekas, D. P. and Shreve, S. E. 1978. Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York, NY.

Bertsekas, D. P. and Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Athena Scientific, New York, NY.

Blumenfeld, D. E., Burns, L. D. and Daganzo, C. F. 1991. Synchronizing Production and Transportation Schedules. Transportation Research, 25B, 23–37.

Blumenfeld, D. E., Burns, L. D., Diltz, J. D. and Daganzo, C. F. 1985. Analyzing Trade-offs between Transportation, Inventory and Production Costs on Freight Networks. Transportation Research, 19B, 361–380.

Bose, R. C. 1938. On the Application of the Theory of Galois Fields to the Problem of Construction of Hyper-Graeco-Latin Squares. Sankhya, 3, 323–338.

Bose, R. C. and Bush, K. A. 1952. Orthogonal Arrays of Strength Two and Three. The Annals of Mathematical Statistics, 23, 508–524.

Burns, L. D., Hall, R. W., Blumenfeld, D. E. and Daganzo, C. F. 1985. Distribution Strategies that Minimize Transportation and Inventory Costs. Operations Research, 33, 469–490.

Campbell, A., Clarke, L., Kleywegt, A. J. and Savelsbergh, M. W. P. 1998. The Inventory Routing Problem. In Fleet Management and Logistics. T. G. Crainic and G. Laporte (editors). Kluwer Academic Publishers, Dordrecht, Netherlands, chapter 4.

Cetinkaya, S. and Lee, C. Y. 2000. Stock Replenishment and Shipment Scheduling for Vendor Managed Inventory Systems. Management Science, 46, 217–232.

Chandra, P. and Fisher, M. 1994. Coordination of Production and Distribution Planning. European Journal of Operational Research, 72, 503–517.

Chan, L. M. A., Federgruen, A. and Simchi-Levi, D. 1998. Probabilistic Analysis and Practical Algorithms for Inventory-Routing Models. Operations Research, 46, 96–106.

Chien, T. W. 1993. Determining Profit-Maximizing Production/Shipping Policies in a One-to-One Direct Shipping, Stochastic Demand Environment. European Journal of Operational Research, 64, 83–102.

Chien, T. W., Balakrishnan, A. and Wong, R. T. 1989. An Integrated Inventory Allocation and Vehicle Routing Problem. Transportation Science, 23, 67–76.

Chow, C. S. and Tsitsiklis, J. N. 1991. An Optimal One-Way Multigrid Algorithm for Discrete-Time Stochastic Control. IEEE Transactions on Automatic Control, AC-36, 898–914.

Christiansen, M. 1999. Decomposition of a Combined Inventory and Time Constrained Ship Routing Problem. Transportation Science, 33, 3–16.

Christiansen, M. and Nygreen, B. 1998a. A Method for Solving Ship Routing Problems with Inventory Constraints. Annals of Operations Research, 81, 357–378.

Christiansen, M. and Nygreen, B. 1998b. Modelling Path Flows for a Combined Ship Routing and Inventory Management Problem. Annals of Operations Research, 82, 391–412.

Cohen, M. A. and Lee, H. L. 1988. Strategic Analysis of Integrated Production-Distribution Systems: Models and Methods. Operations Research, 36, 216–228.

Dror, M. and Ball, M. 1987. Inventory/Routing: Reduction from an Annual to a Short Period Problem. Naval Research Logistics Quarterly, 34, 891–905.

Dror, M., Ball, M. and Golden, B. 1985. A Computational Comparison of Algorithms for the Inventory Routing Problem. Annals of Operations Research, 4, 3–23.

Dror, M. and Trudeau, P. 1996. Cash Flow Optimization in Delivery Scheduling. European Journal of Operational Research, 88, 504–515.

Federgruen, A. and Simchi-Levi, D. 1995. Analysis of Vehicle Routing and Inventory-Routing Problems. In Network Routing. M. O. Ball, T. L. Magnanti, C. L. Monma and G. L. Nemhauser (editors). Vol. 8 of Handbooks in Operations Research and Management Science, North-Holland, Amsterdam, Netherlands, chapter 4, 297–373.

Federgruen, A. and Zipkin, P. 1984. A Combined Vehicle Routing and Inventory Allocation Problem. Operations Research, 32, 1019–1037.

Fumero, F. and Vercellis, C. 1999. Synchronized Development of Production, Inventory, and Distribution Schedules. Transportation Science, 33, 330–340.

Gallego, G. and Simchi-Levi, D. 1990. On the Effectiveness of Direct Shipping Strategy for the One-Warehouse Multi-Retailer R-Systems. Management Science, 36, 240–243.

Golden, B., Assad, A. and Dahl, R. 1984. Analysis of a Large Scale Vehicle Routing Problem with an Inventory Component. Large Scale Systems, 7, 181–190.

Haimovich, M. and Rinnooy Kan, A. H. G. 1985. Bounds and Heuristics for Capacitated Routing Problems. Mathematics of Operations Research, 10, 527–542.

Jaillet, P., Huang, L., Bard, J. and Dror, M. 1997. A Rolling Horizon Framework for the Inventory Routing Problem. Technical report, Department of Management Science and Information Systems, University of Texas, Austin, TX.

Kleywegt, A. J., Nori, V. S. and Savelsbergh, M. W. P. 2000. The Stochastic Inventory Routing Problem. Technical report, The Logistics Institute, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205.

Kushner, H. J. and Dupuis, P. 1992. Numerical Methods for Stochastic Control Problems in Continuous Time. Springer-Verlag, New York, NY.

Meyn, S. P. and Tweedie, R. L. 1993. Markov Chains and Stochastic Stability. Springer-Verlag, London, Great Britain.

Minkoff, A. S. 1993. A Markov Decision Model and Decomposition Heuristic for Dynamic Vehicle Dispatching. Operations Research, 41, 77–90.

Mustard, D., Lyness, J. N. and Blatt, J. M. 1963. Numerical Quadrature in n Dimensions. The Computer Journal, 6, 75–87.

Nahmias, S. and Smith, S. A. 1994. Optimizing Inventory Levels in a Two-echelon Retailer System with Partial Lost Sales. Management Science, 40, 582–596.

Nelson, B. L. and Matejcik, F. J. 1995. Using Common Random Numbers for Indifference-zone Selection and Multiple Comparisons in Simulation. Management Science, 41, 1935–1945.

Puterman, M. L. 1994. Markov Decision Processes. John Wiley & Sons, Inc., New York, NY.

Pyke, D. F. and Cohen, M. A. 1993. Performance Characteristics of Stochastic Integrated Production-Distribution Systems. European Journal of Operational Research, 68, 23–48.

Reiman, M. I., Rubio, R. and Wein, L. M. 1999. Heavy Traffic Analysis of the Dynamic Stochastic Inventory-Routing Problem. Transportation Science, 33, 361–380.

Ruszczynski, A. and Syski, W. 1986. A Method of Aggregate Stochastic Subgradients with On-Line Stepsize Rules for Convex Stochastic Programming Problems. Mathematical Programming Study, 28, 113–131.

Stroud, A. H. 1971. Approximate Calculation of Multiple Integrals. Prentice Hall, Englewood Cliffs, NJ.

Thomas, D. J. and Griffin, P. M. 1996. Coordinated Supply Chain Management. European Journal of Operational Research, 94, 1–15.

Trudeau, P. and Dror, M. 1992. Stochastic Inventory Routing: Route Design with Stockouts and Route Failures. Transportation Science, 26, 171–184.

Tsitsiklis, J. N. and Van Roy, B. 1997. An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control, 42, 674–690.

Van Roy, B., Bertsekas, D. P., Lee, Y. and Tsitsiklis, J. N. 1997. A Neuro-dynamic Programming Approach to Retailer Inventory Management. Technical report, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA.

Appendices

A CBW Formulation

In this appendix, we present a slightly modified version of the method proposed by Chien, Balakrishnan,

and Wong (1989) (CBW), adapted for the IRPDD. At the start of each day, an integer program is solved

to determine the vehicle assignments for that day. The parameters and variables of the integer program are

given below.

n Number of customers

ri Revenue earned per unit delivered to customer i

ci Round-trip travel cost between depot and customer i

hi Holding cost per unit stored in inventory at customer i

pi Penalty per unit short at customer i

Ci Storage capacity of customer i

Xi Initial inventory at customer i

Di Estimate of the demand of customer i

m Number of vehicles

CV Vehicle capacity (for all vehicles)

dij Quantity delivered at customer i by vehicle j

yij 1 if vehicle j is assigned to customer i, 0 otherwise

δi Lower bound on the final inventory at customer i

ηi Upper bound on the shortage at customer i

The integer program is given below.

Maximize   Σ_{i,j} r_i d_{ij} − Σ_{i,j} c_i y_{ij} − (1/2) Σ_i h_i ( X_i + Σ_j d_{ij} + δ_i ) − α Σ_i p_i η_i    (20)

subject to

Σ_j d_{ij} ≤ C_i − X_i,   ∀ i    (21)
d_{ij} − CV y_{ij} ≤ 0,   ∀ i, j    (22)
Σ_i y_{ij} ≤ 1,   ∀ j    (23)
X_i + Σ_j d_{ij} − D_i ≤ δ_i,   ∀ i    (24)
−( X_i + Σ_j d_{ij} − D_i ) ≤ η_i,   ∀ i    (25)
δ_i, η_i, d_{ij} ≥ 0,   ∀ i, j
y_{ij} ∈ {0, 1},   ∀ i, j

Constraints (21) ensure that the total amount of product delivered to a customer does not exceed the customer's remaining capacity. Constraints (22) ensure that the amount of product delivered to a customer by a single vehicle is no more than the vehicle capacity. Constraints (23) ensure that a vehicle is assigned to at most one customer. Constraints (24) and (25) determine, for each customer, the final inventory or shortage at the end of the day. The inventory at the end of the day is computed as max{0, X_i + Σ_j d_{ij} − D_i}, where D_i is taken to be the maximum demand, as suggested by CBW. Likewise, the shortage is computed as max{0, −X_i − Σ_j d_{ij} + D_i}. Note that by the choice of D_i, the holding costs are underestimated and the shortage costs are overestimated. This may result in a conservative low-risk policy.

The objective function consists of four parts: the revenue earned, the transportation cost, the inventory holding cost, and the shortage cost. As proposed by CBW, the revenue earned per unit is given by r_i + p_i or r_i, depending on whether or not there was a shortage in the previous period. Their model has been modified slightly by incorporating a linear inventory holding cost given by half the sum of the inventory after delivery and the inventory at the end of the day, times the per unit holding cost. We have also assumed that shortages occur at the end of the day and are discounted at a rate α to the beginning of the day. Finally, it is assumed that the depot has an unlimited supply of the product.
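For illustration, the single-day integer program (20)-(25) can be stated in a few lines with the open-source PuLP modeller. This is a hedged sketch that follows the notation of this appendix (r, c, h, p, C, X, D are dictionaries over customers), not the authors' implementation:

    import pulp

    def cbw_daily_ip(n, m, CV, alpha, r, c, h, p, C, X, D):
        I, J = range(1, n + 1), range(1, m + 1)
        prob = pulp.LpProblem("CBW", pulp.LpMaximize)
        d = pulp.LpVariable.dicts("d", (I, J), lowBound=0)      # quantity delivered
        y = pulp.LpVariable.dicts("y", (I, J), cat="Binary")    # vehicle assignment
        delta = pulp.LpVariable.dicts("delta", I, lowBound=0)   # end-of-day inventory
        eta = pulp.LpVariable.dicts("eta", I, lowBound=0)       # end-of-day shortage
        prob += (pulp.lpSum(r[i] * d[i][j] for i in I for j in J)            # (20)
                 - pulp.lpSum(c[i] * y[i][j] for i in I for j in J)
                 - 0.5 * pulp.lpSum(h[i] * (X[i] + pulp.lpSum(d[i][j] for j in J)
                                            + delta[i]) for i in I)
                 - alpha * pulp.lpSum(p[i] * eta[i] for i in I))
        for i in I:
            prob += pulp.lpSum(d[i][j] for j in J) <= C[i] - X[i]            # (21)
            prob += X[i] + pulp.lpSum(d[i][j] for j in J) - D[i] <= delta[i] # (24)
            prob += -(X[i] + pulp.lpSum(d[i][j] for j in J) - D[i]) <= eta[i]# (25)
            for j in J:
                prob += d[i][j] - CV * y[i][j] <= 0                          # (22)
        for j in J:
            prob += pulp.lpSum(y[i][j] for i in I) <= 1                      # (23)
        prob.solve()
        return prob, d, y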


B Instances Used in Computational Results

Table 8: Instance opt1.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10) ci ri pi hi
1 10 0 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 50 80 30 5
2 10 0 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 50 80 30 5
n = 2, m = 2, CV = 5, α = 0.98

Table 9: Instance opt2.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10) ci ri pi hi
1 10 0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0 0.0 0.0 10 120 0 4
2 10 0 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 50 140 3 0
n = 2, m = 1, CV = 8, α = 0.98

Table 10: Instance opt3.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) ci ri pi hi
1 5 0 0.2 0.1 0.1 0.5 0.1 50 100 30 2
2 5 0 0.2 0.2 0.2 0.2 0.2 20 150 60 4
3 5 0 0.0 0.4 0.0 0.0 0.6 20 60 40 1
n = 3, m = 2, CV = 5, α = 0.98

Table 11: Instance opt4.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) ci ri pi hi
1 8 0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0 120 80 30 5
2 8 0 0.0 0.0 0.5 0.0 0.0 0.0 0.5 0.0 120 80 40 5
3 8 0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.5 120 80 20 5
n = 3, m = 3, CV = 8, α = 0.98

Table 12: Instance opt5.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10) ci ri pi hi
1 10 0 0.11 0.07 0.25 0.10 0.05 0.05 0.10 0.04 0.06 0.17 100 180 50 5
2 10 0 0.12 0.05 0.05 0.08 0.21 0.05 0.11 0.06 0.06 0.21 70 120 50 15
3 10 0 0.09 0.08 0.24 0.06 0.04 0.01 0.24 0.01 0.15 0.08 80 150 50 25
n = 3, m = 2, CV = 10, α = 0.98

Table 13: Instance opt6.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) ci ri pi hi
1 4 0 0.00 0.50 0.00 0.50 100 80 50 5
2 4 0 0.25 0.50 0.00 0.25 120 80 50 10
3 4 0 0.00 0.50 0.25 0.25 120 80 50 5
4 4 0 0.50 0.00 0.00 0.50 100 80 60 4
n = 4, m = 3, CV = 4, α = 0.98


Table 14: Instance opt7.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) ci ri pi hi
1 5 0 0.01 0.33 0.32 0.03 0.31 36 134 17 7
2 5 0 0.14 0.30 0.26 0.11 0.19 81 88 39 6
3 5 0 0.04 0.44 0.22 0.28 0.02 73 101 30 5
4 5 0 0.18 0.13 0.22 0.30 0.17 53 153 36 11
n = 4, m = 3, CV = 5, α = 0.98

Table 15: Instance opt8.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) ci ri pi hi
1 6 0.00 0.01 0.29 0.28 0.02 0.27 0.13 63 92 21 10
2 6 0.00 0.25 0.22 0.10 0.16 0.02 0.25 56 102 32 10
3 6 0.00 0.18 0.24 0.02 0.19 0.14 0.23 30 101 30 10
4 6 0.00 0.23 0.13 0.11 0.11 0.19 0.23 73 80 30 10
n = 4, m = 4, CV = 4, α = 0.98

Table 16: Instance opt9.
i Ci fi(0) fi(1) fi(2) fi(3) ci ri pi hi
1 3 0 0.02 0.49 0.49 64 73 30 17
2 3 0 0.07 0.63 0.30 61 130 13 6
3 3 0 0.45 0.39 0.16 56 92 35 11
4 3 0 0.37 0.05 0.58 73 102 39 16
5 3 0 0.41 0.55 0.04 66 101 31 12
n = 5, m = 3, CV = 3, α = 0.98

Table 17: Instance opt10.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) ci ri pi hi
1 4 0 0.01 0.48 0.47 0.04 46 88 30 11
2 4 0 0.30 0.14 0.30 0.26 91 101 36 14
3 4 0 0.18 0.30 0.04 0.48 83 153 35 8
4 4 0 0.29 0.38 0.03 0.30 63 55 33 17
5 4 0 0.16 0.27 0.36 0.21 106 97 30 15
n = 5, m = 5, CV = 4, α = 0.98

Table 18: Instance opt11.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10) ci ri pi hi
1 10 0.00 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 120 80 50 0
2 10 0.00 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 120 80 50 0
3 10 0.00 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 120 80 50 0
n = 3, m = 2, CV = 5, α = 0.98

Table 19: Instance opt12.
i Ci fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) ci ri pi hi
1 5 0.00 0.20 0.20 0.20 0.20 0.20 20 80 50 15
2 5 0.00 0.20 0.20 0.20 0.20 0.20 20 80 50 15
3 5 0.00 0.20 0.20 0.20 0.20 0.20 20 80 50 15
n = 3, m = 2, CV = 5, α = 0.98


Table 20: Instances cst1-cst6. The values of (n, m) are (10, 5), (20, 10), (30, 15), (40, 20), (50, 25) and (60, 30). The customer data repeat with period five (customer i has the same row as customer ((i - 1) mod 5) + 1), so only customers 1-5 are listed.

                                fi
 i  Ci    0    1    2    3    4    5    6    7    8    9   10    ci  ri  pi  hi
 1  10   0.0  0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   70  10  45   2
 2  10   0.0  0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0  140  20  45   2
 3  10   0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0  120  15  20   1
 4  10   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0  160  25  30   2
 5  10   0.0  0.0  0.0  0.0  0.0  0.0  0.5  0.0  0.0  0.0  0.5   70  10  25   3

CV = 10, α = 0.98
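These instance tables are easiest to use in machine-readable form. As a minimal sketch (our illustration, not part of the original instance files; the class and field names are ours), one row of Table 20 could be encoded as follows, with the demand pmf stored as a tuple indexed by demand value:

from dataclasses import dataclass

@dataclass
class Customer:
    C: int        # storage capacity C_i
    f: tuple      # demand pmf f_i(0), ..., f_i(10)
    c: float      # c_i
    r: float      # r_i
    p: float      # p_i
    h: float      # h_i

# Customer 1 of instances cst1-cst6 (first data row of Table 20).
cust1 = Customer(C=10,
                 f=(0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
                 c=70, r=10, p=45, h=2)
assert abs(sum(cust1.f) - 1.0) < 1e-9   # each pmf must sum to one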


Table 21: Instances veh1-veh6. The values of m are 6, 8, 10, 12, 14 and 16. Customers within each range below have identical data.

                                    fi
   i    Ci    0    1    2    3    4    5    6    7    8    9   10    ci  ri  pi  hi
  1-5   10   0.0  0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   70  10  45   2
  6-10  10   0.0  0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0  140  20  45   2
 11-15  10   0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0  120  15  20   1
 16-20  10   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0  160  25  30   2
 21-25  10   0.0  0.0  0.0  0.0  0.0  0.0  0.5  0.0  0.0  0.0  0.5   70  10  25   3

n = 25, CV = 10, α = 0.98


Table 22: Instance var1. All 25 customers have Ci = 10 and the demand distribution fi below; the cost parameters (ci, ri, pi, hi) repeat the five-customer cycle (70, 10, 45, 2), (140, 20, 45, 2), (120, 15, 20, 1), (160, 25, 30, 2), (70, 10, 25, 3) of Tables 20 and 21.

   d     0    1    2    3    4    5    6    7    8    9   10
 fi(d)  0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0  0.0

n = 25, m = 10, CV = 10, α = 0.98

Table 23: Instance var2. Ci and the cost-parameter cycle are as in Table 22; only the demand distribution differs.

   d     0    1    2    3    4    5    6    7    8    9   10
 fi(d)  0.0  0.0  0.3  0.2  0.0  0.0  0.2  0.3  0.0  0.0  0.0

n = 25, m = 10, CV = 10, α = 0.98


Table 24: Instance var3. Ci and the cost-parameter cycle are as in Table 22; only the demand distribution differs.

   d     0    1    2    3    4    5    6    7    8    9   10
 fi(d)  0.0  0.3  0.1  0.1  0.0  0.0  0.0  0.3  0.2  0.0  0.0

n = 25, m = 10, CV = 10, α = 0.98

Table 25: Instance var4. Ci and the cost-parameter cycle are as in Table 22; only the demand distribution differs.

   d     0    1    2    3    4    5    6    7    8    9   10
 fi(d)  0.0  0.4  0.0  0.2  0.0  0.0  0.0  0.0  0.2  0.1  0.1

n = 25, m = 10, CV = 10, α = 0.98


Table 26: Instance var5. Ci and the cost-parameter cycle are as in Table 22; only the demand distribution differs.

   d     0    1    2    3    4    5    6    7    8    9   10
 fi(d)  0.0  0.5  0.0  0.0  0.0  0.2  0.0  0.0  0.0  0.0  0.3

n = 25, m = 10, CV = 10, α = 0.98

Table 27: Instance var6. Ci and the cost-parameter cycle are as in Table 22; only the demand distribution differs.

   d     0    1    2    3    4    5    6    7    8    9   10
 fi(d)  0.0  0.5  0.1  0.0  0.0  0.0  0.0  0.0  0.1  0.0  0.3

n = 25, m = 10, CV = 10, α = 0.98
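Instances var1-var6 isolate the effect of demand variability: the six distributions in Tables 22-27 share the same mean of 4.5, while the variance increases from var1 to var6. The short Python sketch below (ours, for verification only) recomputes these moments from the tabulated pmfs:

# Demand pmfs of var1-var6 over demand values 0..10, transcribed
# from Tables 22-27.
pmfs = {
    "var1": (0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0),
    "var2": (0.0, 0.0, 0.3, 0.2, 0.0, 0.0, 0.2, 0.3, 0.0, 0.0, 0.0),
    "var3": (0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.0, 0.3, 0.2, 0.0, 0.0),
    "var4": (0.0, 0.4, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.2, 0.1, 0.1),
    "var5": (0.0, 0.5, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.3),
    "var6": (0.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.3),
}
for name, f in pmfs.items():
    mean = sum(d * p for d, p in enumerate(f))
    var = sum((d - mean) ** 2 * p for d, p in enumerate(f))
    print(f"{name}: mean = {mean:.2f}, variance = {var:.2f}")
# Prints mean 4.50 for every instance; the variances are 0.25, 4.65,
# 8.85, 12.85, 15.25 and 17.05 for var1 through var6.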
