1/41 Intelligent agent strategies for dynamic vehicle routing problems
Martijn Mes, Assistant Professor
Department Operational Methods for Production and Logistics
School of Management and Governance
Slide 2
2/41 Structure
1. Problem introduction:
   - DPDPTW
   - Multi-agent system
   - Decisions & difficulties
2. Vehicle intelligence: Opportunity valuation
3. Shipper intelligence: Threshold policies
4. Combination: Interaction of vehicle and shipper strategies
5. Future research: Approximate Dynamic Programming
Slide 3
3/41 Part 1
Problem introduction
Slide 4
4/41 Problem setting
Transportation network:
- Network of nodes and arcs
- Transportation jobs between the nodes
  - Full truckload
  - Dynamic arrival
  - Time-window restrictions
- Vehicles to transport these loads
Dynamic Pickup and Delivery Problem with time-windows, full truckloads and stochastic job arrivals (DPDPTW)
Decisions: allocating and scheduling jobs
Slide 5
5/41 Solution approach: Multi-Agent System (MAS)
An agent is a computer system that is capable of independent (autonomous) action on behalf of its user or owner (Wooldridge, 2002)
Slide 6
6/41 MAS as we use it
- All vehicles are represented by vehicle agents
- Vehicles decide upon their actions and maintain their own schedules
- An auctioneer (the shipper agent) starts an auction for each new incoming job
- Vehicles bid on these jobs based on their current status and schedule
- The auctioneer evaluates all bids and determines the winner
- The winning vehicle receives the new job
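The auction procedure described on this slide can be sketched as follows. This is a minimal illustration and not the implementation behind the slides: the names `Job`, `VehicleAgent` and `run_auction`, the bid rule (empty move to the pickup plus the loaded move) and the lowest-bid-wins rule are all assumptions made for the example, as are the travel costs.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    origin: str
    destination: str


@dataclass
class VehicleAgent:
    name: str
    location: str
    schedule: list = field(default_factory=list)

    def bid(self, job, travel_cost):
        # Hypothetical bid rule: cost of the empty move to the pickup
        # location plus the cost of the loaded move.
        start = self.schedule[-1].destination if self.schedule else self.location
        return travel_cost[(start, job.origin)] + travel_cost[(job.origin, job.destination)]


def run_auction(job, vehicles, travel_cost):
    """Shipper agent: collect one bid per vehicle, award the job to the lowest bid."""
    bids = {v.name: v.bid(job, travel_cost) for v in vehicles}
    winner = min(vehicles, key=lambda v: bids[v.name])
    winner.schedule.append(job)  # the winning vehicle receives the new job
    return winner, bids


# Made-up travel costs between a few of the cities used in the slides.
travel_cost = {
    ("Enschede", "Utrecht"): 14,
    ("Zwolle", "Utrecht"): 9,
    ("Utrecht", "Amsterdam"): 4,
}
vehicles = [VehicleAgent("v1", "Enschede"), VehicleAgent("v2", "Zwolle")]
winner, bids = run_auction(Job("Utrecht", "Amsterdam"), vehicles, travel_cost)
print(winner.name, bids)
```

The bid rule here is purely myopic; the later parts of the talk replace it with bids that also account for future opportunities.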
Slide 7
7/41 Auction example
[Figure: map with Amsterdam, Groningen, Enschede, Utrecht, Eindhoven, Rotterdam and Zwolle, showing the auction steps: a shipper would like to send a package and asks for an auction; announcement; bids (10 and 20); winner.]
Slide 8
8/41 Decisions involved
Shipper:
- Assignment decision: assign order to which vehicle? (this decision is supported by means of an auction)
Vehicle:
- Pricing decision: accept order for which price?
- Scheduling and routing decision: when to pick up and deliver the new load?
Slide 9
9/41 Will it work?
Slide 10
10/41 Difficulties
- Allocation not optimal due to wrong estimation of the real impact of order insertion
- Regrettable allocations: at the time of allocation this seems the best option; later (due to uncertainties and new order arrivals) we regret this allocation
Therefore we propose some local corrections to improve the allocation in terms of:
- individual benefits
- overall logistics performance
We can do something at:
1) Job announcement
2) Bid calculation
3) Bid evaluation
Slide 11
11/41 To overcome the difficulties, we need some kind of look-ahead in:
- bid pricing
- bid evaluation
Slide 12
12/41 Two options:
- Vehicle: opportunity valuation
  take into account future job arrivals for pricing, scheduling, and waiting decisions
- Shipper: dynamic threshold policy
  take into account price fluctuations due to new order arrivals for delaying (and breaking) commitments
Slide 13
13/41 Possible applications
Internal logistics:
- MAS control of AGVs within an underground logistics system at Amsterdam Airport Schiphol
- MAS control of AGVs at an industrial bakery in the Netherlands
External logistics:
- Shippers with private fleets
- Collaborative carriers
- Multiple carriers and shippers participating in transportation procurement auctions
Slide 14
14/41 Part 2
Vehicles: opportunity valuation
take into account future job arrivals for pricing, scheduling, and waiting decisions
Slide 15
15/41 Importance of opportunity valuation (1)
Pricing decisions
[Figure: map with Amsterdam, Groningen, Enschede, Utrecht, Eindhoven and Rotterdam, shaded by transport intensity from low to high.]
- Suppose the travel time and travel costs Enschede-Amsterdam and Enschede-Groningen are equal; should we also bid the same price?
- Possibly we also find an order on the route Enschede-Utrecht
- A longer job will cover your fixed costs for a longer period; however, multiple small jobs can result in higher profits, depending on the auction mechanism
- Possibly it is better to be in Amsterdam than in Groningen two hours from now
Take into account the opportunities of a schedule!
Slide 16
16/41 Importance of opportunity valuation (2)
Routing and scheduling decisions
[Figure: map with Amsterdam, Groningen, Enschede, Utrecht, Eindhoven and Rotterdam, shaded by transport intensity from low to high, with a schedule timeline marked 8:00, 10:00, 11:00, 19:00 and 20:00.]
- Suppose you have won orders on the routes Utrecht-Amsterdam and Rotterdam-Eindhoven and are located in Enschede
- Routing: in which order to visit the cities?
- Scheduling: when to pick up and deliver? Create a large gap between delivery in Amsterdam and pickup in Rotterdam?
Take into account the opportunities of a schedule!
Slide 17
17/41 Importance of opportunity valuation (3)
Operational decisions
[Figure: map with Amsterdam, Groningen, Enschede, Utrecht, Eindhoven and Rotterdam, shaded by transport intensity from low to high, with a schedule timeline marked 11:00, 19:00 and 20:00.]
After delivering the load in Amsterdam at 11:00 you have to decide what to do:
- Drive directly to Rotterdam (and wait there for 6 hours)
- Wait in Amsterdam until you win a new order which you can do before the order Rotterdam-Eindhoven; if you have not received such an order before 17:00, then move empty towards Rotterdam
- Move empty towards Utrecht in anticipation of an order Utrecht-Rotterdam; if you have not received such an order before 17:00, then move empty towards Rotterdam
Take into account the opportunities of a schedule!
Slide 18
18/41 Opportunity valuation
All these questions/decisions have in common:
- We weigh all possible gaps between loaded moves against each other and against a certain end state
- Reason: new jobs are inserted either between 2 jobs or added to the end
- So if we could value these periods we are done
Slide 19
19/41 Opportunity valuation
So we derive 3 value functions:
- End-value V^e(i,s,t) = expected revenue during a finite horizon t, after arrival at schedule destination i a time s from now
- Gap-value V^g(i,j,s,t) = expected revenue during a period t in a gap with start-node i, end-node j, and time s until arrival at i
- Flexible gap-value V^g(i,j,s,t) = same, but now t denotes the maximum gap-length (gap elasticity)
Calculate using auction data and stochastic dynamic programming (SDP)
Slide 20
20/41 Value functions
[Figure: vehicle schedule timeline (jobs with origin, destination, pickup and delivery times). Sequence: Job 1, Gap 1, Job 2, Job 3, Gap 2, Job 4, End; locations A, B, C, A, D, B, D at times 0, 2, 6, 8, 10, 14, 16, up to T. Legend: waiting, empty moves, new job insertions.]
Value of a schedule = - C^d(J_1) - C^d(J_2) - C^d(J_3) - C^d(J_4) + V^g(B,C,4,2) + V^g(D,B,4,10) + V^e(D,16,T-16)
- Direct costs C^d(J_l) for all jobs l
+ Gap-value V^g for all gaps, with start-node i, end-node j, gap length and time-to-go t
+ End-value V^e(i,t) of a schedule destination i with time-to-go t
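As a worked example of the schedule value on this slide (minus the direct costs, plus the gap-values, plus the end-value), here is a minimal sketch. The function name `schedule_value` and every number below are hypothetical; in the talk the gap-values and the end-value come from the value functions, not from constants.

```python
def schedule_value(direct_costs, gap_values, end_value):
    """Value of a schedule = - direct costs + gap-values + end-value."""
    return -sum(direct_costs) + sum(gap_values) + end_value


# Hypothetical numbers for the four-job schedule on the slide.
direct_costs = [20.0, 25.0, 15.0, 30.0]  # direct costs of Job 1 .. Job 4
gap_values = [3.0, 8.0]                  # gap B->C (length 4), gap D->B (length 4)
end_value = 40.0                         # end-value at D, from time 16 until T

value = schedule_value(direct_costs, gap_values, end_value)
print(value)  # -90 + 11 + 40
```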
Slide 21
21/41 SDP illustration: End-value
[Figure: time-space diagram over nodes A, B, C, D and time periods 0 to 8, showing jobs J1 and J2 followed by the end period. Legend: full move, empty move, waiting / pro-active move.]
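To give a feel for how an end-value could be computed by backward recursion, here is a heavily simplified sketch. It is not the SDP model from the talk: the arrival model (at most one job per period, with known probabilities), the integer travel times of at least one period, the fixed revenues, and the function name `end_values` are all assumptions made for this example.

```python
def end_values(nodes, arrival_prob, revenue, travel_time, horizon):
    """Backward recursion for V[(i, t)]: expected revenue from node i with t periods to go.

    Assumed model: in each period at node i, a job to node j arrives with
    probability arrival_prob[i][j] (at most one job per period). The vehicle
    either accepts it (earning revenue[(i, j)] and arriving at j after
    travel_time[(i, j)] >= 1 periods) or waits one period at i.
    """
    V = {(i, 0): 0.0 for i in nodes}
    for t in range(1, horizon + 1):
        for i in nodes:
            wait = V[(i, t - 1)]
            total, p_none = 0.0, 1.0
            for j, p in arrival_prob.get(i, {}).items():
                tau = travel_time[(i, j)]
                # A job that cannot be finished within the horizon is never accepted.
                accept = revenue[(i, j)] + V[(j, t - tau)] if tau <= t else float("-inf")
                total += p * max(accept, wait)
                p_none -= p
            V[(i, t)] = total + p_none * wait
    return V


# Tiny two-node example with made-up data.
V = end_values(
    nodes=["A", "B"],
    arrival_prob={"A": {"B": 0.5}, "B": {}},
    revenue={("A", "B"): 10.0},
    travel_time={("A", "B"): 1},
    horizon=2,
)
print(V[("A", 1)], V[("A", 2)])
```

In this toy network node B never generates jobs, so its end-value stays zero, while the end-value at A grows with the remaining horizon.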
Slide 22
22/41 Using the value functions
Pricing and scheduling:
[Figure: schedule before and after inserting Job 3. Before: Job 1 (A to B, time 0 to 2), Gap 1 (time 2 to 12), Job 2 (C to B, time 12 to 14), End (until T). After: Job 1, Gap 2 (time 2 to 6), Job 3 (C to D, time 6 to 8), Gap 3 (time 8 to 12), Job 2, End.]
Price = C^d(Job 3) + V^g(B,C,10,2) - V^g(B,C,4,2) - V^g(D,C,4,8)
Here V^g(B,C,10,2) is the value of the original Gap 1, which is lost, and V^g(B,C,4,2) and V^g(D,C,4,8) are the values of the new Gap 2 and Gap 3.
Scheduling = choose the pickup time of the new job which results in the lowest bid price
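The pricing and scheduling rule on this slide can be sketched as below. The helper names `bid_price` and `best_pickup_time` and all numbers are hypothetical; the gap-values would normally be looked up in the value functions.

```python
def bid_price(direct_cost_new_job, old_gap_value, new_gap_values):
    """Bid = direct cost of the new job + value of the gap that is lost
    - values of the gaps that remain after the insertion."""
    return direct_cost_new_job + old_gap_value - sum(new_gap_values)


def best_pickup_time(candidates):
    """Return the pickup time with the lowest bid price, and that price.

    `candidates` maps a pickup time to a tuple
    (direct cost, old gap-value, list of new gap-values).
    """
    prices = {t: bid_price(*args) for t, args in candidates.items()}
    best = min(prices, key=prices.get)
    return best, prices[best]


# Hypothetical values for two possible pickup times of the new job.
candidates = {
    4: (10, 12, [1, 6]),  # pickup at time 4: bid = 10 + 12 - 7
    6: (10, 12, [3, 5]),  # pickup at time 6: bid = 10 + 12 - 8
}
pickup, price = best_pickup_time(candidates)
print(pickup, price)
```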
Slide 23
23/41 Weigh gap-values with end-values
[Figure: schedule timeline with Job 1 (location A to B, time 0 to 2), Gap 1, Job 2 (locations X and Y) and End, up to time 2+T.]
Slide 24
24/41 Weigh gap-values with end-values
[Figure: schedule timeline with Job 1 (location A to B, time 0 to 2), Gap 1, Job 2 (locations X and D) and End, up to time 2+T. Annotations: zero gap length; gap equal to empty travel time from B to C; optimal gap length slightly longer.]
Slide 25
25/41 Illustration flexible gap-values
[Figure: flexible gap-values as a function of the gap elasticity for a gap between Job 1 and Job 2.]
- Gap-value is zero for origin equal to destination
- Gap-value for different start-nodes with unattractive end-node
- Gap-value might be positive for unattractive end-nodes; such a job will serve as a backup for arriving at an unattractive node
- Gap-value increases with increasing elasticity, because the probability that the empty move will be replaced by a loaded one increases
- Elasticity should be high enough for an empty move
Slide 26
26/41 Results
Opportunity valuation increases the logistic performance (in terms of profits, capacity utilization and delivery reliability) with respect to:
- the system-wide performance: savings of 10%
- individual benefits: the profit of one smart player is higher than the total profit of his 9 competitors
Explanations:
- gaps are effectively created to avoid empty moves
- unattractive jobs are scheduled later (increasing the probability of combining this job with another job)
- smart carriers tend to select only the most profitable jobs
More information: Mes, M.R.K., M.C. van der Heijden, and P. Schuur (2008). Look-ahead strategies for dynamic pickup and delivery problems. OR Spectrum.
Slide 27
27/41 Part 3
Shipper: dynamic threshold policy
take into account price fluctuations due to new order arrivals for delaying and breaking commitments
Slide 28
28/41 Dynamic threshold policy (1)
- Shipper has to do some bid evaluation: accept the best bid or not
- To support this decision we use a threshold policy
- If the best bid is below a certain threshold price it is accepted; otherwise
  - the auction stays open (continuous auctions), or
  - a new auction will be started some time period later (repeated auctions)
- This threshold price is given by the expected price after rejecting the best bid
- Literature: Optimal auctions & Optimal stopping
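A minimal sketch of such a threshold policy for repeated auctions follows. It is an optimal-stopping recursion under strong simplifying assumptions that are not from the talk: a fixed number of auction rounds before the latest pickup time, best bids that are independent and identically distributed over a few discrete values, and a fixed penalty if no bid is ever accepted. The function names are hypothetical.

```python
def thresholds(bid_values, bid_probs, rounds, penalty):
    """th[k] = expected price paid if the best bid in round k is rejected.

    Assumes i.i.d. best bids per round and a fixed penalty when the job
    is still unassigned after the last round.
    """
    th = [0.0] * rounds
    th[rounds - 1] = penalty  # rejecting in the last round means paying the penalty
    for k in range(rounds - 2, -1, -1):
        # In round k+1 we accept the bid only if it is below that round's threshold.
        th[k] = sum(p * min(b, th[k + 1]) for b, p in zip(bid_values, bid_probs))
    return th


def accept(best_bid, threshold):
    """Accept the best bid if it is below the threshold price."""
    return best_bid < threshold


# Made-up bid distribution: 10, 20 or 30 with probabilities 0.25, 0.5, 0.25.
th = thresholds([10, 20, 30], [0.25, 0.5, 0.25], rounds=3, penalty=40)
print(th)
print(accept(15, th[0]), accept(18, th[0]))
```

The thresholds rise as the deadline approaches, so the shipper is picky early on and accepts almost anything in the final round.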
Slide 29
29/41 Dynamic threshold policy (2)
The threshold prices are given by a threshold function V^t with arguments:
- time until latest pickup time
- travel distance t
- origin and destination region o, d
We use SDP to calculate this function
Important aspects:
- Time-dependent mean bid prices
- Time-dependent variances in bid prices
- Correlated bids
- Censored observations w.r.t. the penalty costs
Slide 30
30/41 Breaking commitments
Besides delaying commitments (by the use of reserve/threshold prices) it is also possible to break commitments
The decommitment policy:
- Vehicles are allowed to decommit from an agreement against certain penalties
- Vehicles decommit whenever the expected profit of a new job, net of the decommitment penalty, is higher than the profit of the old job
- These penalties are set by the shipper and reflect the extra costs a shipper expects to incur when re-auctioning a job later (so there is some equivalence between both policies)
Slide 31
31/41 Results
- If only one player uses the proposed policies, his costs per job are 20-30% lower than those of players who do not use the policies.
- The two policies are complementary; however, the combination requires a lot of computation time.
- If we use the proposed policies for only 1% of the jobs, the total costs are reduced by more than 1%.
- If more jobs are auctioned in a clever way, learning becomes more difficult.
Mes, M.R.K., M.C. van der Heijden, and P.C. Schuur (2008). Dynamic threshold policy for delaying and breaking commitments in transportation auctions. Transportation Research Part C.
Slide 32
32/41 Part 4
Interaction of vehicle and shipper strategies
Slide 33
33/41 Interaction of vehicle and shipper strategies: back to MAS
- Vehicle: opportunity valuation
  take into account future job arrivals for pricing, scheduling, and waiting decisions
- Shipper: dynamic threshold policy
  take into account price fluctuations due to new order arrivals for delaying (and breaking) commitments
Slide 35
35/41 Problems
Each player has to incorporate:
- opponents' behavior (e.g. a carrier takes into account whether a shipper uses threshold prices)
- competitors' behavior (e.g. a carrier takes into account whether other carriers value opportunities)
Players have to learn this
Learning problems:
- Long learning phase
- Increasing bid prices
- Fluctuations in bid prices
Luckily, these problems can be fixed
Slide 36
36/41 Some results
[Figure: relative savings of various policies compared to a myopic insertion strategy.]
Slide 37
37/41 Conclusions (1/2)
- Savings of 10-20% with a combination of policies
- Each policy has its own benefits, e.g.
  - Opportunity valuation: unbalanced networks
  - Dynamic threshold policy: long time-windows and low job arrival rate
- Savings are 52% of the savings from a MIP approach
- However, our savings are achieved without significant additional computation time
- But still, the difference in performance gives rise to further research
Slide 38
38/41 Part 5
Future research
Slide 39
39/41 Disadvantages of our SDP approach
- Difficult to add all kinds of model details (e.g. driver regulations and time-dependent travel times).
- With increasing problem sizes, the time needed to calculate the value functions increases drastically.
- In highly dynamic environments we might use outdated or even wrong value functions (and we might never discover this discrepancy)
Slide 40
40/41 Possible solution
Learn the value functions instead of calculating them:
- Avoid difficult modeling issues (e.g. modeling opponents' behavior)
- Avoid using wrong value functions
To learn the value functions:
- ADP / RL (temporal difference learning)
To speed up the learning process and to reduce computation time:
- Value function approximation (piecewise linear functions, KNN, CMACs)
- Use SDP as starting point
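As an illustration of learning a value function by temporal difference learning, here is a minimal tabular TD(0) sketch. It is a generic textbook update, not the method proposed in the talk; the state encoding (node, time-to-go), the step size, and the observed transitions are assumptions made for the example.

```python
from collections import defaultdict


def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0):
    """One TD(0) step: move V[state] toward reward + gamma * V[next_state]."""
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error
    return td_error


# Lookup-table value function, initialised at zero. The table could instead
# be initialised with precomputed SDP values as a starting point.
V = defaultdict(float)

# Hypothetical observed transitions: (state, reward, next_state),
# with state = (node, time-to-go).
episode = [
    (("A", 2), 10.0, ("B", 1)),
    (("B", 1), 0.0, ("B", 0)),
]
for _ in range(2):  # replay the same episode twice
    for state, reward, next_state in episode:
        td0_update(V, state, reward, next_state, alpha=0.5)

print(V[("A", 2)])
```

A lookup table like this does not scale to large state spaces, which is where the value function approximations listed on the slide would come in.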
Slide 41
41/41 Questions?
Martijn Mes
University of Twente
School of Management and Governance
Operational Methods for Production and Logistics
Phone: +31-534894062
Email: [email protected]@utwente.nl
Web: http://mb.utwente.nl/ompl/staff/Mes/