1/41 Intelligent agent strategies for dynamic vehicle routing problems
Martijn Mes, Assistant Professor
Department Operational Methods for Production and Logistics
School of Management and Governance

2/41 Structure
1. Problem introduction:
   - DPDPTW
   - Multi-agent system
   - Decisions & difficulties
2. Vehicle intelligence: Opportunity valuation
3. Shipper intelligence: Threshold policies
4. Combination: Interaction of vehicle and shipper strategies
5. Future research: Approximate Dynamic Programming

3/41 Part 1: Problem introduction

4/41 Problem setting
Dynamic Pickup and Delivery Problem with time-windows, full truckloads and stochastic job arrivals
Transportation network:
- Network of nodes and arcs
- Transportation jobs between the nodes
  - Full truckload
  - Dynamic arrival
  - Time-window restrictions
- Vehicles to transport these loads
Decisions: allocating and scheduling jobs

5/41 Solution approach: Multi-Agent System (MAS)
Agent: "a computer system that is capable of independent (autonomous) action on behalf of its user or owner" (Wooldridge, 2002)

6/41 MAS as we use it
- All vehicles are represented by vehicle agents
- Vehicles decide upon their actions and maintain their own schedules
- An auctioneer (the shipper agent) starts an auction for each new incoming job
- Vehicles bid on these jobs based on their current status and schedule
- The auctioneer evaluates all bids and determines the winner
- The winning vehicle receives the new job
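
The auction loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `VehicleAgent`, the functions `cost` and `run_auction`, the myopic bid rule (empty move plus loaded move) and all travel costs are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


def cost(travel_cost, a, b):
    """Symmetric travel cost lookup; staying at the same node costs nothing."""
    if a == b:
        return 0
    return travel_cost.get((a, b), travel_cost.get((b, a)))


@dataclass
class VehicleAgent:
    name: str
    location: str
    schedule: list = field(default_factory=list)  # list of (origin, destination) jobs

    def bid(self, job, travel_cost):
        # Hypothetical myopic bid: empty move to the pickup node plus the loaded move.
        start = self.schedule[-1][1] if self.schedule else self.location
        origin, destination = job
        return cost(travel_cost, start, origin) + cost(travel_cost, origin, destination)


def run_auction(job, vehicles, travel_cost):
    """Shipper agent: collect one bid per vehicle and award the job to the lowest bid."""
    bids = {v.name: v.bid(job, travel_cost) for v in vehicles}
    winner = min(vehicles, key=lambda v: bids[v.name])
    winner.schedule.append(job)  # the winning vehicle receives the new job
    return winner, bids


# Hypothetical travel costs between three of the cities used in the examples.
travel_cost = {("Enschede", "Utrecht"): 14, ("Utrecht", "Amsterdam"): 4}
vehicles = [VehicleAgent("v1", "Enschede"), VehicleAgent("v2", "Amsterdam")]
winner, bids = run_auction(("Utrecht", "Amsterdam"), vehicles, travel_cost)
print(winner.name, bids)
```

The bid rule is deliberately myopic; the look-ahead strategies in the later parts replace exactly this step.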

7/41 Auction example
[Figure: network with the nodes Amsterdam, Groningen, Enschede, Utrecht, Eindhoven, Rotterdam and Zwolle. A shipper that likes to send a package asks for auctioning; the steps shown are announcement, bid (values 10 and 20) and winner.]

8/41 Decisions involved
Shipper:
- Assignment decision: assign the order to which vehicle? This decision is supported by means of an auction.
Vehicle:
- Pricing decision: accept the order for which price?
- Scheduling and routing decision: when to pick up and deliver the new load?

9/41 Will it work?

10/41 Difficulties
Allocation is not optimal due to:
- wrong estimation of the real impact of order insertion
- regrettable allocations: at the time of allocation this seems the best option; later (due to uncertainties and new order arrivals) we regret this allocation
Therefore we propose some local corrections to improve the allocation in terms of:
- individual benefits
- overall logistics performance
We can do something at:
1) Job announcement
2) Bid calculation
3) Bid evaluation

11/41 To overcome the difficulties, we need some kind of look-ahead in:
- bid pricing
- bid evaluation

12/41 Two options:
- Vehicle: opportunity valuation. Take into account future job arrivals for pricing, scheduling, and waiting decisions.
- Shipper: dynamic threshold policy. Take into account price fluctuations due to new order arrivals for delaying (and breaking) commitments.

13/41 Possible applications
Internal logistics:
- MAS control of AGVs within an underground logistics system at Amsterdam Airport Schiphol
- MAS control of AGVs at an industrial bakery in the Netherlands
External logistics:
- Shippers with private fleets
- Collaborative carriers
- Multiple carriers and shippers participating in transportation procurement auctions

14/41 Part 2: Vehicles: opportunity valuation
Take into account future job arrivals for pricing, scheduling, and waiting decisions

15/41 Importance of opportunity valuation (1): Pricing decisions
[Figure: network with the nodes Amsterdam, Groningen, Enschede, Utrecht, Eindhoven and Rotterdam; legend: transport intensity from low to high.]
- Suppose the travel time and travel costs Enschede-Amsterdam and Enschede-Groningen are equal. Should we also bid the same price?
- Possibly we also find an order on the route Enschede-Utrecht
- A longer job will cover your fixed costs for a longer period; however, multiple small jobs can result in higher profits, depending on the auction mechanism
- Possibly it is better to be in Amsterdam than in Groningen two hours from now
Take into account the opportunities of a schedule!

16/41 Importance of opportunity valuation (2): Routing and scheduling decisions
[Figure: network with the nodes Amsterdam, Groningen, Enschede, Utrecht, Eindhoven and Rotterdam; legend: transport intensity from low to high; times shown: 8:00, 10:00, 11:00, 19:00 and 20:00.]
- Suppose you have won orders on the routes Utrecht-Amsterdam and Rotterdam-Eindhoven and are located in Enschede
- Routing: in which order to visit the cities?
- Scheduling: when to pick up and deliver?
- Create a large gap between delivery in Amsterdam and pickup in Rotterdam?
Take into account the opportunities of a schedule!

17/41 Importance of opportunity valuation (3): Operational decisions
[Figure: network with the nodes Amsterdam, Groningen, Enschede, Utrecht, Eindhoven and Rotterdam; legend: transport intensity from low to high; times shown: 11:00, 19:00 and 20:00.]
After delivering the load in Amsterdam at 11:00 you have to decide what to do:
- Drive directly to Rotterdam (and wait there for 6 hours)
- Wait in Amsterdam until you win a new order that you can carry out before the order Rotterdam-Eindhoven; if you have not received such an order before 17:00, move empty towards Rotterdam
- Move empty towards Utrecht in anticipation of an order Utrecht-Rotterdam; if you have not received such an order before 17:00, move empty towards Rotterdam
Take into account the opportunities of a schedule!

18/41 Opportunity valuation
All these questions/decisions have in common:
- We weigh all possible gaps between loaded moves against each other and against a certain end state
- Reason: new jobs are inserted either between 2 jobs or added to the end
- So if we could value these periods we are done

19/41 Opportunity valuation
So we derive 3 value functions:
- End-value V_e(i,s,t) = expected revenue during a finite horizon t, after arrival at schedule destination i a time s from now
- Gap-value V_g(i,j,s,t) = expected revenue during a period t in a gap with start node i, end node j, and time s until arrival at i
- Flexible gap-value V_g(i,j,s,t) = same, but now t denotes the maximum gap length (gap elasticity)
Calculate using auction data & SDP

20/41 Value functions
Vehicle schedule: jobs with origin, destination, pickup and delivery times
Segment   Job 1  Gap 1  Job 2  Job 3  Gap 2  Job 4  End
From-to   A-B    B-C    C-A    A-D    D-B    B-D    D
Time      0-2    2-6    6-8    8-10   10-14  14-16  16-T
Legend: waiting, empty moves, new job insertions
Value of a schedule:
- minus the direct costs C_d(J_l) for all jobs l: C_d(J_1), C_d(J_2), C_d(J_3), C_d(J_4)
- plus the gap-value V_g for all gaps, with arguments start node i, end node j, gap length and time-to-go t: V_g(B,C,4,2), V_g(D,B,4,10)
- plus the end-value V_e of the schedule destination i with time-to-go t: V_e(D,16,T-16)
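
Written as a single expression, the value of this example schedule combines the three components with the signs given on the slide. The symbol V(schedule) is introduced here only for illustration; the argument order of V_g and V_e follows the example values on the slide.

```latex
V(\text{schedule}) =
  - \bigl[ C_d(J_1) + C_d(J_2) + C_d(J_3) + C_d(J_4) \bigr]
  + V_g(B, C, 4, 2) + V_g(D, B, 4, 10)
  + V_e(D, 16, T - 16)
```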

21/41 SDP illustration: End-value
[Figure: time-expanded network over the nodes A, B, C and D and the time steps 0 to 8, showing the jobs J1 and J2 and the end of the schedule; legend: full move, empty move, waiting / pro-active move.]

22/41 Using the value functions
Pricing and scheduling:
Current schedule:
Segment   Job 1  Gap 1  Job 2  End
From-to   A-B    B-C    C-B    B
Time      0-2    2-12   12-14  14-T
Schedule after inserting Job 3:
Segment   Job 1  Gap 2  Job 3  Gap 3  Job 2  End
From-to   A-B    B-C    C-D    D-C    C-B    B
Time      0-2    2-6    6-8    8-12   12-14  14-T
Price = C_d(Job 3) + V_g(B,C,10,2) - V_g(B,C,4,2) - V_g(D,C,4,8)
Scheduling = choose the pickup time of the new job which results in the lowest bid price
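
One way to read the price formula is as the loss in schedule value caused by the insertion: the value of the schedule without Job 3 minus the value of the schedule with Job 3. The sketch below checks that this reading reproduces the formula when the end-value is unchanged. The function `schedule_value` and every number are hypothetical; in the approach described on the slides the gap- and end-values would come from the value functions computed with SDP.

```python
def schedule_value(direct_costs, gap_values, end_value):
    """Value of a schedule: minus direct costs, plus gap-values, plus end-value."""
    return -sum(direct_costs) + sum(gap_values) + end_value


# Hypothetical numbers, for illustration only.
C_d = {"Job 1": 20.0, "Job 2": 20.0, "Job 3": 25.0}
V_g = {
    ("B", "C", 10, 2): 12.0,  # Gap 1 in the current schedule
    ("B", "C", 4, 2): 3.0,    # Gap 2 after inserting Job 3
    ("D", "C", 4, 8): 5.0,    # Gap 3 after inserting Job 3
}
V_e = 40.0  # end-value, identical before and after the insertion

old_value = schedule_value(
    [C_d["Job 1"], C_d["Job 2"]], [V_g[("B", "C", 10, 2)]], V_e
)
new_value = schedule_value(
    [C_d["Job 1"], C_d["Job 2"], C_d["Job 3"]],
    [V_g[("B", "C", 4, 2)], V_g[("D", "C", 4, 8)]],
    V_e,
)

# Bid price as the loss in schedule value caused by accepting Job 3.
bid_price = old_value - new_value

# The same number obtained directly from the formula on the slide.
formula_price = (
    C_d["Job 3"]
    + V_g[("B", "C", 10, 2)]
    - V_g[("B", "C", 4, 2)]
    - V_g[("D", "C", 4, 8)]
)
print(bid_price, formula_price)
```

For scheduling, the same computation would be repeated for every feasible pickup time of the new job, keeping the pickup time with the lowest bid price.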

23/41 Weigh gap-values with end-values
[Figure: schedule with Job 1 (A-B, time 0-2), Gap 1, Job 2 (X-Y) and End, over the horizon up to 2+T.]

24/41 Weigh gap-values with end-values
[Figure: schedule with Job 1 (A-B, time 0-2), Gap 1, Job 2 (X-D) and End, over the horizon up to 2+T. Annotations: zero gap length; gap equal to empty travel time from B to C; optimal gap length slightly longer.]

25/41 Illustration flexible gap-values
[Figure: gap-value as a function of the gap elasticity between Job 1 and Job 2, for different start nodes with an unattractive end node.]
- Gap-value is zero for origin equal to destination
- Gap-value might be positive for unattractive end nodes; such a job will serve as a backup for arriving at an unattractive node
- Gap-value increases with increasing elasticity, because the probability that the empty move will be replaced by a loaded one increases
- Elasticity should be high enough for an empty move

26/41 Results
Opportunity valuation increases the logistic performance (in terms of profits, capacity utilization and delivery reliability) with respect to:
- the system-wide performance: savings of 10%
- individual benefits: profit of one smart player higher than the total profit of his 9 competitors
Explanations:
- gaps are effectively created to avoid empty moves
- unattractive jobs are scheduled later (increasing the probability of combining this job with another job)
- smart carriers tend to select only the most profitable jobs
More information: Mes, M.R.K., M.C. van der Heijden, and P. Schuur (2008). Look-ahead strategies for dynamic pickup and delivery problems. OR Spectrum.

27/41 Part 3: Shipper: dynamic threshold policy
Take into account price fluctuations due to new order arrivals for delaying and breaking commitments

28/41 Dynamic threshold policy (1)
- Shipper has to do some bid evaluation: accept the best bid or not
- To support this decision we use a threshold policy
- If the best bid is below a certain threshold price it is accepted; otherwise:
  - the auction stays open (continuous auctions)
  - a new auction will be started some time period later (repeated auctions)
- This threshold price is given by the expected price after rejecting the best bid
Literature: Optimal auctions & Optimal stopping

29/41 Dynamic threshold policy (2)
The threshold prices are given by a threshold function V_t with arguments:
- time until latest pickup time
- travel distance t
- origin and destination region o, d
We use SDP to calculate this function
Important aspects:
- Time-dependent mean bid prices
- Time-dependent variances in bid prices
- Correlated bids
- Censored observations w.r.t. the penalty costs
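
To make the idea of a threshold as "expected price after rejecting the best bid" concrete, the sketch below runs a backward recursion for a repeated auction. It is a strongly simplified illustration, not the threshold function of the slides: it assumes a fixed number of auction rounds, best bids that are independent and identically distributed over a known discrete distribution, and a fixed penalty if no bid is ever accepted. Time-dependent means and variances, correlated bids, censoring and the dependence on distance and regions are all ignored. The function names and numbers are hypothetical.

```python
def threshold_prices(bid_values, bid_probs, n_rounds, penalty):
    """Backward recursion over auction rounds.

    thresholds[k] is the expected price paid if the best bid in round k is
    rejected and the shipper follows the threshold policy afterwards.
    """
    expected_cost = penalty  # price paid if the bid in the last round is rejected
    thresholds = [0.0] * n_rounds
    for k in range(n_rounds - 1, -1, -1):
        thresholds[k] = expected_cost
        # Expected price at the start of round k: accept the bid if it is
        # below the threshold, otherwise continue and pay the threshold.
        expected_cost = sum(
            p * min(b, thresholds[k]) for b, p in zip(bid_values, bid_probs)
        )
    return thresholds


def accept(best_bid, threshold):
    """Accept the best bid only if it is below the threshold price."""
    return best_bid < threshold


# Hypothetical distribution of the best bid in each round.
thresholds = threshold_prices([10, 20, 30], [0.25, 0.5, 0.25], n_rounds=3, penalty=40)
print(thresholds)
print(accept(15, thresholds[0]), accept(18, thresholds[0]))
```

In this example the threshold rises as the latest pickup time approaches, so the shipper becomes less selective when fewer auction rounds remain.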

30/41 Breaking commitments
Besides delaying commitments (by the use of reserve/threshold prices) it is also possible to break commitments
The decommitment policy:
- Vehicles are allowed to decommit from an agreement against certain penalties
- Vehicles decommit whenever the expected profit for a new job, minus the decommitment penalty, is higher than the profit for the old job
- These penalties are set by the shipper and reflect the extra costs a shipper expects to incur when re-auctioning a job later (so there is some equivalence between both policies)
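
A minimal sketch of the vehicle's decommitment rule follows. It assumes that the decommitting vehicle pays the penalty itself, so the penalty is subtracted from the expected profit of the new job before comparing it with the old job. The function name and the numbers are hypothetical.

```python
def should_decommit(expected_profit_new, profit_old, penalty):
    """Decommit only if the new job is still more profitable after paying the penalty."""
    return expected_profit_new - penalty > profit_old


# Hypothetical profits and penalty.
print(should_decommit(expected_profit_new=50, profit_old=30, penalty=10))
print(should_decommit(expected_profit_new=35, profit_old=30, penalty=10))
```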

31/41 Results
- If only one player uses the proposed policies, his costs per job are 20-30% lower than those of players who do not use the policies.
- The two policies are complementary; however, the combination requires a lot of computation time.
- If we use the proposed policies for only 1% of the jobs, the total costs are reduced by more than 1%.
- If more jobs are auctioned in a clever way, learning becomes more difficult.
More information: Mes, M.R.K., M.C. van der Heijden, and P.C. Schuur (2008). Dynamic threshold policy for delaying and breaking commitments in transportation auctions. Transportation Research Part C.

32/41 Part 4: Interaction of vehicle and shipper strategies

33/41 Interaction of vehicle and shipper strategies: back to MAS
- Vehicle: opportunity valuation. Take into account future job arrivals for pricing, scheduling, and waiting decisions.
- Shipper: dynamic threshold policy. Take into account price fluctuations due to new order arrivals for delaying (and breaking) commitments.

34/41 Approach

35/41 Problems
Each player has to incorporate:
- opponents' behavior (i.e. a carrier takes into account whether a shipper uses threshold prices)
- competitors' behavior (i.e. a carrier takes into account whether other carriers value opportunities)
Players have to learn this
Learning problems:
- Long learning phase
- Increasing bid prices
- Fluctuations in bid prices
Luckily, these problems can be fixed

36/41 Some results
[Figure: relative savings of various policies compared to a myopic insertion strategy.]

37/41 Conclusions (1/2)
- Savings of 10-20% with a combination of policies
- Each policy has its own benefits, e.g.
  - Opportunity valuation: unbalanced networks
  - Dynamic threshold policy: long time-windows and low job arrival rate
- Savings are 52% of the savings from a MIP approach
- However, our savings are achieved without significant additional computation time
- But still, the difference in performance gives rise to further research

38/41 Part 5: Future research

39/41 Disadvantages of our SDP approach
- Difficult to add all kinds of model details (e.g. driver regulations and time-dependent travel times).
- With increasing problem sizes, the time needed to calculate the value functions increases drastically.
- In highly dynamic environments we might use outdated or even wrong value functions (and we might never discover this discrepancy).

40/41 Possible solution
Learn the value functions instead of calculating them:
- Avoid difficult modeling issues (e.g. modeling opponents' behavior)
- Avoid using wrong value functions
To learn the value functions:
- ADP / RL (temporal difference learning)
To speed up the learning process and to reduce computation time:
- Value function approximation (piecewise linear functions, KNN, CMACs)
- Use SDP as starting point
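
As an illustration of what learning a value function by temporal difference learning could look like, the sketch below applies the standard tabular TD(0) update to a toy chain of states. It is not the method proposed on the slides: the states, revenues, step size `alpha` and discount factor `gamma` are hypothetical, and the table starts at zero, whereas the slide suggests starting from the SDP values.

```python
from collections import defaultdict


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """One TD(0) update: move V(state) towards reward + gamma * V(next_state)."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Tabular value function, initialised at zero (an SDP solution could be used instead).
values = defaultdict(float)

# Hypothetical observed transitions: (state, revenue earned, next state).
transitions = [("A", 10.0, "B"), ("B", 5.0, "end")]

# Replay the same observations many times; V("end") stays at zero.
for _ in range(200):
    for state, reward, next_state in transitions:
        td0_update(values, state, reward, next_state)

print(round(values["A"], 3), round(values["B"], 3))
```

With function approximation, the table would be replaced by, for example, a piecewise linear function whose parameters are adjusted with the same TD error.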

41/41 Questions?
Martijn Mes
University of Twente
School of Management and Governance
Operational Methods for Production and Logistics
Phone: +31-534894062
Web: http://mb.utwente.nl/ompl/staff/Mes/
