Reinforcement Learning of Optimization Strategy and Horizon Length Selection in Online Vehicle Routing

Teemu Ikonen, Keijo Heljanko, Iiro Harjunkoski

Tutkimustuotos: Artikkeli kirjassa/konferenssijulkaisussaPosterScientificvertaisarvioitu


Mathematical optimization methods are, at present, capable of finding the optimal solution to a vast variety of complex problems emerging in process systems engineering (PSE). In online optimization, re-optimization procedures are typically performed periodically or by using event-triggered rescheduling. However, the optimal horizon length and the optimization strategy (mathematical programming or a heuristic algorithm) depend on the prevailing state of the process. This work investigates an approach where a reinforcement learning (RL) agent [1] is trained to make these decisions via adaption to the environment.

During the recent years, the attention to RL has increased in process systems engineering and operations research. Shin et al. [2] provide a review of the progress of RL and its possibilities in process control. Hubbs et al. [3] developed an open-source library OR-Gym, including is a set of classical operations research problems formulated as a RL environments. Prouvost et al. [4] propose an open-source library ECOLE, which facilitates machine learning in similar repeatedly solved combinatorial optimization problems. Hubbs et al. [5] use deep RL to make explicit scheduling decisions on a chemical production plant. In other fields, similar explicit scheduling approaches have been proposed by Semrov et al. [6] and Atallah et al. [7] Our RL-based approach operates at a higher level. It utilizes the capability of mathematical programming and heuristic algorithms to find the optimal, or a nearly optimal, solution, and focuses on deciding the scheduling horizon length and the optimization strategy in a changing operating environment. In the literature, the effect of horizon length has been studied in process scheduling as a fixed parameter [8, 9]. However, to our knowledge, no adaptive methods have been reported for this purpose.

We evaluate the approach on the urban bike share rebalancing problem, which is a generalization of the one-commodity pickup and delivery vehicle routing problem [10]. The objective is to minimize travel time of a number of rebalancing vehicles that transport bikes between fixed stations while fulfilling the service level requirement of the system. We use the mixed-integer programming model proposed by Schuijbroek et al. [11] for the rebalancing vehicle routing. The problem is chosen as a testbed because the channels of new information (updated bike counts and deviations of a vehicle locations from those scheduled) are well-defined, and the data of such systems is publicly available. Finally, we discuss the transferability of the research to more complex processes in PSE, such as process scheduling and supply chain optimization.


[1] R. S. Sutton, A. G. Barto, Reinforcement learning: An introduction, MIT press, 2018.

[2] J. Shin, T. A. Badgwell, K.-H. Liu, J. H. Lee, Reinforcement learning–overview of recent progress and implications for process control, Computers & Chemical Engineering 127 (2019) 282–294.

[3] C. D. Hubbs, H. D. Perez, O. Sarwar, N. V. Sahinidis, I. E. Grossmann, J. M. Wassick, Or-gym: A reinforcement learning library for operations research problem, arXiv preprint arXiv:2008.06319.

[4] A. Prouvost, J. Dumouchelle, L. Scavuzzo, M. Gasse, D. Chételat, A. Lodi, ECOLE: A gym-like library for machine learning in combinatorial optimization solvers, arXiv preprint arXiv:2011.06069.

[5] C. D. Hubbs, C. Li, N. V. Sahinidis, I. E. Grossmann, J. M. Wassick, A deep reinforcement learning approach for chemical production scheduling, Computers & Chemical Engineering 141 (2020) 106982.

[6] D. Šemrov, R. Marsetič, M.Žura, L. Todorovski, A. Srdic, Reinforcement learning approach for train rescheduling on a single-track railway, Transportation Research Part B: Methodological 86 (2016) 250–267.

[7] R. F. Atallah, C. M. Assi, M. J. Khabbaz, Scheduling the operation of a connected vehicular network using deep reinforcement learning, IEEE Transactions on Intelligent Transportation Systems 20 (5) (2018) 1669–1682.

[8] D. Gupta, C. T. Maravelias, J. M. Wassick, From rescheduling to online scheduling, Chemical Engineering Research and Design 116 (2016) 83–97.

[9] D. Gupta, C. T. Maravelias, On the design of online production scheduling algorithms, Computers & Chemical Engineering 129 (2019) 106517.

[10] H. Hernández-Pérez, J.-J. Salazar-González, A branch-and-cut algorithm for a traveling salesman problem with pickup and delivery, Discrete Applied Mathematics 145 (1) (2004) 126–139.

[11] J. Schuijbroek, R. C. Hampshire, W.-J. Van Hoeve, Inventory rebalancing and vehicle routing in bike sharing systems, European Journal of Operational Research 257 (3) (2017) 992–1004.
TilaJulkaistu - 9 marrask. 2021
OKM-julkaisutyyppiEi oikeutettu
TapahtumaAIChE Annual Meeting - Hynes Convention Center, Boston, Yhdysvallat
Kesto: 7 marrask. 202111 marrask. 2021


ConferenceAIChE Annual Meeting


Sukella tutkimusaiheisiin 'Reinforcement Learning of Optimization Strategy and Horizon Length Selection in Online Vehicle Routing'. Ne muodostavat yhdessä ainutlaatuisen sormenjäljen.

Siteeraa tätä