Can Deep Reinforcement Learning Improve Inventory Management? Performance on Dual Sourcing, Lost Sales and Multi-Echelon Problems



We study the effectiveness of Deep Reinforcement Learning (DRL) algorithms on classic inventory problems in operations. Despite the excitement about DRL in industries such as gaming, robotics, and self-driving cars, DRL applications in operations and supply chain management remain scarce. We address this gap by providing a rigorous performance evaluation of DRL on three classic and intractable inventory problems: dual sourcing, lost sales, and multi-echelon inventory management. We model each inventory problem as a Markov Decision Process and apply a state-of-the-art DRL algorithm, Asynchronous Advantage Actor-Critic (A3C). We show how to apply and tune A3C to achieve good performance across these three sets of inventory problems for a variety of parameter settings. We demonstrate that A3C can match the performance of many state-of-the-art heuristics and other approximate dynamic programming methods, with limited changes to the tuning parameters across all studied problems. Yet tuning DRL algorithms remains computationally burdensome, and the resulting policies often lack interpretability. Generating structural policy insight or designing specialized policies that are (ideally provably) near optimal thus remains indispensable. Nevertheless, DRL provides a promising research avenue, especially when problem-dependent heuristics are lacking; in such cases, DRL may be used to set new benchmarks or to inform the development of new heuristics.
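To make the modeling step concrete, the following is a minimal sketch of how one of the studied problems, lost-sales inventory control, can be cast as a Markov Decision Process for a DRL agent. All parameters here (lead time, cost rates, demand distribution) are illustrative assumptions, not the settings used in the paper:

```python
import random

class LostSalesInventory:
    """Illustrative lost-sales inventory MDP (parameters are assumed,
    not taken from the paper). The state is on-hand stock plus the
    pipeline of outstanding orders; unmet demand is lost, not backordered."""

    def __init__(self, lead_time=2, holding_cost=1.0, penalty_cost=4.0,
                 mean_demand=5, seed=0):
        self.h = holding_cost        # per-unit holding cost
        self.p = penalty_cost        # per-unit lost-sales penalty
        self.mean_demand = mean_demand
        self.rng = random.Random(seed)
        self.on_hand = 10
        self.pipeline = [0] * lead_time  # orders placed but not yet arrived

    def state(self):
        return (self.on_hand, tuple(self.pipeline))

    def step(self, order_qty):
        # The oldest outstanding order arrives; the new order enters the pipeline.
        self.on_hand += self.pipeline.pop(0)
        self.pipeline.append(order_qty)
        # Simple uniform demand, purely for illustration.
        demand = self.rng.randint(0, 2 * self.mean_demand)
        sold = min(self.on_hand, demand)
        lost = demand - sold          # lost sales, never backordered
        self.on_hand -= sold
        cost = self.h * self.on_hand + self.p * lost
        return self.state(), -cost    # reward = negative period cost
```

A DRL agent such as A3C observes the state tuple, chooses the order quantity as its action, and is trained to maximize the (discounted) sum of these rewards, i.e., to minimize long-run inventory cost.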