Abstract
The industrial application of Deep Reinforcement Learning (DRL) is frequently slowed down by the inability to generate the experience required to train the models. Collecting data often involves considerable time and financial outlays that can make it unaffordable. Fortunately, devices like robots can be trained with synthetic experience through virtual environments. This approach mitigates the sample-efficiency problems of artificial agents, but another issue arises: the need to efficiently transfer the synthetic experience into the real world (sim-to-real). This paper analyzes the robustness of a state-of-the-art sim-to-real technique known as Progressive Neural Networks (PNNs) and studies how adding diversity to the synthetic experience can complement it. To better understand the drivers that lead to a lack of robustness, the robotic agent is still tested in a virtual environment to ensure total control over the divergence between the simulated and real models. The results show that a PNN-like agent exhibits a substantial decrease in its robustness at the beginning of the real training phase. Randomizing specific variables during simulation-based training significantly mitigates this issue. The average increase in the model’s accuracy is around 25% when diversity is introduced in the training process. This improvement can translate into a decrease in the number of real experiences required to reach the same final robust performance. Nevertheless, adding real experience to agents should still be beneficial, regardless of the quality of the virtual experience fed to the agent. The source code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git.











Code Availability
Researchers or interested parties are welcome to contact the corresponding author L.G-L. for further explanation. Code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git
Notes
The source code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git
References
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. The MIT Press
Mahmood AR, Korenkevych D, Vasan G, Ma W, Bergstra J (2018) Benchmarking reinforcement learning algorithms on real-world robots. In: Proc 2nd conf Robot learning, vol 87, pp 561–591
François-Lavet V, Henderson P, Islam R, Bellemare MG, Pineau J (2018) An introduction to deep reinforcement learning. Found Trends Mach Learn 11(3–4):219–354. https://doi.org/10.1561/2200000071
Rusu AA, Večerík M, Rothörl T, Heess N, Pascanu R, Hadsell R (2017) Sim-to-real robot learning from pixels with progressive nets. In: 1St conf. robot learning
Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: Proc. IEEE/RSJ int conf intelligent robots and systems, pp 23–30
Bellman R (1957) A Markovian decision process. J Math Mech 6(5):679–684
Monahan GE (1982) Survey of partially observable Markov decision processes - Theory, models and algorithms. Manag Sci 28(1):1–16. https://doi.org/10.1287/mnsc.28.1.1
Al-Masrur Khan MD, Khan MRJ, Tooshil A, Sikder N, Parvez Mahmud MA, Kouzani AZ, Nahid AA (2020) A systematic review on reinforcement learning-based robotics within the last decade. IEEE Access 8:176598–176623. https://doi.org/10.1109/ACCESS.2020.3027152
Mnih V, Puigdomènech Badia A, Mirza M, Graves A, Harley T, Lillicrap TP, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: Proc 33rd int conf machine learning, vol 48, pp 1928–1937
Gu Z, Jia Z, Choset H (2018) Adversary A3C for robust reinforcement learning. In: Int conf learning representations
Grondman I, Busoniu L, Lopes GAD, Babuška R (2012) A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans Syst Man Cybern Part C: Appl Rev 42:1291–1307. https://doi.org/10.1109/TSMCC.2012.2218595
Babaeizadeh M, Frosio I, Tyree S, Clemons J, Kautz J (2017) Reinforcement learning through asynchronous advantage actor-critic on a GPU. In: Int conf learning representations
Lazaridis A (2020) Deep reinforcement learning: a state-of-the-art walkthrough. J Artif Intell Res 69:1421–1471
Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Mag 34(6):26–38. https://doi.org/10.1109/MSP.2017.2743240
Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press. https://www.deeplearningbook.org/. ISBN: 978-0262035613
Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274. https://doi.org/10.1177/0278364913495721
Zamfirache IA, Precup RE, Roman RC, Petriu EM (2022) Reinforcement learning-based control using Q-learning and gravitational search algorithm with experimental validation on a nonlinear servo system. Inf Sci 583:99–120. https://doi.org/10.1016/j.ins.2021.10.070
Singh B, Kumar R, Singh VP (2022) Reinforcement learning in robotic applications: a comprehensive survey. Artif Intell Rev 55:945–990. https://doi.org/10.1007/s10462-021-09997-9
Zhao W, Queralta JP, Westerlund T (2020) Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In: IEEE symposium series on computational intelligence, pp 737–744. https://doi.org/10.1109/SSCI47803.2020.9308468
Chen X, Hu J, Jin C, Li L, Wang L (2022) Understanding domain randomization for sim-to-real transfer. In: Int conf learning representations
Rusu AA, Colmenarejo SG, Gulcehre C, Desjardins G, Kirkpatrick J, Pascanu R, Mnih V, Kavukcuoglu K, Hadsell R (2016) Policy distillation. In: Proc int conf learning representations
Wang JX, Kurth-Nelson Z, Soyer H, Leibo JZ, Tirumala D, Munos R, Blundell C, Kumaran D, Botvinick MM (2017) Learning to reinforcement learn. In: CogSci. https://www.deepmind.com/publications/learning-to-reinforcement-learn
Morimoto J, Doya K (2005) Robust reinforcement learning. Neural Comput 17(2):335–359. https://doi.org/10.1162/0899766053011528
Hussein A, Gaber MM, Elyan E, Jayne C (2017) Imitation learning: A survey of learning methods. ACM Computing Surveys 50(2). https://doi.org/10.1145/3054912
Zhu Y, Wang Z, Merel J, Rusu A, Erez T, Cabi S, Tunyasuvunakool S, Kramár J, Hadsell R, de Freitas N, Heess N (2018) Reinforcement and imitation learning for diverse visuomotor skills. In: Proceedings of robotics: science and systems. https://doi.org/10.15607/RSS.2018.XIV.009
Traoré R, Caselles-Dupré H, Lesort T, Sun T, Díaz-Rodríguez N, Filliat D (2019) Continual reinforcement learning deployed in real-life using policy distillation and sim2real transfer. In: Proc int conf machine learning
Arndt K, Hazara M, Ghadirzadeh A, Kyrki V (2020) Meta reinforcement learning for sim-to-real domain adaptation. In: IEEE int conf robotics and automation
Higgins I, Pal A, Rusu A, Matthey L, Burgess C, Pritzel A, Botvinick M, Blundell C, Lerchner A (2017) DARLA: improving Zero-shot transfer in reinforcement learning. In: Proc 34th int conf machine learning, vol 70, pp 1480–1490
Shoeleh F, Asadpour M (2020) Skill based transfer learning with domain adaptation for continuous reinforcement learning domains. Appl Intell 50. https://doi.org/10.1007/s10489-019-01527-z
Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, Kalakrishnan M, Downs L, Ibarz J, Pastor P, Konolige K, Levine S, Vanhoucke V (2018) Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: IEEE int conf robotics and automation (ICRA), pp 4243–4250. https://doi.org/10.1109/ICRA.2018.8460875
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Proc 27th int conf neural information processing systems, vol 27
Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, Webb R (2017) Learning from simulated and unsupervised images through adversarial training. In: Proc IEEE conf computer vision and pattern recognition, pp 2242–2251. https://doi.org/10.1109/CVPR.2017.241
Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proc IEEE int conf computer vision, pp 2242–2251. https://doi.org/10.1109/ICCV.2017.244
Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. Computing Research Repository (CoRR). https://doi.org/10.48550/arXiv.1606.04671
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd int conf learning representations
James S, Wohlhart P, Kalakrishnan M, Kalashnikov D, Irpan A, Ibarz J, Levine S, Hadsell R, Bousmalis K (2019) Sim-to-real via sim-to-sim: data-efficient robotic grasping via randomized-to-canonical adaptation networks. In: Proc IEEE/CVF conf computer vision and pattern recognition, pp 12619–12629. https://doi.org/10.1109/CVPR.2019.01291
Mozifian M, Zhang A, Pineau J, Meger D (2020) Intervention design for effective sim2real transfer. Computing Research Repository (CoRR). https://doi.org/10.48550/arXiv.2012.02055
Chan SCY, Fishman S, Canny J, Korattikara A, Guadarrama S (2020) Measuring the reliability of reinforcement learning algorithms. In: Int conf learning representations
Jordan SM, Chandak Y, Cohen D, Zhang M, Thomas PS (2020) Evaluating the performance of reinforcement learning algorithms. In: Proc 37th int conf machine learning
Todorov E, Erez T, Tassa Y (2012) MuJoCo: a physics engine for model-based control. In: IEEE int conf intelligent robots and systems, pp 5026–5033. https://doi.org/10.1109/IROS.2012.6386109
Stevens E, Antiga L, Viehmann T (2020) Deep learning with PyTorch. Manning Publications
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Contributions
Lucía Güitta-López: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Software, Data curation, Writing - original draft preparation, Writing - review and editing. Jaime Boal: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Supervision, Writing - review and editing. Álvaro J. López-López: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Supervision, Writing - review and editing.
Corresponding author
Correspondence to Lucía Güitta-López.
Ethics declarations
Consent for Publication
Not applicable
Conflict of Interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Additional information
Data Availability
Researchers or interested parties are welcome to contact the corresponding author L.G-L. for further explanation. Data is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git
Consent to Participate
Not applicable
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Design of the Baseline Model (BM)
The problem tackled in this paper is formulated as a fully observable MDP with an A3C agent whose architecture replicates the proposal by Rusu et al. [4] (Fig. 12). The agent’s observation (i.e., the model input) is a 64 × 64 RGB rendered image of the virtual environment. There are seven outputs: six from the Actor, namely the policies applied to each joint (i.e., the discrete probability distributions that assign the probability of applying a given option from the action set to change the joints’ positions), and one from the Critic, which estimates the state-value function. In contrast to [4], the agent commands the position of the actuators rather than their speed, because this variable can be more easily controlled in commercial industrial robots. A softmax function is used to compute the actions’ likelihood. The optimizer employed is RMSProp with a learning rate of 1 × 10⁻⁴ and a decay of 0.99.
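For illustration only, the following minimal PyTorch sketch shows how the per-joint actions can be sampled from the softmax policies and how the optimizer described above can be configured. The function names and the policy_logits interface are assumptions made here and do not correspond to the released code.

```python
import torch
import torch.nn.functional as F

def select_actions(policy_logits):
    """policy_logits: list of six tensors, one per joint, each of shape (1, n_actions)."""
    actions = []
    for logits in policy_logits:
        probs = F.softmax(logits, dim=-1)  # likelihood of each discrete position change
        actions.append(torch.distributions.Categorical(probs).sample().item())
    return actions  # one discrete action index per joint

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # RMSProp with the learning rate (1e-4) and decay (alpha = 0.99) reported above.
    return torch.optim.RMSprop(model.parameters(), lr=1e-4, alpha=0.99)
```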
Fig. 12 A3C architecture implemented. The input is a 64 × 64 RGB environment image that goes through two convolutional layers, the first defined by a 3 × 3 kernel and stride = 4, and the second with a 5 × 5 kernel and stride = 2. The model block is a fully connected layer with 1152 inputs and 128 outputs. Finally, a Long Short-Term Memory (LSTM) network with 128 hidden states is applied to better capture the sequence of movements
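The architecture in Fig. 12 could be reconstructed in PyTorch roughly as follows. This is a hypothetical sketch: the channel counts of the convolutional layers (16 and 32 feature maps), the activation functions, and the number of discrete actions per joint are assumptions, chosen only so that the flattened convolutional output matches the 1152 inputs of the fully connected layer.

```python
import torch
import torch.nn as nn

class A3CNet(nn.Module):
    def __init__(self, n_joints: int = 6, n_actions: int = 9):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=4)   # 3x64x64 -> 16x16x16
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)  # 16x16x16 -> 32x6x6
        self.fc = nn.Linear(32 * 6 * 6, 128)                      # 1152 -> 128
        self.lstm = nn.LSTMCell(128, 128)                          # 128 hidden states
        self.actor_heads = nn.ModuleList(
            [nn.Linear(128, n_actions) for _ in range(n_joints)]  # one policy per joint
        )
        self.critic = nn.Linear(128, 1)                            # state-value estimate

    def forward(self, obs, hidden):
        x = torch.relu(self.conv1(obs))
        x = torch.relu(self.conv2(x))
        x = torch.relu(self.fc(x.flatten(start_dim=1)))
        h, c = self.lstm(x, hidden)
        policy_logits = [head(h) for head in self.actor_heads]     # six Actor outputs
        value = self.critic(h)                                     # one Critic output
        return policy_logits, value, (h, c)
```

Calling the network on a (1, 3, 64, 64) observation together with a zero-initialized (h, c) pair of shape (1, 128) returns the six policy logit vectors, the value estimate, and the updated LSTM state.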
To analyze the impact of the action set and the reward on the agent’s behavior, several combinations of these hyperparameters were designed and trained (Table 4). The evaluation of each model was carried out by running 1000 episodes with ten different seeds, initially using 5 cm as the reward distance and then 10 cm.
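A sketch of this evaluation protocol, with hypothetical helper names, could look as follows: accuracy is the fraction of episodes whose final distance to the target falls within the reward distance, averaged over the ten seeds.

```python
import numpy as np

def evaluate(run_episode, n_episodes=1000, seeds=range(10), reward_distance=0.05):
    """run_episode(seed) -> final distance (in metres) to the target; assumed interface."""
    accuracies = []
    for seed in seeds:
        final_distances = np.array([run_episode(seed) for _ in range(n_episodes)])
        accuracies.append(np.mean(final_distances <= reward_distance))
    return float(np.mean(accuracies))  # average accuracy across seeds

# The same evaluation is then repeated with reward_distance=0.10 (10 cm).
```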
The Baseline Model selected for the experiments conducted in this paper is M1 because the results obtained are on a par with those presented in [4] in terms of the average accuracy, maximum failure distance, and learning time. It is characterized by a logarithmic action set that enables the agent to approach the target faster when it is distant and operate precisely in the area surrounding the goal. A discontinuous function establishes a negative reinforcement if the goal is not reached and a positive reinforcement otherwise.
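Since Table 4 is not reproduced here, the following sketch only illustrates the flavor of such a configuration; the step sizes and reward values are assumptions for illustration, not the actual hyperparameters of M1.

```python
import numpy as np

def logarithmic_action_set(max_step_rad=0.3, n_levels=3):
    # Symmetric position increments: coarse steps far from the goal, fine steps near it.
    magnitudes = max_step_rad / (10.0 ** np.arange(n_levels))  # e.g. 0.3, 0.03, 0.003
    return np.concatenate([-magnitudes, [0.0], magnitudes[::-1]])

def discontinuous_reward(distance_to_goal, reward_distance=0.05):
    # Positive reinforcement only when the end effector is within the reward distance,
    # negative reinforcement otherwise (illustrative values).
    return 1.0 if distance_to_goal <= reward_distance else -0.01

print(logarithmic_action_set())  # seven options per joint mixing coarse and fine steps
```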
The conclusion from these experiments is that even though action space discretization can be as granular as desired, simpler spaces lead to better results. In addition, a logarithmic action space seems to behave better than a linear approach, which could be explained by the fact that with the same number of choices, the logarithmic approach provides a better combination of coarse and fine movements.
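A small numeric illustration of this argument, with assumed step sizes, is given below: for the same number of options, a linearly spaced set cannot offer both coarse and fine increments.

```python
import numpy as np

n_options, max_step = 7, 0.3
linear = np.linspace(-max_step, max_step, n_options)
print(linear)  # [-0.3 -0.2 -0.1  0.   0.1  0.2  0.3] -> smallest non-zero step: 0.1
# A logarithmic set of the same size (see the sketch above) also offers 0.003 and 0.03,
# allowing precise positioning near the goal without giving up large corrective moves.
```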
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Güitta-López, L., Boal, J. & López-López, Á.J. Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent. Appl Intell 53, 14903–14917 (2023). https://doi.org/10.1007/s10489-022-04227-3