Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent


Abstract

The industrial application of Deep Reinforcement Learning (DRL) is frequently slowed down due to an inability to generate the experience required to train the models. Collecting data often involves considerable time and financial outlays that can make it unaffordable. Fortunately, devices like robots can be trained with synthetic experience through virtual environments. With this approach, the problems of sample efficiency with artificial agents are mitigated, but another issue arises: the need to efficiently transfer the synthetic experience into the real world (sim-to-real). This paper analyzes the robustness of a state-of-the-art sim-to-real technique known as Progressive Neural Networks (PNNs) and studies how adding diversity to the synthetic experience can complement it. To better understand the drivers that lead to a lack of robustness, the robotic agent is still tested in a virtual environment to ensure total control over the divergence between the simulated and real models. The results show that a PNN-like agent exhibits a substantial decrease in its robustness at the beginning of the real training phase. Randomizing specific variables during simulation-based training significantly mitigates this issue. The average increase in the model’s accuracy is around 25% when diversity is introduced in the training process. This improvement can translate into a decrease in the number of real experiences required for the same final robust performance. Nevertheless, adding real experience to agents should still be beneficial, regardless of the quality of the virtual experience fed to the agent. The source code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git.



Code Availability

Researchers or interested parties are welcome to contact the corresponding author L.G-L. for further explanation. Code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git

Notes

  1. https://new.abb.com/products/robotics/industrial-robots/irb-120/irb-120-data

  2. The source code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git

References

  1. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. The MIT Press

  2. Mahmood AR, Korenkevych D, Vasan G, Ma W, Bergstra J (2018) Benchmarking reinforcement learning algorithms on real-world robots. In: Proc 2nd conf Robot learning, vol 87, pp 561–591

  3. François-Lavet V, Henderson P, Islam R, Bellemare MG, Pineau J (2018) An introduction to deep reinforcement learning. Found Trends Mach Learn 11(3–4):219–354. https://doi.org/10.1561/2200000071


  4. Rusu AA, Večerík M, Rothörl T, Heess N, Pascanu R, Hadsell R (2017) Sim-to-real robot learning from pixels with progressive nets. In: 1St conf. robot learning

  5. Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: Proc. IEEE/RSJ int conf intelligent robots and systems, pp 23–30

  6. Bellman R (1957) A Markovian decision process. Journal of Mathematics and Mechanics, pp 679–684

  7. Monahan GE (1982) Survey of partially observable Markov decision processes - Theory, models and algorithms. Manag Sci 28(1):1–16. https://doi.org/10.1287/mnsc.28.1.1


  8. Al-Masrur Khan MD, Khan MRJ, Tooshil A, Sikder N, Parvez Mahmud MA, Kouzani AZ, Nahid AA (2020) A systematic review on reinforcement learning-based robotics within the last decade. IEEE Access 8:176598–176623. https://doi.org/10.1109/ACCESS.2020.3027152


  9. Mnih V, Puigdomènech Badia A, Mirza M, Graves A, Harley T, Lillicrap TP, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: Proc 33rd int conf machine learning, vol 48, pp 1928–1937

  10. Gu Z, Jia Z, Choset H (2018) Adversary A3C for robust reinforcement learning. In: Int conf learning representations

  11. Grondman I, Busoniu L, Lopes GAD, Babuška R (2012) A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans Syst Man Cybern Part C: Appl Rev 42:1291–1307. https://doi.org/10.1109/TSMCC.2012.2218595


  12. Babaeizadeh M, Frosio I, Tyree S, Clemons J, Kautz J (2017) Reinforcement learning through asynchronous advantage actor-critic on a GPU. In: Int conf learning representations

  13. Lazaridis A (2020) Deep reinforcement learning: a state-of-the-art walkthrough. J Artif Intell Res 69:1421–1471


  14. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Mag 34(6):26–38. https://doi.org/10.1109/MSP.2017.2743240


  15. Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539


  16. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press. https://www.deeplearningbook.org/ ISBN: 978-0262035613

  17. Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274. https://doi.org/10.1177/0278364913495721


  18. Zamfirache IA, Precup RE, Roman RC, Petriu EM (2022) Reinforcement learning-based control using Q-learning and gravitational search algorithm with experimental validation on a nonlinear servo system. Inf Sci 583:99–120. https://doi.org/10.1016/j.ins.2021.10.070


  19. Singh B, Kumar R, Singh VP (2022) Reinforcement learning in robotic applications: a comprehensive survey. Artif Intell Rev 55:945–990. https://doi.org/10.1007/s10462-021-09997-9


  20. Zhao W, Queralta JP, Westerlund T (2020) Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In: IEEE symposium series on computational intelligence, pp 737–744. https://doi.org/10.1109/SSCI47803.2020.9308468

  21. Chen X, Hu J, Jin C, Li L, Wang L (2022) Understanding domain randomization for sim-to-real transfer. In: Int conf learning representations

  22. Rusu AA, Colmenarejo SG, Gulcehre C, Desjardins G, Kirkpatrick J, Pascanu R, Mnih V, Kavukcuoglu K, Hadsell R (2016) Policy distillation. In: Proc int conf learning representations

  23. Wang JX, Kurth-Nelson Z, Soyer H, Leibo JZ, Tirumala D, Munos R, Blundell C, Kumaran D, Botvinick MM (2017) Learning to reinforcement learn. In: CogSci. https://www.deepmind.com/publications/learning-to-reinforcement-learn

  24. Morimoto J, Doya K (2005) Robust reinforcement learning. Neural Comput 17(2):335–359. https://doi.org/10.1162/0899766053011528


  25. Hussein A, Gaber MM, Elyan E, Jayne C (2017) Imitation learning: A survey of learning methods. ACM Computing Surveys 50(2). https://doi.org/10.1145/3054912

  26. Zhu Y, Wang Z, Merel J, Rusu A, Erez T, Cabi S, Tunyasuvunakool S, Kramár J, Hadsell R, de Freitas N, Heess N (2018) Reinforcement and imitation learning for diverse visuomotor skills. In: Proceedings of robotics: science and systems. https://doi.org/10.15607/RSS.2018.XIV.009

  27. Traoré R, Caselles-Dupré H, Lesort T, Sun T, Díaz-Rodríguez N, Filliat D (2019) Continual reinforcement learning deployed in real-life using policy distillation and sim2real transfer. In: Proc int conf machine learning

  28. Arndt K, Hazara M, Ghadirzadeh A, Kyrki V (2020) Meta reinforcement learning for sim-to-real domain adaptation. In: IEEE int conf robotics and automation

  29. Higgins I, Pal A, Rusu A, Matthey L, Burgess C, Pritzel A, Botvinick M, Blundell C, Lerchner A (2017) DARLA: improving Zero-shot transfer in reinforcement learning. In: Proc 34th int conf machine learning, vol 70, pp 1480–1490

  30. Shoeleh F, Asadpour M (2020) Skill based transfer learning with domain adaptation for continuous reinforcement learning domains. Appl Intell 50. https://doi.org/10.1007/s10489-019-01527-z

  31. Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, Kalakrishnan M, Downs L, Ibarz J, Pastor P, Konolige K, Levine S, Vanhoucke V (2018) Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: IEEE int conf robotics and automation (ICRA), pp 4243–4250. https://doi.org/10.1109/ICRA.2018.8460875

  32. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Proc 27th int conf neural information processing systems, vol 27

  33. Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, Webb R (2017) Learning from simulated and unsupervised images through adversarial training. In: Proc IEEE conf computer vision and pattern recognition, pp 2242–2251. https://doi.org/10.1109/CVPR.2017.241

  34. Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE int conf computer vision, pp 2242–2251. https://doi.org/10.1109/ICCV.2017.244

  35. Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. Computing Research Repository (CoRR). https://doi.org/10.48550/arXiv.1606.04671

  36. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd int conf learning representations

  37. James S, Wohlhart P, Kalakrishnan M, Kalashnikov D, Irpan A, Ibarz J, Levine S, Hadsell R, Bousmalis K (2019) Sim-to-real via sim-to-sim: data-efficient robotic grasping via randomized-to-canonical adaptation networks. In: Proc IEEE/CVF conf computer vision and pattern recognition, pp 12619–12629. https://doi.org/10.1109/CVPR.2019.01291

  38. Mozifian M, Zhang A, Pineau J, Meger D (2020) Intervention design for effective sim2real transfer. Computing Research Repository (CoRR). https://doi.org/10.48550/arXiv.2012.02055

  39. Chan SCY, Fishman S, Canny J, Korattikara A, Guadarrama S (2020) Measuring the reliability of reinforcement learning algorithms. In: Int conf learning representations

  40. Jordan SM, Chandak Y, Cohen D, Zhang M, Thomas PS (2020) Evaluating the performance of reinforcement learning algorithms. In: Proc 37th int conf machine learning

  41. Todorov E, Erez T, Tassa Y (2012) MuJoCo: a physics engine for model-based control. In: IEEE int conf intelligent robots and systems, pp 5026–5033. https://doi.org/10.1109/IROS.2012.6386109

  42. Stevens E, Antiga L, Viehmann T (2020) Deep learning with PyTorch. Manning Publications


Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Authors

Contributions

Lucía Güitta-López: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Software, Data curation, Writing - original draft preparation, Writing - review and editing. Jaime Boal: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Supervision, Writing - review and editing. Álvaro J. López-López: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Supervision, Writing - review and editing.

Corresponding author

Correspondence to Lucía Güitta-López.

Ethics declarations

Consent for Publication

Not applicable

Conflict of Interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Data Availability

Researchers or interested parties are welcome to contact the corresponding author L.G-L. for further explanation. Data is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git

Consent to Participate

Not applicable

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Design of the Baseline Model (BM)


The problem tackled in this paper is formulated as a fully observable MDP with an A3C agent whose architecture replicates the proposal by Rusu et al. [4] (Fig. 12). The agent’s observation (i.e., the model input) is a 64 × 64 RGB rendered image of the virtual environment. There are seven outputs: six from the Actor, namely the policies applied to each joint (i.e., the discrete probability functions that assign the probability of applying a given option from the action set to change the joints’ positions), and one from the Critic, which estimates the state-value function. Unlike [4], the agent commands the position of the actuators rather than their speed, because this variable is easier to control in commercial industrial robots. A softmax function is used to compute the actions’ likelihood. The optimizer employed is RMSProp with a learning rate of 1 × 10⁻⁴ and a decay of 0.99.

Fig. 12

A3C architecture implemented. The input is a 64 × 64 RGB environment image that passes through two convolutional layers, the first with a 3 × 3 kernel and stride = 4 and the second with a 5 × 5 kernel and stride = 2. The model block is a fully connected layer with 1152 inputs and 128 outputs. Finally, a Long Short-Term Memory (LSTM) network with 128 hidden states is applied to better capture the sequence of movements
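For concreteness, the following PyTorch sketch reproduces the architecture described above. It is an illustration under stated assumptions rather than the authors' exact implementation: the convolutional channel counts (16 and 32 feature maps) and the per-joint action-set size `n_actions` are not given in the caption and are chosen here so that the flattened feature vector matches the 1152 inputs of the fully connected layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class A3CNet(nn.Module):
    """Actor-critic network following Fig. 12 (channel counts are assumptions)."""

    def __init__(self, n_joints: int = 6, n_actions: int = 7):
        super().__init__()
        # 64 x 64 RGB input -> two convolutional layers
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=4)   # -> 16 x 16 x 16
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)  # -> 32 x 6 x 6
        self.fc = nn.Linear(32 * 6 * 6, 128)                     # 1152 -> 128
        self.lstm = nn.LSTMCell(128, 128)                        # 128 hidden states
        # Actor: one softmax policy head per joint; Critic: scalar state value
        self.policy_heads = nn.ModuleList(
            [nn.Linear(128, n_actions) for _ in range(n_joints)]
        )
        self.value_head = nn.Linear(128, 1)

    def forward(self, obs, hidden):
        x = F.relu(self.conv1(obs))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc(x.flatten(start_dim=1)))
        h, c = self.lstm(x, hidden)
        policies = [F.softmax(head(h), dim=-1) for head in self.policy_heads]
        value = self.value_head(h)
        return policies, value, (h, c)


# RMSProp optimizer as described in Appendix A (learning rate 1e-4, decay 0.99)
model = A3CNet()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4, alpha=0.99)

# Example of a single forward pass with a zero-initialized LSTM state
obs = torch.zeros(1, 3, 64, 64)
hidden = (torch.zeros(1, 128), torch.zeros(1, 128))
policies, value, hidden = model(obs, hidden)
```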

To analyze the impact of the action set and the reward on the agent’s behavior, several combinations of these hyperparameters were designed and trained (Table 4). Each model was evaluated by running 1000 episodes with ten different seeds, first with a reward distance of 5 cm and then with 10 cm.

Table 4 MDPs considered to select the Baseline Model

The Baseline Model selected for the experiments conducted in this paper is M1 because the results obtained are on a par with those presented in [4] in terms of the average accuracy, maximum failure distance, and learning time. It is characterized by a logarithmic action set that enables the agent to approach the target faster when it is distant and operate precisely in the area surrounding the goal. A discontinuous function establishes a negative reinforcement if the goal is not reached and a positive reinforcement otherwise.

The conclusion from these experiments is that even though action space discretization can be as granular as desired, simpler spaces lead to better results. In addition, a logarithmic action space seems to behave better than a linear approach, which could be explained by the fact that with the same number of choices, the logarithmic approach provides a better combination of coarse and fine movements.
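As a concrete, purely illustrative sketch of these two design choices, the snippet below builds a symmetric logarithmic action set for a single joint and a discontinuous reward. The number of actions, the maximum step size, the logarithmic base, and the reward magnitudes are assumptions, since the exact values used in the paper are not reproduced in this excerpt.

```python
import numpy as np


def logarithmic_action_set(n_actions: int = 7, max_step_rad: float = 0.1):
    """Illustrative logarithmic action set for one joint (values are assumptions).

    Step magnitudes are spaced logarithmically so the agent can move coarsely
    when far from the goal and finely in the area surrounding it.
    """
    n_pos = n_actions // 2
    magnitudes = max_step_rad / (10.0 ** np.arange(n_pos))  # e.g. 0.1, 0.01, 0.001
    return np.concatenate((-magnitudes, [0.0], magnitudes[::-1]))


def discontinuous_reward(distance_to_goal: float, reward_distance: float = 0.05) -> float:
    """Positive reinforcement inside the reward distance (5 cm here), negative otherwise.

    The exact reward magnitudes are hypothetical.
    """
    return 1.0 if distance_to_goal <= reward_distance else -0.01


print(logarithmic_action_set())  # [-0.1 -0.01 -0.001 0. 0.001 0.01 0.1]
```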

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Güitta-López, L., Boal, J. & López-López, Á.J. Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent. Appl Intell 53, 14903–14917 (2023). https://doi.org/10.1007/s10489-022-04227-3


  • DOI: https://doi.org/10.1007/s10489-022-04227-3
