Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent


Abstract

The industrial application of Deep Reinforcement Learning (DRL) is frequently slowed down due to an inability to generate the experience required to train the models. Collecting data often involves considerable time and financial outlays that can make it unaffordable. Fortunately, devices like robots can be trained with synthetic experience through virtual environments. With this approach, the problems of sample efficiency with artificial agents are mitigated, but another issue arises: the need to efficiently transfer the synthetic experience into the real world (sim-to-real). This paper analyzes the robustness of a state-of-the-art sim-to-real technique known as Progressive Neural Networks (PNNs) and studies how adding diversity to the synthetic experience can complement it. To better understand the drivers that lead to a lack of robustness, the robotic agent is still tested in a virtual environment to ensure total control over the divergence between the simulated and real models. The results show that a PNN-like agent exhibits a substantial decrease in its robustness at the beginning of the real training phase. Randomizing specific variables during simulation-based training significantly mitigates this issue. The average increase in the model’s accuracy is around 25% when diversity is introduced in the training process. This improvement can translate into a decrease in the number of real experiences required for the same final robust performance. Nevertheless, adding real experience to agents should still be beneficial, regardless of the quality of the virtual experience fed to the agent. The source code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git.



Code Availability

Researchers or interested parties are welcome to contact the corresponding author L.G-L. for further explanation. Code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git

Notes

  1. https://new.abb.com/products/robotics/industrial-robots/irb-120/irb-120-data

  2. The source code is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git

References

  1. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. The MIT Press

  2. Mahmood AR, Korenkevych D, Vasan G, Ma W, Bergstra J (2018) Benchmarking reinforcement learning algorithms on real-world robots. In: Proc 2nd conf Robot learning, vol 87, pp 561–591

  3. François-Lavet V, Henderson P, Islam R, Bellemare MG, Pineau J (2018) An introduction to deep reinforcement learning. Found Trends Mach Learn 11(3–4):219–354. https://doi.org/10.1561/2200000071


  4. Rusu AA, Večerík M, Rothörl T, Heess N, Pascanu R, Hadsell R (2017) Sim-to-real robot learning from pixels with progressive nets. In: 1St conf. robot learning

  5. Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: Proc. IEEE/RSJ int conf intelligent robots and systems, pp 23–30

  6. Bellman R (1957) A Markovian decision process. Journal of Mathematics and Mechanics, pp 679–684

  7. Monahan GE (1982) Survey of partially observable Markov decision processes - Theory, models and algorithms. Manag Sci 28(1):1–16. https://doi.org/10.1287/mnsc.28.1.1


  8. Al-Masrur Khan MD, Khan MRJ, Tooshil A, Sikder N, Parvez Mahmud MA, Kouzani AZ, Nahid AA (2020) A systematic review on reinforcement learning-based robotics within the last decade. IEEE Access 8:176598–176623. https://doi.org/10.1109/ACCESS.2020.3027152


  9. Mnih V, Puigdomènech Badia A, Mirza M, Graves A, Harley T, Lillicrap TP, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: Proc 33rd int conf machine learning, vol 48, pp 1928–1937

  10. Gu Z, Jia Z, Choset H (2018) Adversary A3C for robust reinforcement learning. In: Int conf learning representations

  11. Grondman I, Busoniu L, Lopes GAD, Babuška R (2012) A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans Syst Man Cybern Part C: Appl Rev 42:1291–1307. https://doi.org/10.1109/TSMCC.2012.2218595


  12. Babaeizadeh M, Frosio I, Tyree S, Clemons J, Kautz J (2017) Reinforcement learning through asynchronous advantage actor-critic on a GPU. In: Int conf learning representations

  13. Lazaridis A (2020) Deep reinforcement learning: a state-of-the-art walkthrough. J Artif Intell Res 69:1421–1471


  14. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Mag 34(6):26–38. https://doi.org/10.1109/MSP.2017.2743240


  15. Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539


  16. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press. https://www.deeplearningbook.org/ ISBN: 978-0262035613

  17. Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274. https://doi.org/10.1177/0278364913495721


  18. Zamfirache IA, Precup RE, Roman RC, Petriu EM (2022) Reinforcement learning-based control using Q-learning and gravitational search algorithm with experimental validation on a nonlinear servo system. Inf Sci 583:99–120. https://doi.org/10.1016/j.ins.2021.10.070


  19. Singh B, Kumar R, Singh VP (2022) Reinforcement learning in robotic applications: a comprehensive survey. Artif Intell Rev 55:945–990. https://doi.org/10.1007/s10462-021-09997-9


  20. Zhao W, Queralta JP, Westerlund T (2020) Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In: IEEE symposium series on computational intelligence, pp 737–744. https://doi.org/10.1109/SSCI47803.2020.9308468

  21. Chen X, Hu J, Jin C, Li L, Wang L (2022) Understanding domain randomization for sim-to-real transfer. In: Int conf learning representations

  22. Rusu AA, Colmenarejo SG, Gulcehre C, Desjardins G, Kirkpatrick J, Pascanu R, Mnih V, Kavukcuoglu K, Hadsell R (2016) Policy distillation. In: Proc int conf learning representations

  23. Wang JX, Kurth-Nelson Z, Soyer H, Leibo JZ, Tirumala D, Munos R, Blundell C, Kumaran D, Botvinick MM (2017) Learning to reinforcement learn. In: CogSci. https://www.deepmind.com/publications/learning-to-reinforcement-learn

  24. Morimoto J, Doya K (2005) Robust reinforcement learning. Neural Comput 17(2):335–359. https://doi.org/10.1162/0899766053011528


  25. Hussein A, Gaber MM, Elyan E, Jayne C (2017) Imitation learning: A survey of learning methods. ACM Computing Surveys 50(2). https://doi.org/10.1145/3054912

  26. Zhu Y, Wang Z, Merel J, Rusu A, Erez T, Cabi S, Tunyasuvunakool S, Kramár J, Hadsell R, de Freitas N, Heess N (2018) Reinforcement and imitation learning for diverse visuomotor skills. In: Proceedings of robotics: science and systems. https://doi.org/10.15607/RSS.2018.XIV.009

  27. Traoré R, Caselles-Dupré H, Lesort T, Sun T, Díaz-Rodríguez N, Filliat D (2019) Continual reinforcement learning deployed in real-life using policy distillation and sim2real transfer. In: Proc int conf machine learning

  28. Arndt K, Hazara M, Ghadirzadeh A, Kyrki V (2020) Meta reinforcement learning for sim-to-real domain adaptation. In: IEEE int conf robotics and automation

  29. Higgins I, Pal A, Rusu A, Matthey L, Burgess C, Pritzel A, Botvinick M, Blundell C, Lerchner A (2017) DARLA: improving Zero-shot transfer in reinforcement learning. In: Proc 34th int conf machine learning, vol 70, pp 1480–1490

  30. Shoeleh F, Asadpour M (2020) Skill based transfer learning with domain adaptation for continuous reinforcement learning domains. Appl Intell 50. https://doi.org/10.1007/s10489-019-01527-z

  31. Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, Kalakrishnan M, Downs L, Ibarz J, Pastor P, Konolige K, Levine S, Vanhoucke V (2018) Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: IEEE int conf robotics and automation (ICRA), pp 4243–4250. https://doi.org/10.1109/ICRA.2018.8460875

  32. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Proc 27th int conf neural information processing systems, vol 27

  33. Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, Webb R (2017) Learning from simulated and unsupervised images through adversarial training. In: Proc IEEE conf computer vision and pattern recognition, pp 2242–2251. https://doi.org/10.1109/CVPR.2017.241

  34. Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE int conf computer vision, pp 2242–2251. https://doi.org/10.1109/ICCV.2017.244

  35. Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. Computing Research Repository (CoRR). https://doi.org/10.48550/arXiv.1606.04671

  36. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd int conf learning representations

  37. James S, Wohlhart P, Kalakrishnan M, Kalashnikov D, Irpan A, Ibarz J, Levine S, Hadsell R, Bousmalis K (2019) Sim-to-real via sim-to-sim: data-efficient robotic grasping via randomized-to-canonical adaptation networks. In: Proc IEEE/CVF conf computer vision and pattern recognition, pp 12619–12629. https://doi.org/10.1109/CVPR.2019.01291

  38. Mozifian M, Zhang A, Pineau J, Meger D (2020) Intervention design for effective sim2real transfer. Computing Research Repository (CoRR). https://doi.org/10.48550/arXiv.2012.02055

  39. Chan SCY, Fishman S, Canny J, Korattikara A, Guadarrama S (2020) Measuring the reliability of reinforcement learning algorithms. In: Int conf learning representations

  40. Jordan SM, Chandak Y, Cohen D, Zhang M, Thomas PS (2020) Evaluating the performance of reinforcement learning algorithms. In: Proc 37th int conf machine learning

  41. Todorov E, Erez T, Tassa Y (2012) MuJoCo: a physics engine for model-based control. In: IEEE int conf intelligent robots and systems, pp 5026–5033. https://doi.org/10.1109/IROS.2012.6386109

  42. Stevens E, Antiga L, Viehmann T (2020) Deep learning with PyTorch. Manning Publications


Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Authors

Contributions

Lucía Güitta-López: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Software, Data curation, Writing - original draft preparation, Writing - review and editing. Jaime Boal: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Supervision, Writing - review and editing. Álvaro J. López-López: Conceptualization and design of this study, Methodology, Formal analysis and investigation, Supervision, Writing - review and editing.

Corresponding author

Correspondence to Lucía Güitta-López.

Ethics declarations

Consent for Publication

Not applicable

Conflict of Interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Data Availability

Researchers or interested parties are welcome to contact the corresponding author L.G-L. for further explanation. Data is available at: https://gitlab.com/comillas-cic/sim-to-real/pnn-dr.git

Consent to Participate

Not applicable

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Design of the Baseline Model (BM)


The problem tackled in this paper is formulated as a fully observable MDP with an A3C agent whose architecture replicates the proposal by Rusu et al. [4] (Fig. 12). The agent’s observation (i.e., the model input) is a 64 × 64 RGB rendered image of the virtual environment. There are seven outputs: six from the Actor, namely the policies applied to each joint (i.e., the discrete probability functions that assign the probability of applying a given option from the action set to change the joints’ positions), and one from the Critic, which estimates the state-value function. Unlike [4], the agent commands the position of the actuators rather than their speed, because this variable is easier to control in commercial industrial robots. A softmax function is used to compute the actions’ likelihood. The optimizer employed is RMSProp with a learning rate of 1 × 10⁻⁴ and a decay of 0.99.

Fig. 12

A3C architecture implemented. The input is a 64 × 64 RGB environment image that passes through two convolutional layers, the first with a 3 × 3 kernel and stride = 4 and the second with a 5 × 5 kernel and stride = 2. The model block is a fully connected layer with 1152 inputs and 128 outputs. Finally, a Long Short-Term Memory (LSTM) network with 128 hidden states is applied to better capture the sequence of movements
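For concreteness, the following PyTorch sketch reproduces the architecture described above. It is an illustration under stated assumptions rather than the authors' exact implementation: the convolutional channel counts (16 and 32 feature maps) and the per-joint action-set size `n_actions` are not given in the caption and are chosen here so that the flattened feature vector matches the 1152 inputs of the fully connected layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class A3CNet(nn.Module):
    """Actor-critic network following Fig. 12 (channel counts are assumptions)."""

    def __init__(self, n_joints: int = 6, n_actions: int = 7):
        super().__init__()
        # 64 x 64 RGB input -> two convolutional layers
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=4)   # -> 16 x 16 x 16
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)  # -> 32 x 6 x 6
        self.fc = nn.Linear(32 * 6 * 6, 128)                     # 1152 -> 128
        self.lstm = nn.LSTMCell(128, 128)                        # 128 hidden states
        # Actor: one softmax policy head per joint; Critic: scalar state value
        self.policy_heads = nn.ModuleList(
            [nn.Linear(128, n_actions) for _ in range(n_joints)]
        )
        self.value_head = nn.Linear(128, 1)

    def forward(self, obs, hidden):
        x = F.relu(self.conv1(obs))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc(x.flatten(start_dim=1)))
        h, c = self.lstm(x, hidden)
        policies = [F.softmax(head(h), dim=-1) for head in self.policy_heads]
        value = self.value_head(h)
        return policies, value, (h, c)


# RMSProp optimizer as described in Appendix A (learning rate 1e-4, decay 0.99)
model = A3CNet()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4, alpha=0.99)

# Example of a single forward pass with a zero-initialized LSTM state
obs = torch.zeros(1, 3, 64, 64)
hidden = (torch.zeros(1, 128), torch.zeros(1, 128))
policies, value, hidden = model(obs, hidden)
```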

To analyze the impact of the action set and the reward on the agent’s behavior, several combinations of these hyperparameters were designed and trained (Table 4). Each model was evaluated by running 1000 episodes with ten different seeds, first with a reward distance of 5 cm and then with 10 cm.

Table 4 MDPs considered to select the Baseline Model

The Baseline Model selected for the experiments conducted in this paper is M1 because the results obtained are on a par with those presented in [4] in terms of the average accuracy, maximum failure distance, and learning time. It is characterized by a logarithmic action set that enables the agent to approach the target faster when it is distant and operate precisely in the area surrounding the goal. A discontinuous function establishes a negative reinforcement if the goal is not reached and a positive reinforcement otherwise.

The conclusion from these experiments is that even though action space discretization can be as granular as desired, simpler spaces lead to better results. In addition, a logarithmic action space seems to behave better than a linear approach, which could be explained by the fact that with the same number of choices, the logarithmic approach provides a better combination of coarse and fine movements.
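As a concrete, purely illustrative sketch of these two design choices, the snippet below builds a symmetric logarithmic action set for a single joint and a discontinuous reward. The number of actions, the maximum step size, the logarithmic base, and the reward magnitudes are assumptions, since the exact values used in the paper are not reproduced in this excerpt.

```python
import numpy as np


def logarithmic_action_set(n_actions: int = 7, max_step_rad: float = 0.1):
    """Illustrative logarithmic action set for one joint (values are assumptions).

    Step magnitudes are spaced logarithmically so the agent can move coarsely
    when far from the goal and finely in the area surrounding it.
    """
    n_pos = n_actions // 2
    magnitudes = max_step_rad / (10.0 ** np.arange(n_pos))  # e.g. 0.1, 0.01, 0.001
    return np.concatenate((-magnitudes, [0.0], magnitudes[::-1]))


def discontinuous_reward(distance_to_goal: float, reward_distance: float = 0.05) -> float:
    """Positive reinforcement inside the reward distance (5 cm here), negative otherwise.

    The exact reward magnitudes are hypothetical.
    """
    return 1.0 if distance_to_goal <= reward_distance else -0.01


print(logarithmic_action_set())  # [-0.1 -0.01 -0.001 0. 0.001 0.01 0.1]
```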

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Güitta-López, L., Boal, J. & López-López, Á.J. Learning more with the same effort: how randomization improves the robustness of a robotic deep reinforcement learning agent. Appl Intell 53, 14903–14917 (2023). https://doi.org/10.1007/s10489-022-04227-3


  • DOI: https://doi.org/10.1007/s10489-022-04227-3
