Robotic Arm Control and Task Training Through Deep Reinforcement Learning
Elisa Tosello; Nicola Castaman; Stefano Ghidoni
2022
Abstract
Deep Reinforcement Learning (DRL) is a promising Machine Learning technique that enables robotic systems to efficiently learn high-dimensional control policies. However, generating good policies requires carefully defining appropriate reward functions, state spaces, and action spaces. There is no unique methodology for making these choices, and parameter tuning is time-consuming. In this paper, we investigate how the choice of both the reward function and the hyper-parameters affects the quality of the learned policy. To this end, we compare four DRL algorithms when learning continuous torque control policies for manipulation tasks via a model-free approach. In detail, we simulate one manipulator robot and formulate two tasks: random target reaching and a pick-and-place application, each with two different reward functions. We then select the algorithms and multiple hyper-parameters, and exhaustively compare their learning performance across the two tasks. Finally, we include the simulated and real-world execution of our best policies. The obtained performance demonstrates the validity of our proposal. Users can follow our approach when selecting the best-performing algorithm for their assignment. Moreover, they can exploit our results to solve the same tasks, even with other manipulator robots. The generated policies are easily portable to a physical setup while guaranteeing a close match between the simulated and real behaviors.
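The sketch below is a hypothetical illustration (not the paper's code) of a dense, distance-based reward for the random-target-reaching task described in the abstract, combining a distance term, a torque-effort penalty, and a success bonus; the function name, weights, and threshold are assumed values chosen for illustration only.

```python
# Hypothetical sketch: a shaped reward for random target reaching with a
# torque-controlled manipulator, as commonly used in model-free DRL.
# The weights and threshold below are illustrative, not the paper's values.
import numpy as np

def reaching_reward(ee_pos, target_pos, torques,
                    distance_weight=1.0, torque_weight=1e-3,
                    success_threshold=0.02, success_bonus=10.0):
    """Return a scalar reward for one control step.

    ee_pos, target_pos : (3,) end-effector and target positions [m]
    torques            : (n_joints,) commanded joint torques [Nm]
    """
    distance = np.linalg.norm(ee_pos - target_pos)
    reward = -distance_weight * distance                  # pull toward the target
    reward -= torque_weight * np.square(torques).sum()    # penalize control effort
    if distance < success_threshold:                      # sparse bonus on success
        reward += success_bonus
    return reward

# Example usage with dummy values
if __name__ == "__main__":
    ee = np.array([0.40, 0.10, 0.30])
    goal = np.array([0.42, 0.12, 0.30])
    tau = np.zeros(6)
    print(reaching_reward(ee, goal, tau))
```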
File | Type | License | Size | Format
---|---|---|---|---
2005.02632.pdf | Preprint (submitted version) | Open access | 5.06 MB | Adobe PDF