Model-Based Reinforcement Learning for Robust End-to-End UAV Control from Simulation to Real System Application
Dalla Libera, Alberto; Carli, Ruggero
2025
Abstract
Among optimal control policies for aerial robots, Reinforcement Learning promises to improve robustness with low computational effort during deployment. Nevertheless, gathering data to learn the system's behavior represents a major issue. Ideally, training occurs on the real system, but crashes during early training stages make this impractical. Simulators are preferable but imperfect, creating a sim2real gap. More detailed simulators add computational burden, making RL infeasible. We bridge this gap by encapsulating the behavior of a detailed simulator into a Gaussian Process model. We use this computationally light model for training an RL-based control policy, which we test on a real quadrotor. Additionally, the simulator and the corresponding learned model are compared with real flight data, showing their accuracy. Our results show the feasibility of conducting policy search in the simulation pipeline and motivate future use of the model-based RL algorithm directly on the real system.
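The core idea above — encapsulating a detailed simulator into a lightweight Gaussian Process model of the one-step dynamics — can be illustrated with a minimal sketch. This is not the paper's implementation: the toy dynamics, data sizes, and kernel choice are assumptions, and scikit-learn's `GaussianProcessRegressor` stands in for whatever GP machinery the authors used.

```python
# Hedged sketch: learning a one-step dynamics model with a Gaussian Process.
# The "detailed simulator" is replaced here by a toy 1-D double integrator;
# all names and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def simulate_step(state, action, dt=0.05):
    """Toy stand-in for a detailed (expensive) simulator: 1-D double integrator."""
    pos, vel = state
    return np.array([pos + vel * dt, vel + action * dt])

# Collect (state, action) -> next-state transitions by querying the simulator.
X, Y = [], []
for _ in range(200):
    s = rng.uniform(-1.0, 1.0, size=2)
    a = rng.uniform(-1.0, 1.0)
    X.append(np.concatenate([s, [a]]))
    Y.append(simulate_step(s, a))
X, Y = np.array(X), np.array(Y)

# Fit the GP surrogate; during policy search this cheap model replaces
# the expensive simulator as the environment's transition function.
gp = GaussianProcessRegressor(
    kernel=RBF() + WhiteKernel(noise_level=1e-5),
    normalize_y=True,
)
gp.fit(X, Y)

# Query the learned model: mean next-state prediction plus uncertainty,
# which model-based RL can exploit for robust policy training.
x_query = np.array([[0.1, 0.2, 0.5]])  # [pos, vel, action]
mean, std = gp.predict(x_query, return_std=True)
print(mean)  # predicted next state, close to simulate_step((0.1, 0.2), 0.5)
```

The GP's predictive uncertainty (`std`) is what distinguishes this surrogate from a plain regression fit: a model-based RL algorithm can propagate it through rollouts so the policy is not over-trusted in poorly-sampled regions of the state-action space.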
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.