Edge Delayed Deep Deterministic Policy Gradient: Efficient Continuous Control for Edge Scenarios

Sinigaglia, Alberto; Turcato, Niccolò; Carli, Ruggero; Susto, Gian Antonio
2025

Abstract

Deep Reinforcement Learning (DRL) has emerged as a powerful paradigm for learning complex policies directly from high-dimensional input spaces, enabling advances across a variety of domains. Modern DRL algorithms often rely on dual-network Q-learning architectures to overcome overestimation bias when approximating optimal policies. Recent research has introduced approaches that leverage multiple Q-functions to further mitigate overestimation and enhance policy reliability. However, there is a growing emphasis on deploying DRL in edge scenarios, where privacy concerns and stringent hardware constraints demand highly efficient algorithms; in such environments, the computational and memory efficiency of learning methods is of critical importance. In this context, we propose Edge Delayed Deep Deterministic Policy Gradient (EdgeD3), a novel reinforcement learning algorithm specifically designed for edge computing settings. EdgeD3 reduces GPU time by 25% and computational and memory usage by 30%, while consistently matching or surpassing the performance of state-of-the-art algorithms across multiple benchmarks and in real-world tasks.
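
The abstract refers to dual-network Q-learning as the standard way to counter overestimation bias. As a point of reference, the sketch below shows the clipped double-Q target used by TD3-family (delayed DDPG) methods; this is an assumption about the baseline being described, not a description of EdgeD3's own update rule, and all function names, arguments, and defaults are illustrative.

```python
# Illustrative sketch (assumption): the clipped double-Q target used by
# TD3-family methods, i.e. the "dual-network Q-learning" the abstract refers to.
# EdgeD3's actual update rule is not shown here; all names are hypothetical.
import torch

def clipped_double_q_target(q1_target, q2_target, actor_target,
                            reward, next_state, done,
                            gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Compute y = r + gamma * min(Q1', Q2')(s', a~) with target-policy
    smoothing noise, as in TD3-style dual-critic learning."""
    with torch.no_grad():
        # Target-policy smoothing: add clipped Gaussian noise to the target action.
        next_action = actor_target(next_state)
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-1.0, 1.0)

        # Clipped double Q-learning: take the minimum of the two target critics
        # to counteract overestimation bias.
        q1 = q1_target(next_state, next_action)
        q2 = q2_target(next_state, next_action)
        target_q = torch.min(q1, q2)

        # Standard bootstrapped TD target (done masks out terminal transitions).
        return reward + gamma * (1.0 - done) * target_q
```

Maintaining two critics roughly doubles critic memory and per-update compute, which is the kind of overhead that edge-oriented methods such as EdgeD3 aim to reduce, per the efficiency figures reported in the abstract.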

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3562487