Edge Delayed Deep Deterministic Policy Gradient: Efficient Continuous Control for Edge Scenarios

Sinigaglia, Alberto; Turcato, Niccolò; Carli, Ruggero; Susto, Gian Antonio
2025

Abstract

Deep Reinforcement Learning (DRL) has emerged as a powerful paradigm for learning complex policies directly from high-dimensional input spaces, enabling advances across a variety of domains. Modern DRL algorithms often rely on dual-network Q-learning architectures to overcome overestimation bias when approximating optimal policies. Recent research has introduced approaches that leverage multiple Q-functions to further mitigate overestimation and enhance policy reliability. However, there is a growing emphasis on deploying DRL in edge scenarios, where privacy concerns and stringent hardware constraints demand highly efficient algorithms; in such environments, the computational and memory efficiency of learning methods is of critical importance. In this context, we propose Edge Delayed Deep Deterministic Policy Gradient (EdgeD3), a novel reinforcement learning algorithm specifically designed for edge computing settings. EdgeD3 reduces GPU time by 25% and computational and memory usage by 30%, while consistently matching or surpassing the performance of state-of-the-art algorithms across multiple benchmarks and in real-world tasks.
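
The abstract refers to dual-network Q-learning as the standard way to counter overestimation bias. As a point of reference, the sketch below shows the clipped double-Q target used by TD3-family (delayed DDPG) methods; this is an assumption about the baseline being described, not a description of EdgeD3's own update rule, and all function names, arguments, and defaults are illustrative.

```python
# Illustrative sketch (assumption): the clipped double-Q target used by
# TD3-family methods, i.e. the "dual-network Q-learning" the abstract refers to.
# EdgeD3's actual update rule is not shown here; all names are hypothetical.
import torch

def clipped_double_q_target(q1_target, q2_target, actor_target,
                            reward, next_state, done,
                            gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Compute y = r + gamma * min(Q1', Q2')(s', a~) with target-policy
    smoothing noise, as in TD3-style dual-critic learning."""
    with torch.no_grad():
        # Target-policy smoothing: add clipped Gaussian noise to the target action.
        next_action = actor_target(next_state)
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-1.0, 1.0)

        # Clipped double Q-learning: take the minimum of the two target critics
        # to counteract overestimation bias.
        q1 = q1_target(next_state, next_action)
        q2 = q2_target(next_state, next_action)
        target_q = torch.min(q1, q2)

        # Standard bootstrapped TD target (done masks out terminal transitions).
        return reward + gamma * (1.0 - done) * target_q
```

Maintaining two critics roughly doubles critic memory and per-update compute, which is the kind of overhead that edge-oriented methods such as EdgeD3 aim to reduce, per the efficiency figures reported in the abstract.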

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3562487