Edge Delayed Deep Deterministic Policy Gradient: Efficient Continuous Control for Edge Scenarios

Sinigaglia, Alberto; Turcato, Niccolò; Carli, Ruggero; Susto, Gian Antonio

2025

Abstract

Deep Reinforcement Learning (DRL) has emerged as a powerful paradigm for learning complex policies directly from high-dimensional inputs, enabling advances across a variety of domains. Modern DRL algorithms often rely on dual-network Q-learning architectures to overcome overestimation bias when approximating optimal policies. Recent research has introduced approaches that use multiple Q-functions to further mitigate overestimation and improve policy reliability. At the same time, there is a growing emphasis on deploying DRL in edge scenarios, where privacy concerns and stringent hardware constraints call for highly efficient algorithms, and where the computational and memory efficiency of the learning method is critical. In this context, we propose Edge Delayed Deep Deterministic Policy Gradient (EdgeD3), a novel reinforcement learning algorithm specifically designed for edge computing settings. EdgeD3 reduces GPU time by 25% and computational and memory usage by 30%, while consistently matching or surpassing the performance of state-of-the-art algorithms across multiple benchmarks and in real-world tasks.

Year: 2025
Publication language(s): English
Journal: IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING
Volume: 22
First page: 20596
Last page: 20610
Total pages: 15
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: https://dx.doi.org/10.1109/tase.2025.3604290
WOS ID: WOS:001569792300002
Scopus ID: 2-s2.0-105014610656
OpenAlex ID: W4413822094
Keywords: Continuous control; deep deterministic policy gradient; deep reinforcement learning; edge computing; Q-learning
Co-authors with foreign affiliation: no
Fulltext: none
All authors: Sinigaglia, Alberto; Turcato, Niccolò; Carli, Ruggero; Susto, Gian Antonio
Type: 01 Journal contribution::01.01 - Journal article
Type: info:eu-repo/semantics/article
Number of authors: 4
Faculty site type: 262
Appears in types: 01.01 - Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3562487
