Finite-Time Analysis of Over-the-Air Federated TD Learning
Dal Fabbro, Nicolò; Schenato, Luca
2025
Abstract
In recent years, federated learning has been widely studied to speed up various supervised learning tasks at the wireless network edge. However, there is a lack of theoretical understanding as to whether similar speedups in sample complexity can be achieved for cooperative reinforcement learning (RL) problems subject to communication constraints. To that end, we study a federated policy evaluation problem over wireless fading channels where, to update model parameters, a central server aggregates local temporal difference (TD) update directions from N agents via analog over-the-air computation (OAC). We refer to this scheme as OAC-FedTD and provide a rigorous finite-time convergence analysis of its performance. Our analysis reveals the impact of the noisy fading channels on the convergence rate and establishes a linear convergence speedup with respect to the number of agents. Notably, this is the first non-asymptotic analysis of a cooperative RL setting under wireless channels that jointly considers linear value function approximation, Markovian sampling, and OAC-induced channel distortions and noise. Our work develops theoretical foundations needed for further advances in the analysis and design of federated reinforcement learning algorithms over wireless networks.
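
To make the setup described above concrete, the following is a minimal sketch of one communication round of an OAC-FedTD-style scheme: each agent computes a local TD(0) update direction under linear value function approximation, all agents transmit simultaneously over a fading multiple-access channel, and the server applies the noisy, fading-distorted aggregate. This is an illustrative sketch, not the authors' implementation; the symbol names (theta, phi_s, h_i), the Rayleigh fading model, and the step-size and noise values are assumptions made here for illustration.

```python
import numpy as np

def local_td_direction(theta, phi_s, phi_s_next, reward, gamma):
    """TD(0) update direction with linear approximation V(s) = theta^T phi(s)."""
    td_error = reward + gamma * (phi_s_next @ theta) - (phi_s @ theta)
    return td_error * phi_s

def oac_fedtd_round(theta, samples, gamma=0.99, step_size=0.1, noise_std=0.01, rng=None):
    """One communication round of an OAC-FedTD-style update (hypothetical sketch).

    samples: list of (phi_s, phi_s_next, reward) tuples, one per agent,
             drawn from each agent's Markovian trajectory.
    """
    rng = rng or np.random.default_rng()
    num_agents = len(samples)
    dim = theta.shape[0]

    # Analog over-the-air aggregation: agents transmit their update directions
    # simultaneously, so the server observes a fading-weighted superposition
    # of the local TD directions plus additive receiver noise.
    received = np.zeros(dim)
    for phi_s, phi_s_next, reward in samples:
        g_i = local_td_direction(theta, phi_s, phi_s_next, reward, gamma)
        h_i = rng.rayleigh(scale=1.0)  # assumed fading gain for this agent
        received += h_i * g_i
    received += rng.normal(0.0, noise_std, size=dim)  # additive channel noise

    # Server-side update with the (distorted) averaged TD direction.
    return theta + step_size * received / num_agents
```

In a noiseless, unit-gain channel this reduces to averaging the N local TD directions, which is the source of the linear speedup; the finite-time analysis in the paper quantifies how the fading distortion and receiver noise modeled above perturb that rate.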




