SHED: A Newton-type algorithm for federated learning based on incremental Hessian eigenvector sharing

Rossi, Michele (Investigation); Schenato, Luca (Investigation)
2024

Abstract

There is growing interest in the distributed optimization framework known as Federated Learning (FL). In particular, much attention is devoted to FL scenarios where the network is strongly heterogeneous in terms of communication resources (e.g., bandwidth) and data distribution. In these cases, communication between local machines (agents) and the central server (Master) is a primary concern. In this work, we present SHED, an original communication-constrained Newton-type (NT) algorithm designed to accelerate FL in such scenarios. SHED is by design robust to non-independent and identically distributed (non-i.i.d.) data distributions, handles heterogeneity of agents' communication resources (CRs), requires only sporadic Hessian computations, and achieves global asymptotic super-linear convergence. This is made possible by an incremental strategy, based on the eigendecomposition of the local Hessian matrices, that exploits (possibly) outdated second-order information. SHED is thoroughly validated on real datasets by assessing (i) the number of communication rounds required for convergence, (ii) the overall amount of data transmitted, and (iii) the number of local Hessian computations. On all these metrics, SHED outperforms state-of-the-art techniques such as BFGS, GIANT and FedNL.
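
The core mechanism described in the abstract (each agent eigendecomposes its local Hessian once, then streams a few eigenvector/eigenvalue pairs per round according to its communication budget, while the master combines the accumulated pairs with fresh gradients into a Newton-type step) can be illustrated with a minimal Python sketch. This is not the authors' reference implementation: the ridge-regression objective, the per-round budget, the rank-limited plus scaled-identity Hessian surrogate (with rho set to the smallest shared eigenvalue), and all names (Agent, federated_newton_round, memory, budget) are illustrative assumptions.

import numpy as np


class Agent:
    """Holds local ridge-regression data; eigendecomposes its local Hessian once."""

    def __init__(self, A, b, reg=1e-3):
        self.A, self.b, self.reg = A, b, reg
        n, d = A.shape
        H = A.T @ A / n + reg * np.eye(d)        # local Hessian (constant for ridge)
        w, V = np.linalg.eigh(H)
        idx = np.argsort(w)[::-1]                # sort eigen-pairs by decreasing eigenvalue
        self.eigvals, self.eigvecs = w[idx], V[:, idx]
        self.sent = 0                            # number of eigen-pairs shared so far

    def gradient(self, theta):
        return self.A.T @ (self.A @ theta - self.b) / len(self.b) + self.reg * theta

    def next_eigenpairs(self, budget):
        """Return the next `budget` eigen-pairs (this round's communication budget)."""
        lo, hi = self.sent, min(self.sent + budget, len(self.eigvals))
        self.sent = hi
        return self.eigvals[lo:hi], self.eigvecs[:, lo:hi]


def federated_newton_round(theta, agents, memory, budget=2):
    """One round: gather gradients and eigen-pair increments, take a Newton-type step."""
    d = theta.size
    grad, H_hat = np.zeros(d), np.zeros((d, d))
    for k, ag in enumerate(agents):
        grad += ag.gradient(theta)
        lam_new, V_new = ag.next_eigenpairs(budget)
        lams, Vs = memory.get(k, (np.empty(0), np.empty((d, 0))))
        lams, Vs = np.concatenate([lams, lam_new]), np.hstack([Vs, V_new])
        memory[k] = (lams, Vs)                   # master keeps all pairs received so far
        # Hessian surrogate: exact on the shared eigen-directions, a scaled
        # identity (rho = smallest shared eigenvalue) on the unshared remainder.
        rho = lams[-1] if len(lams) else ag.reg
        H_hat += Vs @ np.diag(lams - rho) @ Vs.T + rho * np.eye(d)
    grad, H_hat = grad / len(agents), H_hat / len(agents)
    return theta - np.linalg.solve(H_hat, grad)  # Newton-type update


# Usage example with small synthetic, heterogeneous local datasets.
rng = np.random.default_rng(0)
d = 10
agents = [Agent(rng.normal(size=(50, d)) * (i + 1), rng.normal(size=50)) for i in range(4)]
theta, memory = np.zeros(d), {}
for _ in range(15):
    theta = federated_newton_round(theta, agents, memory, budget=2)

Under these assumptions the Hessian surrogate becomes exact once all eigen-pairs have been received, so later rounds reduce to plain Newton steps; the point of the incremental scheme is that useful curvature information is already available in early rounds, at a fraction of the cost of transmitting full local Hessians.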

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3508713