
Age of Information for Machine Learning Tasks With Mobile Edge Computing Offloading

Badia L.;
2025

Abstract

We investigate the minimization of the age of information (AoI) of an AI-powered application that requires timely processing of data generated by a multitude of users. We consider that sequences of inference tasks generated at individual terminals can either be processed locally with a tiny machine learning (ML) model or be offloaded to a more powerful ML model residing on an edge computing facility shared by all users. Since the local ML model is less powerful, its inferences may have low confidence. When this happens, the user is forced to repeat the inference with the more powerful edge ML model. The choice between local processing and offloading follows a randomized-alpha policy, where the local ML model, while less powerful, offers the advantage of alleviating congestion at the edge server. The AoI model follows the frameworks presented in the literature for multiple sources sharing the same queue. Local processing instead works as a single-server dedicated queue, but we account for the imperfections of the tiny ML model by including a failure probability in the local server. Tasks that are processed locally but eventually fail to achieve a minimum confidence level are offloaded to the edge server, resulting in a longer overall processing time. We derive a queueing model of the entire system based on some bounds from the literature. Our results show the trade-offs between processing latency, inference accuracy, and system congestion, highlighting the importance of optimizing task allocation strategies.
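The trade-off described in the abstract can be illustrated with a toy queueing sketch. This is not the paper's model: it assumes simple M/M/1 queues, and all rates, the failure probability, and the policy parameter alpha below are hypothetical values chosen for illustration. Each user offloads a task directly with probability alpha; otherwise the task is served by the local tiny ML model, which fails to reach the confidence threshold with probability p_fail and is then re-offloaded to the shared edge server.

```python
def mean_delay(alpha, lam, n_users, mu_local, mu_edge, p_fail):
    """Mean task completion time under a randomized-alpha offloading policy.

    Toy M/M/1 sketch (illustrative assumptions, not the paper's model):
      alpha    -- probability of offloading a task directly to the edge
      lam      -- per-user task generation rate
      n_users  -- number of users sharing the edge server
      mu_local -- service rate of the local (tiny ML) queue
      mu_edge  -- service rate of the shared edge queue
      p_fail   -- probability a local inference is below the confidence
                  threshold and must be repeated at the edge
    """
    # Per-user load on the dedicated local queue.
    lam_local = lam * (1.0 - alpha)
    # Aggregate load on the shared edge queue: direct offloads
    # plus locally failed tasks from all users.
    lam_edge = n_users * lam * (alpha + (1.0 - alpha) * p_fail)
    if lam_local >= mu_local or lam_edge >= mu_edge:
        raise ValueError("unstable queue: arrival rate exceeds service rate")
    # M/M/1 mean sojourn time: 1 / (mu - lambda).
    w_local = 1.0 / (mu_local - lam_local)
    w_edge = 1.0 / (mu_edge - lam_edge)
    # Direct offloads pay w_edge; local attempts pay w_local,
    # plus w_edge again whenever the local inference fails.
    return alpha * w_edge + (1.0 - alpha) * (w_local + p_fail * w_edge)
```

Sweeping alpha in such a sketch exposes the congestion trade-off the abstract points to: small alpha keeps the shared edge queue lightly loaded but pays the local failure penalty, while large alpha avoids low-confidence retries at the cost of edge congestion.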
IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC
36th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2025
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3583982
Citations
  • Scopus 0
  • OpenAlex 0