Privacy-by-design GEN-RWD Sandbox for Distributed Multicentric Data Analysis in Healthcare: The Proxy Module

Tavazzi E.;
2025

Abstract

Modern data analysis techniques are transforming healthcare from both research and operational perspectives. Applying statistical analysis, machine learning, data science, and process mining to large databases enables data analysts and clinicians to improve the quality and precision of their research and care. However, ingesting and maintaining high-quality datasets is challenging, primarily because the task is resource-intensive and because it is difficult to collect comprehensive datasets large enough to support statistically sound conclusions. Multicentric cross-organizational studies, which analyze datasets collected from multiple independent data nodes, address this issue; however, concerns about privacy and data ownership often hinder the free exchange of data. Among existing technologies, Distributed Analytics and Federated Learning offer promising solutions to these challenges by enabling the analysis of large, decentralized datasets while safeguarding patient privacy. In this paper, we present and release as open source the code of the Proxy Module of the GEN-RWD Sandbox platform, an infrastructure designed for privacy-preserving distributed analytics in healthcare. The module implements essential infrastructure management functions that ensure privacy in a distributed learning environment. We describe in detail how the module functions within the platform and report test results. The code is available at https://github.com/leonucciarelli/gsproxy.git
2025 IEEE 13th International Conference on Healthcare Informatics (ICHI)
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3559478