Privacy-by-design GEN-RWD Sandbox for Distributed Multicentric Data Analysis in Healthcare: The Proxy Module
Tavazzi E.;
2025
Abstract
Modern data analysis techniques are transforming the healthcare landscape from both research and operational perspectives. Applying statistical analysis, machine learning, data science, and process mining to large databases enables data analysts and clinicians to enhance the quality and precision of their research and care efforts. However, the ingestion and maintenance of high-quality datasets present significant challenges, primarily due to the resource-intensive nature of the task and the difficulty of collecting comprehensive and statistically significant information sets. Multicentric cross-organizational studies, which involve the analysis of datasets collected from various independent data nodes, address this issue; however, concerns regarding privacy and data ownership often hinder the free exchange of data. Among the existing technologies, Distributed Analytics and Federated Learning offer promising solutions to these challenges by facilitating the analysis of large, decentralized datasets while safeguarding patient privacy. In this paper, we present and release as open source the code of the Proxy Module within the GEN-RWD Sandbox platform, an infrastructure designed for privacy-preserving distributed analytics in healthcare. The module implements essential infrastructural management functions to ensure privacy in a distributed learning environment. A detailed explanation of how the module functions within the platform is provided, together with test results. The code is available at https://github.com/leonucciarelli/gsproxy.git