Design and Development of a Polystore System for Heterogeneous Biomedical Data
Mirco Cazzaro
Writing – Original Draft Preparation
2025
Abstract
Biomedical data management is increasingly complex due to the variety of storage systems and evolving data models. This heterogeneity hinders data integration and querying, both of which are crucial for advancing biomedical research and healthcare. A possible solution is to employ a recent idea from the database field: polystores. A polystore is a database management system (DBMS) designed to integrate and manage multiple heterogeneous data stores, allowing efficient data processing and querying across diverse data models and storage systems. Nevertheless, polystore systems may differ profoundly from one another, both in their structure and in the interface they provide to users. This is usually due to the diverse settings in which polystores are applied, and thus to the diverse nature of the data available in different fields and to the information needs of their users. These differences impede the adoption of existing polystores in the context of biomedical data. Moreover, to the best of our knowledge, there exists no system that offers an integrated view of biomedical data by means of a graph data model, which would provide a sharper representation of this domain. In this paper, we outline the research challenges and the initial steps towards the development of a polystore system that provides efficient access to multiple heterogeneous biomedical data sources, while also addressing critical privacy concerns by tracing data flow and ensuring the privacy and anonymity of individuals.