Clinical stratification improves the diagnostic accuracy of small omics datasets within machine learning and genome-scale metabolic modelling methods

Magazzù, Giuseppe; Zampieri, Guido; Angione, Claudio

doi:10.1016/j.compbiomed.2022.106244

Background: Recently, multi-omic machine learning architectures have been proposed for the early detection of cancer. However, for rare cancers and their associated small datasets, it is still unclear how to use the available multi-omics data to achieve a mechanistic prediction of cancer onset and progression, due to the limited data available. Hepatoblastoma is the most frequent liver cancer in infancy and childhood, and whose incidence has been lately increasing in several developed countries. Even though some studies have been conducted to understand the causes of its onset and discover potential biomarkers, the role of metabolic rewiring has not been investigated in depth so far.Methods: Here, we propose and implement an interpretable multi-omics pipeline that combines mechanis-tic knowledge from genome-scale metabolic models with machine learning algorithms, and we use it to characterise the underlying mechanisms controlling hepatoblastoma.Results and Conclusions: While the obtained machine learning models generally present a high diagnostic classification accuracy, our results show that the type of omics combinations used as input to the machine learning models strongly affects the detection of important genes, reactions and metabolic pathways linked to hepatoblastoma. Our method also suggests that, in the context of computer-aided diagnosis of cancer, optimal diagnostic accuracy can be achieved by adopting a combination of omics that depends on the patient's clinical characteristics.