Harnessing Machine Learning to investigate the function of bioactive proteins and peptides in microbial communities / Bizzotto, E. - (2026 Mar 12).
Harnessing Machine Learning to investigate the function of bioactive proteins and peptides in microbial communities
BIZZOTTO, EDOARDO
2026
Abstract
The exponential growth of microbial sequence data has created a bottleneck in biology: the sequence-to-function gap, where the ability to generate data outpaces the capacity to interpret the functional roles of genes, organisms, and communities. This thesis addresses this challenge through a multi-scale investigation, demonstrating how diverse machine learning models and strategies can be deployed to infer microbial function across increasing levels of biological complexity. The investigation begins at the molecular level, where a benchmark of supervised classification models and sequence encodings was performed to develop a pipeline, CICERON, for accurately predicting the function of bioactive peptides. The approach was then scaled to the organismal level with MICROPHERRET, a novel tool that leverages genomic features to successfully predict 86 distinct metabolic and ecological phenotypes for both complete and metagenome-assembled genomes. Finally, the methodology was advanced from classification to optimization. By integrating a genetic algorithm with genome-scale metabolic models, a tool was developed to computationally engineer the composition of a microbial community for maximizing compound production. Collectively, this work establishes a multi-scale computational framework for functional microbiology. It demonstrates that by strategically matching machine learning to the biological question at hand, from classification to evolutionary optimization, it is possible to bridge the sequence-to-function gap from the level of individual molecules to the rational engineering of entire microbial ecosystems.
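The abstract does not describe how the community-optimization tool encodes or scores candidate communities. The sketch below illustrates only the general technique it names: a genetic algorithm searching over community composition. Each candidate community is a bitstring in which bit i marks whether strain i is included. The population is evolved by tournament selection, one-point crossover, bit-flip mutation and elitism. Everything problem-specific here is a hypothetical assumption invented for illustration, including `TOY_YIELD`, `SYNERGY`, `SIZE_PENALTY` and the `fitness` function. In the tool described above, fitness would instead come from simulating the community's genome-scale metabolic models, which this sketch does not attempt.

```python
import random
from itertools import product

# HYPOTHETICAL toy data: per-strain contribution to the target compound.
# A real tool would derive scores from genome-scale metabolic model simulations.
TOY_YIELD = [0.8, 0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.6]
# HYPOTHETICAL pairwise bonus (e.g. cross-feeding) when both strains are present.
SYNERGY = {(1, 4): 0.6}
SIZE_PENALTY = 0.4  # HYPOTHETICAL cost per included strain
N_STRAINS = len(TOY_YIELD)


def fitness(community):
    """Toy objective: yields minus size penalty, plus pairwise synergy bonuses."""
    chosen = [i for i, bit in enumerate(community) if bit]
    score = sum(TOY_YIELD[i] - SIZE_PENALTY for i in chosen)
    score += sum(b for (i, j), b in SYNERGY.items() if community[i] and community[j])
    return score


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly sampled individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """One-point crossover between two parent bitstrings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(individual, rng, rate):
    """Flip each bit (add or drop a strain) with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Evolve community compositions and return the best bitstring found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_STRAINS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        elite = population[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite[:]]  # elitism: carry the best community over unchanged
        while len(children) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng, mutation_rate))
        population = children
    return max(population, key=fitness)


def brute_force_optimum():
    """Exhaustive check, feasible only because this toy instance has 2**8 communities."""
    return max(fitness(list(bits)) for bits in product([0, 1], repeat=N_STRAINS))


if __name__ == "__main__":
    best = evolve()
    print("best community:", best, "fitness:", round(fitness(best), 3))
    print("exhaustive optimum:", round(brute_force_optimum(), 3))
```

The toy `SYNERGY` term makes the objective non-separable. Strains 1 and 4 are each unprofitable alone but profitable together, which is the kind of interaction that makes composition a search problem rather than a per-strain filter. Exhaustive enumeration is used here only as a sanity check. A genetic algorithm becomes worthwhile when the number of candidate strains makes 2^n enumeration impractical and each fitness evaluation requires a costly metabolic simulation.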
File: tesi_Edoardo_Bizzotto_final (1).pdf
Description: Tesi_Edoardo_Bizzotto_final
Type: Doctoral thesis
Size: 5.22 MB
Format: Adobe PDF
Access: embargoed until 11/03/2029
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.




