Discriminant functional gene groups identification with machine learning and prior knowledge

SANAVIA, TIZIANA; DI CAMILLO, BARBARA
2012

Abstract

High-throughput technologies make it possible to rapidly produce huge amounts of gene expression data, useful for characterizing a wide variety of phenotypes. However, choosing the best methods and approaches to analyze data from a high-throughput study is not trivial, which makes the biological interpretation of the results a challenging task. The current analysis pipeline first selects a set of genes that are significant for a specific aspect of the available data, and then provides a functional characterization of the results using the many public sources of prior biological knowledge. This approach, while successful, may not obtain the best possible results because, for example, it can discard functions or processes known to be relevant to the specific biological question under analysis merely because few of their genes appear in the identified list. We present a new analysis framework, Knowledge Driven Variable Selection (KDVS), that integrates prior knowledge into data analysis. The expression data matrix is partitioned according to prior knowledge into smaller matrices, which are easier to analyze and interpret from both computational and biological viewpoints. Unlike the current analysis pipeline, KDVS therefore does not exclude a priori any function or process potentially relevant to the biological question under investigation. Three case studies are presented to demonstrate the performance of the method.
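The partitioning idea can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the authors' implementation: the gene-to-term mapping `term_to_genes`, the function names and the toy data are all hypothetical, and the leave-one-out nearest-centroid scorer is a simple stand-in for whatever learning method is actually applied to each sub-matrix. The point it shows is that every functional term with at least one measured gene gets its own sub-matrix and its own score, so no term is dropped just because it contains few genes.

```python
import numpy as np


def partition_by_prior_knowledge(X, gene_ids, term_to_genes):
    """Split X (samples x genes) into one sub-matrix per functional term.

    term_to_genes is a hypothetical mapping from a term (e.g. a Gene Ontology
    ID) to the genes annotated to it. Terms are kept however few measured
    genes they contain; only terms with no measured gene at all are skipped.
    """
    column_of = {gene: j for j, gene in enumerate(gene_ids)}
    submatrices = {}
    for term, genes in term_to_genes.items():
        cols = [column_of[g] for g in genes if g in column_of]
        if cols:
            submatrices[term] = X[:, cols]
    return submatrices


def loo_nearest_centroid_accuracy(Xs, y):
    """Leave-one-out accuracy of a nearest-centroid classifier on Xs.

    Illustrative stand-in scorer only; it is not the learner used by KDVS.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # boolean mask leaving sample i out
        X_tr, y_tr = Xs[train], y[train]
        classes = np.unique(y_tr)
        centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - Xs[i], axis=1)
        correct += int(classes[np.argmin(dists)] == y[i])
    return correct / n


# Toy data (hypothetical): 20 samples in two classes, 6 measured genes.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
gene_ids = ["g1", "g2", "g3", "g4", "g5", "g6"]
X = rng.normal(size=(20, 6))
X[y == 1, 0] += 5.0  # only g1 separates the two classes

term_to_genes = {
    "TERM_A": ["g1", "g2"],
    "TERM_B": ["g3", "g4", "g5"],
    "TERM_C": ["g6", "g_unmeasured"],  # a single measured gene is still kept
    "TERM_D": ["g_unmeasured"],        # no measured gene: skipped
}

subs = partition_by_prior_knowledge(X, gene_ids, term_to_genes)
scores = {term: loo_nearest_centroid_accuracy(S, y) for term, S in subs.items()}
for term, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, subs[term].shape, round(acc, 2))
```

Because each term is scored on its own sub-matrix, a small term such as `TERM_C` is still evaluated rather than being filtered out by a gene-count threshold; in this toy setup `TERM_A`, which contains the one informative gene, should rank first.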
ESANN 2012 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
ISBN 978-2-87419-049-0
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/2527904