Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification
ROMUALDI, CHIARA; CAMPANARO, STEFANO; CAMPAGNA, DAVIDE; CELEGATO, BARBARA; TOPPO, STEFANO; VALLE, GIORGIO; LANFRANCHI, GEROLAMO
2003
Abstract
Large-scale parallel measurements of the expression of many thousands of genes are now possible with high-density arrays made from collections of cDNA fragments or oligonucleotides corresponding to different transcripts. These technologies have been applied to cancer investigations, since the availability of such a large number of markers makes DNA arrays a powerful diagnostic tool for tumour and patient classification. Over the last two years, a series of computational tools has been developed for the analysis of different aspects of gene profiling. Our work compares a series of supervised statistical techniques on the basis of their ability to correctly classify different types of tumours. A simulation approach was initially used to control the large sources of variation among and between patients, and to evaluate the ability of the algorithms to classify tumours in relation to different types of experimental variables. Different dimensionality-reduction techniques were then combined with the discriminant analysis and compared according to their ability to capture the main genetic information. The simulation results were tested by applying the selected classification algorithms to two experimental microarray datasets of human cancers and measuring the corresponding misclassification rates. Our analyses identify in these datasets a series of genes principally involved in tumour characterization. The functional role of these discriminant transcripts is discussed.