An improved classification scheme for chromosomes with missing data

POLETTI, ENEA; RUGGERI, ALFREDO; GRISAN, ENRICO
2011

Abstract

Karyotyping, or the automatic classification of human chromosomes, is mostly based on the analysis of the chromosome-specific banding pattern. Unfortunately, the most informative phases of the cell division cycle feature long chromosomes that easily overlap: the banding pattern information in the overlapping regions is corrupted, which drastically increases the classification error. Assuming a probabilistic classifier is available, improving the classification of chromosomes with corrupted data would additionally require estimating the joint probability density of the observed and missing data for each chromosome class. Given the number of classes, the possible position and extent of the corrupted data within a chromosome, and the dimensionality of the feature space, a reliable estimate would need an impractically large number of training samples. We chose to circumvent the estimation problem by developing a statistical generative model of the pattern of each class, so that the corrupted part can be replaced with a partial pattern synthetically generated from the model. This makes it possible to obtain a Monte Carlo estimate of the maximum a posteriori probability for the class given the observation and the missing data, which reduces to a simple voting scheme if the a priori probabilities of all classes are equal. Moreover, this Monte Carlo classification is superior to the voting scheme based on simply imputing each class mean to the missing data.
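The voting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the diagonal Gaussian class model, the likelihood-based classifier, and all names (`log_likelihood`, `classify`, `monte_carlo_vote`, `mean_imputation_vote`) are assumptions made here for illustration, and equal class priors are assumed throughout. The missing part of a pattern is filled with samples drawn from each class model in turn, each completed pattern is classified, and the class with the most votes wins. The mean-imputation baseline is included for comparison.

```python
import math
import random


def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian (assumed stand-in class model)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def classify(x, means, variances):
    """Return the index of the most likely class (equal priors assumed)."""
    scores = [log_likelihood(x, m, v) for m, v in zip(means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


def monte_carlo_vote(x, missing, means, variances, n_draws=100, rng=None):
    """Fill the missing positions with samples from each class model and vote.

    x        : list of feature values (entries at missing positions are ignored)
    missing  : set of indices whose values are corrupted
    Returns (winning class index, list of vote counts per class).
    """
    rng = rng or random.Random()
    votes = [0] * len(means)
    for mean, var in zip(means, variances):
        for _ in range(n_draws):
            # Replace each corrupted value with a draw from this class model.
            filled = [rng.gauss(mean[i], math.sqrt(var[i])) if i in missing else x[i]
                      for i in range(len(x))]
            votes[classify(filled, means, variances)] += 1
    winner = max(range(len(votes)), key=votes.__getitem__)
    return winner, votes


def mean_imputation_vote(x, missing, means, variances):
    """Baseline: fill the missing positions with each class mean, one vote per class."""
    votes = [0] * len(means)
    for mean in means:
        filled = [mean[i] if i in missing else x[i] for i in range(len(x))]
        votes[classify(filled, means, variances)] += 1
    winner = max(range(len(votes)), key=votes.__getitem__)
    return winner, votes
```

In this sketch every class contributes the same number of synthetic completions, which corresponds to the equal-prior case mentioned in the abstract. With unequal priors, the votes would have to be weighted accordingly.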
33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'11)
ISBN: 9781424441211
ISBN: 9781424441228
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11577/180743
Citations
  • PMC 0
  • Scopus 1
  • ISI 4