Given a protein that is localized in the mitochondria it is very important to know the submitochondria localization of that protein to understand its function. In this work, we propose a submitochondria localizer whose feature extraction method is based on the Chou's pseudo-amino acid composition. The pseudo-amino acid based features are obtained by combining pseudo-amino acid compositions with hundreds of amino-acid indices and amino-acid substitution matrices, then from this huge set of features a small set of 15 "artificial" features is created. The feature creation is performed by genetic programming combining one or more "original" features by means of some mathematical operators. Finally, the set of combined features are used to train a radial basis function support vector machine. This method is named GP-Loc. Moreover, we also propose a very few parameterized method, named ALL-Loc, where all the "original" features are used to train a linear support vector machine. The overall prediction accuracy obtained by GP-Loc is 89% when the jackknife cross-validation is used, this result outperforms the performance obtained in the literature (85.2%) using the same dataset. While the overall prediction accuracy obtained by ALL-Loc is 83.9%.

Genetic programming for creating Chou's pseudoamino acid based features for submitochondria localization

NANNI, LORIS;
2008

Abstract

Given a protein that is localized in the mitochondria it is very important to know the submitochondria localization of that protein to understand its function. In this work, we propose a submitochondria localizer whose feature extraction method is based on the Chou's pseudo-amino acid composition. The pseudo-amino acid based features are obtained by combining pseudo-amino acid compositions with hundreds of amino-acid indices and amino-acid substitution matrices, then from this huge set of features a small set of 15 "artificial" features is created. The feature creation is performed by genetic programming combining one or more "original" features by means of some mathematical operators. Finally, the set of combined features are used to train a radial basis function support vector machine. This method is named GP-Loc. Moreover, we also propose a very few parameterized method, named ALL-Loc, where all the "original" features are used to train a linear support vector machine. The overall prediction accuracy obtained by GP-Loc is 89% when the jackknife cross-validation is used, this result outperforms the performance obtained in the literature (85.2%) using the same dataset. While the overall prediction accuracy obtained by ALL-Loc is 83.9%.
2008
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/157739
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 163
  • ???jsp.display-item.citation.isi??? 160
social impact