In silico structural study of random amino acid sequence proteins not present in nature

Prymula, K.; Piwowar, M.; Kochanczyk, M.; Flis, L.; Malawski, M.; Szepieniec, T.; Evangelista, G.; Minervini, G.; Polticelli, F.; Wisniowski, Z.; Salapa, K.; Matczynska, E.; Roterman, I.

doi:10.1002/cbdv.200800338

The three-dimensional structures of a set of 'never born proteins' (NBPs: proteins with random amino acid sequences and no significant homology to known proteins) were predicted using two methods: Rosetta and a method based on the 'fuzzy-oil-drop' (FOD) model. More than 3000 different random amino acid sequences were generated, filtered against the non-redundant protein sequence database to remove sequences with significant homology to known proteins, and subjected to three-dimensional structure prediction. Comparing the Rosetta and FOD predictions made it possible to select the ten top (highest structural similarity) and the ten bottom (lowest structural similarity) structures from a ranking list ordered by RMSD. The selected structures were then analysed in detail to assess the extent of structural agreement and discrepancy between the two methods. The structural similarity measurements revealed discrepancies between the structures generated by the two methods, and their potential biological functions also appeared to differ considerably. The ten bottom structures appeared to be 'unfoldable' under the FOD model. Some general characteristics of NBPs are also discussed. The calculations were performed on the EUChinaGRID grid platform to test the performance of this infrastructure for massive protein structure prediction. © 2009 Verlag Helvetica Chimica Acta AG.
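The RMSD-based ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each NBP has one Rosetta model and one FOD model with matched Cα coordinates (same residue order), that RMSD is computed after optimal rigid superposition with the Kabsch algorithm (the abstract does not state how the structures were superimposed), and that NumPy is available. The function names `kabsch_rmsd` and `select_top_bottom` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Assumption: rows of p and q are corresponding atoms (e.g. C-alpha atoms
    of the same residues in the Rosetta and FOD models).
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and compute the root-mean-square deviation.
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def select_top_bottom(rmsd_by_id: dict[str, float], k: int = 10) -> tuple[list[str], list[str]]:
    """Return the k most similar (lowest RMSD) and k least similar (highest RMSD) IDs."""
    ranked = sorted(rmsd_by_id, key=rmsd_by_id.__getitem__)
    top = ranked[:k]             # highest structural similarity first
    bottom = ranked[-k:][::-1]   # lowest structural similarity first
    return top, bottom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(20, 3))
    # A rotated and translated copy of the same structure should give RMSD ~ 0.
    t = 0.7
    rz = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t), np.cos(t), 0.0],
                   [0.0, 0.0, 1.0]])
    scores = {
        "identical": kabsch_rmsd(a, a @ rz.T + np.array([1.0, -2.0, 3.0])),
        "noisy": kabsch_rmsd(a, a + rng.normal(scale=0.05, size=a.shape)),
        "unrelated": kabsch_rmsd(a, rng.normal(size=(20, 3))),
    }
    print(select_top_bottom(scores, k=1))
```

A lower RMSD means closer agreement between the two predictions, so the "top" list holds the most similar Rosetta/FOD pairs and the "bottom" list the most divergent ones. The superposition step matters because the two methods place their models in unrelated coordinate frames, so a raw coordinate difference would mostly measure that arbitrary offset rather than structural disagreement.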