The Graph Theory and the Protein Universe: selecting the close-to-the-native structures within a model decoy set

Fariselli, Piero;
2008

Abstract

Proteins are complex systems, as Anfinsen’s thermodynamic hypothesis highlights. Their complexity hampers an analytical solution to the protein folding problem, that is, how to predict the tertiary structure of a protein from its residue sequence (primary structure). However, more than 40,000 protein structures are presently known at atomic resolution, which suggests that heuristic methods may help in unraveling protein folding. Indeed, several methods have been described that address this point: how to generalize, from known examples of folding, predictive methods suited to computing the three-dimensional structure of a protein. This has progressed to the point that automatic servers can now generate 3D structures as soon as genomes are sequenced, contributing several million putative structures that, in most cases, still need to be validated. The question then arises: how can good structures be selected among the many different models generated by different methods from the same protein sequence? In protein structure prediction, potential energy functions are adopted both to compute “ab initio” models and to score threading candidates. It is known, however, that small intrinsic errors in these models can lead to a large number of erroneous structures whose conformational energy is lower than that of the native one (Finkelstein, 1997). Decreasing the number of false candidates among the best-scoring structures is therefore a major goal in protein structure prediction, and several types of energy functions and decoy sets have been described to address this issue (Park et al., 1997; Felts et al., 2002; Hardin et al., 2002; Tsai et al., 2003). The notion that the native structure of a protein has the lowest free energy among all possible conformations may help us to develop a filter procedure.
A very stringent test of reliability is whether the adopted energy function computes lower conformational energies for native and near-native conformations than for non-native ones. To test the capabilities of new scoring functions, sets of “good decoys” were developed (Tsai et al., 2003). To avoid over-fitting, these sets routinely contain protein-like structures for a wide variety of structural domains, including conformations close to the native structure (RMSD < 4 Å). A method that performs well on a decoy set is then adopted as a reliable filter to select good predicted models among a set of computed structures. In this report we adopt a graph-based representation of the protein structure: each node (or vertex) in the graph is a Calpha atom and each edge represents a contact between two Calpha atoms. This representation carries information only about the local and non-local interactions that stabilize the protein structure. On this basis we introduce a new approach, based on graph theory, to sort out near-native decoys from a plethora of protein-like decoys. Given the data set of protein and decoy structures, we consider five measures defined in graph theory and test their ability to distinguish between correct and incorrect folds. Our method stems from the notion that decoys can be ranked with respect to the corresponding protein structure as a function of structural similarity. In our approach each protein and its related decoys are represented by a graph adjacency matrix, or “contact map”. For each decoy set (that is, the set of decoy contact maps for each protein) we compute the average degree (average number of contacts per residue), the contact order, the normalized complexity, the network flow, and its weighted version. The ability of a given graph measure to act as a scoring function is then evaluated by computing the Enrichment and the Z score (Tsai et al., 2003).
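The contact-map representation and two of the five measures named above (average degree and contact order) can be illustrated with a minimal sketch. It is not the authors' implementation. The 8 Å Calpha distance cutoff, the `min_separation` parameter, the function names, and the normalization of the contact order by chain length are all assumptions made here for illustration; the abstract does not state them.

```python
import math


def contact_map(ca_coords, cutoff=8.0, min_separation=1):
    """Build a symmetric 0/1 adjacency matrix from C-alpha coordinates.

    ca_coords: list of (x, y, z) tuples, one per residue.
    cutoff and min_separation are assumed values, not taken from the source.
    """
    n = len(ca_coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            # Two residues are "in contact" if their C-alpha atoms are close.
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def average_degree(cmap):
    """Average number of contacts per residue (mean node degree)."""
    n = len(cmap)
    return sum(sum(row) for row in cmap) / n if n else 0.0


def relative_contact_order(cmap):
    """Mean sequence separation of contacting pairs, divided by chain length.

    This normalization is one common convention and is assumed here.
    """
    n = len(cmap)
    separations = [j - i for i in range(n) for j in range(i + 1, n) if cmap[i][j]]
    if not separations:
        return 0.0
    return sum(separations) / (len(separations) * n)


if __name__ == "__main__":
    # Toy chain: four residues on a straight line, 3.8 Angstrom apart.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    cmap = contact_map(toy)
    print(average_degree(cmap), relative_contact_order(cmap))
```

In a decoy-ranking setting, each of these values would be computed for the native contact map and for every decoy contact map, and the decoys would then be ordered by the chosen measure.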
We find that, with these measures, the selection of optimal decoys performs similarly to previously described methods based on centroid/backbone energy functions (Tsai et al., 2...
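The two evaluation quantities named in the abstract, the Z score and the Enrichment, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: lower scores are assumed to be better (a graph measure for which higher is better would be negated first), the Z score is taken relative to the decoy score distribution, and the Enrichment is read as the overlap between the best 15% of decoys by score and the best 15% by RMSD, divided by the overlap expected at random. The function names and the `fraction` parameter are hypothetical.

```python
import statistics


def z_score(native_score, decoy_scores):
    """Native score in units of decoy standard deviations from the decoy mean.

    With lower-is-better scores, a more negative value means better
    separation of the native structure from its decoys.
    """
    mean = statistics.fmean(decoy_scores)
    sd = statistics.pstdev(decoy_scores)
    if sd == 0:
        raise ValueError("decoy scores have zero variance")
    return (native_score - mean) / sd


def enrichment(scores, rmsds, fraction=0.15):
    """Overlap of the best-scoring and lowest-RMSD decoys relative to chance.

    A value of 1.0 corresponds to a random scoring function under this
    (assumed) definition; larger values indicate a more useful score.
    """
    n = len(scores)
    k = max(1, round(fraction * n))
    best_by_score = set(sorted(range(n), key=lambda i: scores[i])[:k])
    best_by_rmsd = set(sorted(range(n), key=lambda i: rmsds[i])[:k])
    expected_overlap = k * k / n
    return len(best_by_score & best_by_rmsd) / expected_overlap


if __name__ == "__main__":
    decoy_scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(z_score(1.0, decoy_scores))
```

Both functions take plain lists, so any of the graph measures can be substituted for an energy function without changing the evaluation code.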
Biocomplexity At The Cutting Edge Of Physics, Systems Biology And Humanities
978-88-7395-330-2

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3184079