
A comprehensive assessment of long intrinsic protein disorder from the DisProt database

Necci, Marco;Piovesan, Damiano;Dosztányi, Zsuzsanna;Tompa, Peter;Tosatto, Silvio C. E.
2018

Abstract

Motivation: Intrinsic disorder (ID), i.e. the lack of a unique folded conformation at physiological conditions, is a common feature for many proteins, which requires specialized biochemical experiments that are not high-throughput. Missing X-ray residues from the PDB have been widely used as a proxy for ID when developing computational methods. This may lead to a systematic bias, where predictors deviate from biologically relevant ID. Large benchmarking sets on experimentally validated ID are scarce. Recently, the DisProt database has been renewed and expanded to include manually curated ID annotations for several hundred new proteins. This provides a large benchmark set which has not yet been used for training ID predictors.

Results: Here, we describe the first systematic benchmarking of ID predictors on the new DisProt dataset. In contrast to previous assessments based on missing X-ray data, this dataset contains mostly long ID regions and a significant amount of fully ID proteins. The benchmarking shows that ID predictors work quite well on the new dataset, especially for long ID segments. However, a large fraction of ID still goes virtually undetected and the ranking of methods is different than for PDB data. In particular, many predictors appear to confound ID and regions outside X-ray structures. This suggests that the ID prediction methods capture different flavors of disorder and can benefit from highly accurate curated examples.

Availability and implementation: The raw data used for the evaluation are available from URL: http://www.disprot.org/assessment/.

Contact: silvio.tosatto@unipd.it

Supplementary information: Supplementary data are available at Bioinformatics online.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3257641
Citations
  • PMC 21
  • Scopus 40
  • ISI 37