Invariant geometric properties of secondary structure elements in proteins

Comin, Matteo; Guerra, Concettina; Zanotti, Giuseppe
2010

Abstract

Many approaches have been proposed for protein structure analysis and comparison. This chapter discusses the importance of simple invariant geometric features of the secondary structures of proteins. The method described is based on invariant geometric features of protein secondary structures and neglects any information about how they are connected along the polypeptide chain. These invariant properties enable a comprehensive analysis of protein patterns and a fast search for similar proteins and substructures. In brief, a globular protein is reduced to a collection of oriented segments in 3D space, each corresponding to a secondary structure element (SSE). To make this representation invariant to rigid transformations, angles and distances between pairs of segments are considered. This information is assembled into triplets or quartets of SSEs and indexed in a hash table that allows fast retrieval of similarity information for substructures and proteins. Using the hash table, a method for protein structure comparison that combines indexing and dynamic programming (DP) is presented. The computational cost of the comparison is shifted to the initial construction of the hash table, so that subsequent calculations are very fast. Extensive experiments show that this approach achieves results of quality comparable to other existing approaches, but is generally faster, even when searching the entire Protein Data Bank. We also discuss an efficient method to compute over-represented patterns of quartets of SSEs by assembling frequent patterns of triplets of segments. The method for discovering recurrent patterns is based on the Apriori algorithm and extends naturally to any number of SSEs. The detection of over-represented and under-represented substructures may be taken as an indication of whether an artificial structure is physically possible.
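The indexing idea summarised above can be illustrated with a small Python sketch. It is not the authors' implementation and rests on several assumptions: each SSE is given as a hypothetical `(sse_type, start, end)` tuple of 3D coordinates; the pair features are the angle between segment directions and the distance between segment midpoints (the chapter's exact distance definition may differ); the bin widths `ANGLE_BIN_DEG` and `DIST_BIN_ANGSTROM` are illustrative values; and retrieval is reduced to counting hash hits per protein. The dynamic-programming step and quartets are left out.

```python
import math
from collections import defaultdict
from itertools import combinations, permutations

# Hypothetical bin widths: the chapter's actual discretisation is not reproduced here.
ANGLE_BIN_DEG = 30.0
DIST_BIN_ANGSTROM = 2.0

# A segment is (sse_type, start_xyz, end_xyz), e.g. ("H", (0, 0, 0), (10, 0, 0)),
# assumed here to point from the N- to the C-terminal end and to have nonzero length.


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def pair_features(s1, s2):
    """Angle (degrees) between segment directions and distance between midpoints.

    Both values are unchanged by any rotation plus translation of the whole protein.
    """
    (_, a1, b1), (_, a2, b2) = s1, s2
    d1, d2 = _sub(b1, a1), _sub(b2, a2)
    cos = sum(x * y for x, y in zip(d1, d2)) / (_norm(d1) * _norm(d2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding noise
    m1 = tuple((x + y) / 2 for x, y in zip(a1, b1))
    m2 = tuple((x + y) / 2 for x, y in zip(a2, b2))
    return angle, _norm(_sub(m1, m2))


def _quantise(angle, dist):
    return int(angle // ANGLE_BIN_DEG), int(dist // DIST_BIN_ANGSTROM)


def triplet_key(segs, idx):
    """Hash key for three SSEs that does not depend on the order they are listed in.

    Pair features are binned once; the lexicographically smallest encoding over
    all 3! orderings of the triplet is used as the canonical key.
    """
    bins = {frozenset(pair): _quantise(*pair_features(segs[pair[0]], segs[pair[1]]))
            for pair in combinations(idx, 2)}
    return min(
        (segs[p][0], segs[q][0], segs[r][0],
         bins[frozenset((p, q))], bins[frozenset((q, r))], bins[frozenset((p, r))])
        for p, q, r in permutations(idx)
    )


def build_index(proteins):
    """proteins: {protein_id: [segment, ...]} -> {key: [(protein_id, triplet), ...]}."""
    index = defaultdict(list)
    for pid, segs in proteins.items():
        for idx in combinations(range(len(segs)), 3):
            index[triplet_key(segs, idx)].append((pid, idx))
    return index


def query(index, segs):
    """Count hash-table hits per indexed protein over all triplets of the query."""
    votes = defaultdict(int)
    for idx in combinations(range(len(segs)), 3):
        for pid, _ in index.get(triplet_key(segs, idx), []):
            votes[pid] += 1
    return dict(votes)


# Usage: a rotated (90 degrees about z) and translated copy of "A" still retrieves "A".
A = [("H", (0, 0, 0), (10, 0, 0)), ("E", (0, 5, 1), (3, 9, 1)),
     ("H", (2, -6, 4), (2, -6, 14)), ("E", (8, 3, -2), (12, 6, -2))]
B = [("E", (0, 0, 0), (0, 0, 5)), ("E", (30, 0, 0), (30, 0, 5)),
     ("E", (0, 30, 0), (0, 30, 5))]
moved = [(t, (-s[1] + 7, s[0] - 3, s[2] + 11), (-e[1] + 7, e[0] - 3, e[2] + 11))
         for t, s, e in A]
votes = query(build_index({"A": A, "B": B}), moved)
print(sorted(votes))  # ['A']
```

Taking the smallest encoding over all orderings of a triplet makes the key independent of the order in which the three SSEs are listed, which suits a representation that ignores chain connectivity. Fixed-width bins also mean that two nearly identical values on either side of a bin edge receive different keys; a practical index would need some tolerance, for example by also probing neighbouring bins, which this sketch omits.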
Published in: Biological data mining (ISBN 9781420086843)

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/2422245