Grid Deployment of Bioinformatics Applications: a Case Study in Protein Similarity Determination

Comin, Matteo; Ferrari, Carlo; Guerra, Concettina

doi:10.1142/S0129626404001817

In this paper we present a scenario for the grid immersion of procedures that solve the protein structural similarity determination problem. The emphasis is on how the various computational components and data resources are tied together into a workflow to be executed on a grid. The grid deployment is organized according to the bag-of-service model: a set of different modules (each with its own data set) is made available to application designers. Each module deals with a specific subproblem using a suitable protein data representation. At the design level, task selection produces a first, general workflow that establishes which subproblems need to be solved and their temporal relations. A further refinement selects, for each task identified, a procedure that solves it; the choice is made among the available methods and representations. The final outcome is an instance of the workflow ready for execution on a grid. Our approach to protein structure comparison combines indexing and dynamic programming techniques to achieve fast and reliable matching. All the components have been implemented on a grid infrastructure using Globus, and the overall tool has been tested on proteins chosen from different fold classes. The results are compared against SCOP, a standard classification of known protein structures.
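
The two-stage design described above (task selection, then procedure selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: every task name, procedure name, and the helper instantiate are hypothetical, and no Globus call is made. The sketch only shows how an abstract workflow with temporal relations might be refined into a concrete instance by picking one procedure per task from a bag of services.

```python
from graphlib import TopologicalSorter

# Stage 1 (task selection): each task maps to the tasks that must finish first.
# All task names are hypothetical placeholders.
ABSTRACT_WORKFLOW = {
    "fetch_structures": set(),
    "build_representation": {"fetch_structures"},
    "index_lookup": {"build_representation"},
    "dp_alignment": {"index_lookup"},
    "score_similarity": {"dp_alignment"},
}

# Bag of services: candidate procedures available for each task.
# Procedure names are hypothetical placeholders.
BAG_OF_SERVICES = {
    "fetch_structures": ["structure_file_reader"],
    "build_representation": ["c_alpha_trace", "secondary_structure_segments"],
    "index_lookup": ["hash_table_index"],
    "dp_alignment": ["global_dp", "local_dp"],
    "score_similarity": ["rmsd_score"],
}


def instantiate(workflow, bag, choices):
    """Stage 2 (procedure selection): bind one procedure to every task.

    Returns a list of (task, procedure) pairs in an order that respects
    the temporal relations of the abstract workflow.
    """
    instance = []
    # static_order() yields each task only after all of its predecessors.
    for task in TopologicalSorter(workflow).static_order():
        candidates = bag[task]
        # Fall back to the first candidate when the designer made no choice.
        procedure = choices.get(task, candidates[0])
        if procedure not in candidates:
            raise ValueError(f"{procedure!r} is not offered for task {task!r}")
        instance.append((task, procedure))
    return instance


if __name__ == "__main__":
    designer_choices = {"build_representation": "secondary_structure_segments"}
    for task, procedure in instantiate(ABSTRACT_WORKFLOW, BAG_OF_SERVICES, designer_choices):
        print(f"{task} -> {procedure}")
```

In a real deployment each (task, procedure) pair would be submitted as a grid job; here the instance is only a plain list, which keeps the separation between the abstract workflow and its executable instance visible.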