Grammatical-Restrained Hidden Conditional Random Fields for Bioinformatics applications

Fariselli, Piero;
2009

Abstract

Background: Discriminative models are designed to naturally address classification tasks. However, some applications require the inclusion of grammar rules, and in these cases generative models, such as Hidden Markov Models (HMMs) and Stochastic Grammars, are routinely applied. Results: We introduce Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) as an extension of Hidden Conditional Random Fields (HCRFs). GRHCRFs, while preserving the discriminative character of HCRFs, can assign labels in agreement with the production rules of a defined grammar. The main novelty of GRHCRFs is that prior knowledge of the problem can be included in HCRFs by means of a defined grammar. Our current implementation allows regular grammar rules. We test our GRHCRF on a typical biosequence labeling problem: the prediction of the topology of prokaryotic outer-membrane proteins. Conclusion: We show that in a typical biosequence labeling problem the GRHCRF performs better than CRF models of the same complexity, indicating that GRHCRFs can be useful tools for biosequence analysis applications. Availability: GRHCRF software is available under the GPLv3 licence at http://www.biocomp.unibo.it/~savojard/biocrf-0.9.tar.gz.
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3184055