Fast search algorithms for finding good instances of patterns given as position specific scoring matrices are developed, and some empirical results on their performance on DNA sequences are reported. The algorithms basically generalize the Aho–Corasick, filtration, and superalphabet techniques of string matching to the scoring matrix search. As compared to the naive search, our algorithms can be faster by a factor which is proportional to the length of the pattern. In our experimental comparison of different algorithms the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho–Corasick technique is the fastest for short patterns and high significance thresholds of the search. For longer patterns the filtration method is better while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.

Fast Search Algorithms for Position Specific Scoring Matrices

PIZZI, CINZIA;
2007

Abstract

Fast search algorithms for finding good instances of patterns given as position specific scoring matrices are developed, and some empirical results on their performance on DNA sequences are reported. The algorithms basically generalize the Aho–Corasick, filtration, and superalphabet techniques of string matching to the scoring matrix search. As compared to the naive search, our algorithms can be faster by a factor which is proportional to the length of the pattern. In our experimental comparison of different algorithms the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho–Corasick technique is the fastest for short patterns and high significance thresholds of the search. For longer patterns the filtration method is better while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.
2007
Proceedings of the 1st International Conference on Bioinformatics Research and Development (BIRD 2007)
9783540712329
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/179711
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 15
  • ???jsp.display-item.citation.isi??? 11
social impact