The string correction factor is the term by which the probability of a word w needs to be multiplied in order to account for character changes or “errors” occurring in at most k arbitrary positions in that word. The behavior of this factor, as a function of k and of the word length, has implications on the number of candidates that need to be considered and weighted when looking for subwords of a sequence that present unusually recurrent replicas within some bounded number of mismatches. Specifically, it is seen that over intervals of mono- or bi- tonicity for the correction factor, only some of the candidates need be considered. This mitigates the computation and leads to tables of over- represented words that are more compact to represent and inspect. In recent work, expectation and score monotonicity has been established for a number of cases of interest, under i.i.d. probabilistic assumptions. The present paper reviews the cases of bi-tonic behavior for the correction factor, concentrating on the instance in which the question is still open.

On the Monotonicity of the String Correction Factor for Words with Mismatches

PIZZI, CINZIA
2006

Abstract

The string correction factor is the term by which the probability of a word w needs to be multiplied in order to account for character changes or “errors” occurring in at most k arbitrary positions in that word. The behavior of this factor, as a function of k and of the word length, has implications on the number of candidates that need to be considered and weighted when looking for subwords of a sequence that present unusually recurrent replicas within some bounded number of mismatches. Specifically, it is seen that over intervals of mono- or bi- tonicity for the correction factor, only some of the candidates need be considered. This mitigates the computation and leads to tables of over- represented words that are more compact to represent and inspect. In recent work, expectation and score monotonicity has been established for a number of cases of interest, under i.i.d. probabilistic assumptions. The present paper reviews the cases of bi-tonic behavior for the correction factor, concentrating on the instance in which the question is still open.
2006
Dagstuhl Seminar Proceedings 06201: "Combinatorial and Algorithmic Foundations of Pattern and Association Discovery"
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/179631
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact