Given a text string x of n symbols and an integer constant d, we consider the problem of finding, for any pair (y,z) of subwords of x, the tandem index associated with the pair, which is defined as the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols of x. Although in principle there might be O(n^4) distinct subword pairs in x, it is seen that it suffices to consider a family of only O(n^2) such pairs, with the property that for any neglected pair (y',z') there exists a corresponding pair (y,z) contained in our family such that: (i) y' is a prefix of y and z' is a prefix of z; and (ii) the tandem index of (y',z') equals that of (y,z). The main contribution of the paper consists of an algorithm showing that the computation of all non-zero tandem indices for a string can be carried out optimally in time and space linear in the size of the output.

Discovering subword associations in strings in time linear in the output size.

APOSTOLICO, ALBERTO;SATTA, GIORGIO
2009

Abstract

Given a text string x of n symbols and an integer constant d, we consider the problem of finding, for any pair (y,z) of subwords of x, the tandem index associated with the pair, which is defined as the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols of x. Although in principle there might be O(n^4) distinct subword pairs in x, it is seen that it suffices to consider a family of only O(n^2) such pairs, with the property that for any neglected pair (y',z') there exists a corresponding pair (y,z) contained in our family such that: (i) y' is a prefix of y and z' is a prefix of z; and (ii) the tandem index of (y',z') equals that of (y,z). The main contribution of the paper consists of an algorithm showing that the computation of all non-zero tandem indices for a string can be carried out optimally in time and space linear in the size of the output.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/2438569
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? ND
social impact