Compact bases formed by motifs called ''irredundant'' and capable of generating all other motifs in a sequence have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet @S, the problem of extracting such a base from s had been previously solved in time O(n^2lognlog|@S|) and O(|@S|n^2log^2nloglogn), respectively, using the FFT-based string searching by Fischer and Paterson. More recently, a solution on binary strings taking time O(n^2) without resorting to the FFT was also proposed. In the present paper, we considered the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in a previous work in time O(n^3). A much faster incremental algorithm is described here, which takes time O(n^2logn) for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to the one exhibited by the previous FFT-based algorithms involving the computation of only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a logn factor from optimality.
Incremental discovery of the irredundant motif bases for all suffixes of a string in O(n2logn) time.
APOSTOLICO, ALBERTO;
2008
Abstract
Compact bases formed by motifs called ''irredundant'' and capable of generating all other motifs in a sequence have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet @S, the problem of extracting such a base from s had been previously solved in time O(n^2lognlog|@S|) and O(|@S|n^2log^2nloglogn), respectively, using the FFT-based string searching by Fischer and Paterson. More recently, a solution on binary strings taking time O(n^2) without resorting to the FFT was also proposed. In the present paper, we considered the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in a previous work in time O(n^3). A much faster incremental algorithm is described here, which takes time O(n^2logn) for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to the one exhibited by the previous FFT-based algorithms involving the computation of only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a logn factor from optimality.| File | Dimensione | Formato | |
|---|---|---|---|
|
at-incr2008.pdf
Accesso riservato
Tipologia:
Published (Publisher's Version of Record)
Licenza:
Accesso privato - non pubblico
Dimensione
608.76 kB
Formato
Adobe PDF
|
608.76 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.




