Incremental discovery of the irredundant motif bases for all suffixes of a string in O(n2logn) time.

Apostolico, Alberto; Tagliacollo, Claudia

doi:10.1016/j.tcs.2008.08.002

Compact bases formed by motifs called ''irredundant'' and capable of generating all other motifs in a sequence have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet @S, the problem of extracting such a base from s had been previously solved in time O(n^2lognlog|@S|) and O(|@S|n^2log^2nloglogn), respectively, using the FFT-based string searching by Fischer and Paterson. More recently, a solution on binary strings taking time O(n^2) without resorting to the FFT was also proposed. In the present paper, we considered the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in a previous work in time O(n^3). A much faster incremental algorithm is described here, which takes time O(n^2logn) for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to the one exhibited by the previous FFT-based algorithms involving the computation of only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a logn factor from optimality.