Given two strings T = a 1 : : : an and P = b 1 : : : b m over an alphabet \Sigma, the problem of testing whether P occurs as a subsequence of T is trivially solved in linear time. It is also known that a simple O(n log j\Sigmaj) time preprocessing of T makes it easy to decide subsequently for any P and in at most jP j log j\Sigmaj character comparisons, whether P is a subsequence of T . These problems become more complicated if one asks instead whether P occurs as a subsequence of some substring Y of T of bounded length. This paper presents an automaton built on the textstring T and capable of identifying all distinct minimal substrings Y of X having P as a subsequence. By a substring Y being minimal with respect to P , it is meant that P is not a subsequence of any proper substring of Y . For every minimal substring Y , the automaton recognizes the occurrence of P having lexicographically smallest sequence of symbol positions in Y . It is not difficult to realize such an automaton in time and space O(n 2 ) for a text of n characters. One result of this paper consists of bringing those bounds down to linear or O(n log n), respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of P in X is carried outin time O(n +P m i=1 rocc i \Delta i \Delta log n \Delta log j\Sigmaj), where rocc i is the number of distinct minimal substrings of T having b 1 : : : b i as a subsequence. All log factors appearing in the above boundscan be further reduced to log log by resort to known integer-handling data structures.
Compact Recognizers of Episode Sequences
APOSTOLICO, ALBERTO;
2002
Abstract
Given two strings T = a 1 : : : an and P = b 1 : : : b m over an alphabet \Sigma, the problem of testing whether P occurs as a subsequence of T is trivially solved in linear time. It is also known that a simple O(n log j\Sigmaj) time preprocessing of T makes it easy to decide subsequently for any P and in at most jP j log j\Sigmaj character comparisons, whether P is a subsequence of T . These problems become more complicated if one asks instead whether P occurs as a subsequence of some substring Y of T of bounded length. This paper presents an automaton built on the textstring T and capable of identifying all distinct minimal substrings Y of X having P as a subsequence. By a substring Y being minimal with respect to P , it is meant that P is not a subsequence of any proper substring of Y . For every minimal substring Y , the automaton recognizes the occurrence of P having lexicographically smallest sequence of symbol positions in Y . It is not difficult to realize such an automaton in time and space O(n 2 ) for a text of n characters. One result of this paper consists of bringing those bounds down to linear or O(n log n), respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of P in X is carried outin time O(n +P m i=1 rocc i \Delta i \Delta log n \Delta log j\Sigmaj), where rocc i is the number of distinct minimal substrings of T having b 1 : : : b i as a subsequence. All log factors appearing in the above boundscan be further reduced to log log by resort to known integer-handling data structures.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.