Optimal Probabilistic Evaluation Functions for Search Controlled by Stochastic Context-Free Grammars

Corazza, A; DE MORI, R; Gretter, R; Satta, Giorgio

doi:10.1109/34.329008

The possibility of using stochastic context-free grammars (SCFG's) in language modeling (LM) has been considered previously. When these grammars are used, search can be directed by evaluation functions based on the probabilities that a SCFG generates a sentence, given only some words in it. Expressions for computing the evaluation function have been proposed by Jelinek and Lafferty (1991) for the recognition of word sequences in the case in which only the prefix of a sequence is known. Corazza et al. (1991) have proposed methods for probability computation in the more general case in which partial word sequences interleaved by gaps are known. This computation is too complex in practice unless the lengths of the gaps are known. This paper proposes a method for computing the probability of the best parse tree that can generate a sentence only part of which (consisting of islands and gaps) is known. This probability is the minimum possible, and thus the most informative, upper-bound that can be used in the evaluation function. The computation of the proposed upper-bound has cubic time complexity even if the lengths of the gaps are unknown. This makes possible the practical use of SCFG for driving interpretations of sentences in natural language processing