Prefix Probabilities from Stochastic Tree Adjoining Grammars

Nederhof, M. J.; Sarkar, A.; Satta, Giorgio

doi:10.3115/980432.980726

Language models for speech recognition typically use a probability model of the form Pr(an/a1, a2, .... an-1 Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability ∑wεσ* Pr(a1 ...anw), where w represents all possible terminations of the prefix a1 ... an. The main result in this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in O(n 6) time. The probability of sub-derivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination. This algorithm enables existing corpus-based estimation techniques for stochastic TAGs to be used for language modelling.