Abstract

In this paper, we investigate the use of selectional restrictions (the constraints a predicate imposes on its arguments) in a language model for speech recognition. Starting from an untagged corpus, we apply a public-domain tagger and a very simple finite-state machine to extract verb-object pairs from unrestricted English text. We then measure how much knowledge of the verb improves the prediction of the direct object, in terms of the perplexity of a cluster-based language model. The results show that although a clustered bigram is more useful than a verb-object model, combining the two improves on the clustered bigram model alone.
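
To make the extraction step concrete, the following is a minimal sketch of how a very simple finite-state machine might pull verb-object pairs out of tagged text. It is an illustration under stated assumptions, not the machine used in this work: the Penn Treebank-style tag names, the set of tags allowed between verb and noun, and the function name `extract_verb_object_pairs` are all hypothetical choices made for the example.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed Penn Treebank-style tags; the tagset actually used may differ.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
# Tags the machine may skip between a verb and the head noun of its object.
SKIP_TAGS = {"DT", "JJ", "JJR", "JJS", "PRP$", "CD"}


def extract_verb_object_pairs(
    tagged: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return (verb, object-head) pairs found by a three-state machine.

    States are encoded by two variables:
      verb is None                     -> waiting for a verb
      verb set, noun is None           -> after a verb, skipping modifiers
      verb set, noun set               -> inside the object noun phrase
    """
    pairs: List[Tuple[str, str]] = []
    verb: Optional[str] = None
    noun: Optional[str] = None

    for word, tag in tagged:
        if tag in VERB_TAGS:
            # A new verb closes any pending pair and restarts the machine.
            if verb is not None and noun is not None:
                pairs.append((verb, noun))
            verb, noun = word.lower(), None
        elif verb is not None and tag in NOUN_TAGS:
            # Keep the last noun of a noun run as the head of the object.
            noun = word.lower()
        elif verb is not None and noun is None and tag in SKIP_TAGS:
            # Determiners and adjectives before the head noun are skipped.
            continue
        else:
            # Any other tag ends the match; emit the pair if one is complete.
            if verb is not None and noun is not None:
                pairs.append((verb, noun))
            verb, noun = None, None

    # Flush a pair that is still pending at the end of the input.
    if verb is not None and noun is not None:
        pairs.append((verb, noun))
    return pairs


sentence = [
    ("The", "DT"), ("dog", "NN"), ("chased", "VBD"), ("the", "DT"),
    ("big", "JJ"), ("red", "JJ"), ("ball", "NN"), (".", "."),
]
print(extract_verb_object_pairs(sentence))
```

A machine this shallow will miss objects separated from the verb by longer material and will occasionally pair a verb with a noun that is not its direct object; it trades precision for the ability to run over unrestricted text without a parser.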
