Mark-Jan Nederhof
University of St Andrews
Publications
Featured research published by Mark-Jan Nederhof.
Archive | 2001
Mehryar Mohri; Mark-Jan Nederhof
We present an algorithm for approximating context-free languages with regular languages. The algorithm is based on a simple transformation that applies to any context-free grammar and guarantees that the result can be compiled into a finite automaton. The resulting grammar contains at most one new nonterminal for any nonterminal symbol of the input grammar. The result thus remains readable and if necessary modifiable. We extend the approximation algorithm to the case of weighted context-free grammars. We also report experiments with several grammars showing that the size of the minimal deterministic automata accepting the resulting approximations is of practical use for applications such as speech recognition.
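To give the flavor of this style of grammar transformation, the sketch below applies a Mohri–Nederhof-style rewriting to a single set M of mutually recursive nonterminals: each A in M gets one primed partner A', and every rule is split so that the result is right-linear with respect to M and its primes, hence regular. This is my own simplified rendering, not necessarily the paper's exact construction, and the rule encoding and function name are assumptions.

```python
def approximate(rules, M):
    """Superset approximation sketched for a single set M of mutually
    recursive nonterminals.  A rule is a pair (lhs, rhs) with rhs a
    list of symbols.  Each A in M gets exactly one new nonterminal A';
    the output is right-linear with respect to M and its primes, so it
    describes a regular superset of the original language."""
    new = set()
    for lhs, rhs in rules:
        # split rhs into alpha0 B1 alpha1 ... Bm alpham, with Bi in M
        chunks, bs = [[]], []
        for sym in rhs:
            if sym in M:
                bs.append(sym)
                chunks.append([])
            else:
                chunks[-1].append(sym)
        if not bs:                                        # A -> alpha0 A'
            new.add((lhs, tuple(chunks[0] + [lhs + "'"])))
        else:
            new.add((lhs, tuple(chunks[0] + [bs[0]])))    # A -> alpha0 B1
            for i in range(len(bs) - 1):                  # Bi' -> alphai B(i+1)
                new.add((bs[i] + "'", tuple(chunks[i + 1] + [bs[i + 1]])))
            new.add((bs[-1] + "'", tuple(chunks[-1] + [lhs + "'"])))
    for a in M:                                           # A' -> epsilon
        new.add((a + "'", ()))
    return new

# {a^n b^n}, which is not regular, gets the regular superset a*b*:
grammar = [("S", ["a", "S", "b"]), ("S", [])]
for lhs, rhs in sorted(approximate(grammar, {"S"})):
    print(lhs, "->", " ".join(rhs) or "ε")
```

On the self-embedding grammar S → a S b | ε the result is S → a S | S', S' → b S' | ε, i.e. a*b*, with one primed nonterminal per original nonterminal, matching the size bound stated in the abstract.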
Finite State Methods and Natural Language Processing | 2000
Mark-Jan Nederhof
Several methods are discussed that construct a finite automaton given a context-free grammar, including both methods that lead to subsets and those that lead to supersets of the original context-free language. Some of these methods of regular approximation are new, and some others are presented here in a more refined form with respect to existing literature. Practical experiments with the different methods of regular approximation are performed for spoken-language input: hypotheses from a speech recognizer are filtered through a finite automaton.
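One way a subset approximation can arise (my own simplification, not a specific method from the paper) is to simulate a top-down derivation while discarding any configuration whose stack exceeds a fixed bound: only finitely many (input position, stack) pairs remain, so the accepted language is regular and contained in the original context-free language.

```python
def accepts_subset(rules, start, w, max_stack=4):
    """Subset approximation: simulate a top-down derivation of w, but
    prune any configuration whose stack would exceed max_stack symbols.
    Rules are (lhs, rhs) pairs; terminals are single characters of w."""
    terminals = ({s for _, rhs in rules for s in rhs}
                 - {lhs for lhs, _ in rules})
    seen = set()

    def step(i, stack):
        if (i, stack) in seen:
            return False                  # already explored; a success
        seen.add((i, stack))              # would have bubbled up already
        if not stack:
            return i == len(w)            # accept on empty stack at end
        top, rest = stack[0], stack[1:]
        if top in terminals:
            return i < len(w) and w[i] == top and step(i + 1, rest)
        ok = False
        for lhs, rhs in rules:            # expand, pruning deep stacks
            if lhs == top and len(rhs) + len(rest) <= max_stack:
                ok = step(i, tuple(rhs) + rest) or ok
        return ok

    return step(0, (start,))
```

On S → a S b | ε with max_stack=4 this accepts ab and aabb but rejects aaabbb, whose derivation needs a deeper stack: a proper regular subset of {a^n b^n}.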
Natural Language Engineering | 1999
Gertjan van Noord; Gosse Bouma; Rob Koeling; Mark-Jan Nederhof
We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar, and a model for robust parsing which combines linguistic sources of information and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate processing of spoken input.
Computational Linguistics | 2003
Mark-Jan Nederhof
We discuss weighted deductive parsing and consider the problem of finding the derivation with the lowest weight. We show that Knuth's generalization of Dijkstra's algorithm for the shortest-path problem offers a general method to solve this problem. Our approach is modular in the sense that Knuth's algorithm is formulated independently from the weighted deduction system.
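The core of Knuth's generalization can be sketched on a toy weighted deduction system in which the weight of a derivation is the rule weight plus the weights of its subderivations (the rule format and names below are mine, for illustration only):

```python
import heapq

def knuth_min_weights(rules):
    """Knuth's generalization of Dijkstra's algorithm, sketched for a
    toy weighted deduction system.  A rule (lhs, rhs, w) derives item
    lhs from the items in rhs, with weight w plus the weights of the
    subderivations (weights assumed nonnegative).  Items are settled
    in order of increasing derivation weight, exactly as Dijkstra
    settles the vertices of a shortest-path problem."""
    best, agenda = {}, []
    for lhs, rhs, w in rules:            # axioms: empty antecedent lists
        if not rhs:
            heapq.heappush(agenda, (w, lhs))
    while agenda:
        w, item = heapq.heappop(agenda)
        if item in best:
            continue                     # already settled more cheaply
        best[item] = w
        for lhs, rhs, rw in rules:       # rules whose antecedents are
            if lhs not in best and all(b in best for b in rhs):
                heapq.heappush(agenda,   # now all settled may fire
                               (rw + sum(best[b] for b in rhs), lhs))
    return best
```

For the rules A ⊢ (2.0), B ⊢ (3.0), A,A ⊢ B (0.0), A,B ⊢ S (1.0), the items are settled as A (2.0), then B (3.0, cheaper than the 4.0 derivation via A,A), then S (6.0).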
International Workshop/Conference on Parsing Technologies | 2000
Mark-Jan Nederhof
We show that for each context-free grammar a new grammar can be constructed that generates a regular language. This construction differs from some existing methods of approximation in that use of a pushdown automaton is avoided. This allows better insight into how the generated language is affected.
Journal of Artificial Intelligence Research | 2004
Mark-Jan Nederhof; Giorgio Satta
We propose a formalism for representation of finite languages, referred to as the class of IDL-expressions, which combines concepts that were only considered in isolation in existing formalisms. The suggested applications are in natural language processing, more specifically in surface natural language generation and in machine translation, where a sentence is obtained by first generating a large set of candidate sentences, represented in a compact way, and then filtering such a set through a parser. We study several formal properties of IDL-expressions and compare this new formalism with more standard ones. We also present a novel parsing algorithm for IDL-expressions and prove a non-trivial upper bound on its time complexity.
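To give the flavor of the three operators (interleave, disjunction, lock), here is a tiny evaluator that computes the finite set of words an expression denotes. The tuple encoding and function names are my own, not the paper's notation, and this enumerates the language directly rather than parsing it as the paper's algorithm does.

```python
from itertools import product

def interleave(u, v):
    """All shuffles of two words; each unit stays contiguous."""
    if not u:
        return {v}
    if not v:
        return {u}
    return ({(u[0],) + s for s in interleave(u[1:], v)}
            | {(v[0],) + s for s in interleave(u, v[1:])})

def lang(e):
    """Finite set of words denoted by an expression.  Words are tuples
    of units; lock() fuses a word into a single unit, so that later
    interleaving cannot tear it apart."""
    op, *args = e
    if op == "atom":
        return {(args[0],)}
    if op == "cat":                       # concatenation
        return {u + v for u, v in product(lang(args[0]), lang(args[1]))}
    if op == "or":                        # disjunction
        return lang(args[0]) | lang(args[1])
    if op == "il":                        # interleave
        out = set()
        for u, v in product(lang(args[0]), lang(args[1])):
            out |= interleave(u, v)
        return out
    if op == "lock":                      # make the whole word atomic
        return {(" ".join(w),) for w in lang(args[0])}
    raise ValueError(op)

def words(e):
    return {" ".join(w) for w in lang(e)}
```

For example, interleaving "today" with the locked phrase "John ran" yields only {"today John ran", "John ran today"}; without the lock, "John today ran" would appear as well. This compactly represents candidate-sentence sets of the kind the abstract describes.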
Archive | 2000
Bernd Kiefer; Hans-Ulrich Krieger; Mark-Jan Nederhof
This paper describes the successful metamorphosis of PAGE from a string-based grammar development system into an efficient run-time system operating on word hypotheses graphs (WHGs). In particular, we report on the techniques we have applied to PAGE, which have resulted in a speed-up in parsing time of more than an order of magnitude. We elaborate on how the system is interfaced to other components: WHG search, the prosody detector, and robust semantic processing. We also present measurements for string and WHG parsing. The system as described in the paper has been applied in the speech translation project Verbmobil with large HPSG grammars for English, German, and Japanese.
Computational Linguistics | 2005
Mark-Jan Nederhof
We show that under certain conditions, a language model can be trained on the basis of a second language model. The main instance of the technique trains a finite automaton on the basis of a probabilistic context-free grammar, such that the Kullback-Leibler distance between grammar and trained automaton is provably minimal. This is a substantial generalization of an existing algorithm to train an n-gram model on the basis of a probabilistic context-free grammar.
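The n-gram instance of the result can be illustrated with a Monte-Carlo stand-in (sampling is my simplification; the paper's algorithm computes exact expected counts): relative bigram frequencies over strings sampled from the PCFG converge to the bigram model with minimal Kullback-Leibler distance from the grammar.

```python
import random
from collections import defaultdict

def sample(pcfg, sym):
    """Sample a string from a PCFG given as
    {nonterminal: [(rhs_tuple, prob), ...]}; other symbols are terminal."""
    if sym not in pcfg:
        return [sym]
    rhss, probs = zip(*pcfg[sym])
    rhs = random.choices(rhss, probs)[0]
    return [t for s in rhs for t in sample(pcfg, s)]

def bigram_from_pcfg(pcfg, start, n=20000, seed=0):
    """Monte-Carlo stand-in for the exact construction: relative bigram
    frequencies over n sampled strings approach the bigram model with
    minimal Kullback-Leibler distance from the grammar's distribution."""
    random.seed(seed)
    counts = defaultdict(lambda: defaultdict(int))
    for _ in range(n):
        w = ["<s>"] + sample(pcfg, start) + ["</s>"]
        for a, b in zip(w, w[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(d.values()) for b, c in d.items()}
            for a, d in counts.items()}
```

For the PCFG S → a S (0.4) | a (0.6), the estimates approach the exact KL-minimal bigram model P(a|a) = 0.4, P(</s>|a) = 0.6.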
Conference of the European Chapter of the Association for Computational Linguistics | 1993
Mark-Jan Nederhof
We show how techniques known from generalized LR parsing can be applied to left-corner parsing. The resulting parsing algorithm for context-free grammars has some advantages over generalized LR parsing: the sizes and generation times of the parsers are smaller, the produced output is more compact, and the basic parsing technique can more easily be adapted to arbitrary context-free grammars. The algorithm can be seen as an optimization of algorithms known from existing literature. A strong advantage of our presentation is that it makes explicit the role of left-corner parsing in these algorithms.
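The left-corner relation that drives such parsers can be computed by a small fixed-point iteration; this is a standard construction sketched in my own notation, not code from the paper. X is a left corner of A if A derives a string that begins with X.

```python
def left_corners(rules):
    """Reflexive, transitive left-corner relation: X is a left corner
    of A if A derives a string beginning with X.  Rules are (lhs, rhs)
    pairs; epsilon rules are ignored in this sketch."""
    lc = {lhs: {lhs} for lhs, _ in rules}
    changed = True
    while changed:                         # iterate to a fixed point
        changed = False
        for lhs, rhs in rules:
            if rhs:
                first = rhs[0]
                new = lc.get(first, {first})   # terminal: just itself
                if not new <= lc[lhs]:
                    lc[lhs] |= new
                    changed = True
    return lc
```

For S → A b, A → a | S c, both S and A have left corners {S, A, a}: the chain S ⇒ A b ⇒ a b puts a below S, and A → S c closes the cycle.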
Meeting of the Association for Computational Linguistics | 1994
Mark-Jan Nederhof
In this paper we relate a number of parsing algorithms that have been developed in very different areas of parsing theory, including deterministic algorithms, tabular algorithms, and a parallel algorithm. We show that these algorithms are based on the same underlying ideas. By relating existing ideas, we hope to provide an opportunity to improve some algorithms based on features of others. A second purpose of this paper is to answer a question that has come up in the area of tabular parsing, namely how to obtain a parsing algorithm with the property that the table contains as few entries as possible, without the possibility that two entries represent the same subderivation.