Rens Bod
University of Amsterdam
Publications
Featured research published by Rens Bod.
International Conference on Computational Linguistics | 1992
Rens Bod
Data Oriented Parsing (DOP) is a model in which not abstract rules but language experiences, in the form of an analyzed corpus, constitute the basis for language processing. Analyzing a new input means that the system attempts to find the most probable way to reconstruct the input out of fragments that already exist in the corpus. Disambiguation occurs as a side-effect. DOP can be implemented by using conventional parsing strategies.
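The fragment-combination idea can be sketched in a few lines of Python. The tiny fragment inventory below is invented for illustration (it is not from the paper), and because its fragments are only one level deep the sketch collapses to a simple PCFG; full DOP additionally allows arbitrarily large subtrees.

```python
from itertools import product

# Toy fragment inventory: each fragment rewrites a category into a sequence
# of categories/words, paired with its frequency in the analyzed corpus.
# All categories, words, and counts here are invented for illustration.
fragments = {
    "S":  [(("NP", "VP"), 1)],
    "NP": [(("she",), 2), (("the", "N"), 1)],
    "N":  [(("dog",), 1)],
    "VP": [(("sleeps",), 1)],
}

def fragment_prob(cat, body):
    """Relative frequency of a fragment among fragments with the same root."""
    total = sum(count for _, count in fragments[cat])
    return next(count for b, count in fragments[cat] if b == body) / total

def derivations(cat):
    """Yield (word-tuple, probability) for every derivation from `cat`."""
    if cat not in fragments:                 # terminal word
        yield (cat,), 1.0
        return
    for body, _ in fragments[cat]:
        base = fragment_prob(cat, body)
        for parts in product(*(list(derivations(sym)) for sym in body)):
            words = tuple(w for ws, _ in parts for w in ws)
            prob = base
            for _, sub in parts:
                prob *= sub
            yield words, prob

def string_prob(words):
    """DOP probability of a string: the sum over all derivations yielding it."""
    return sum(p for ws, p in derivations("S") if ws == tuple(words))
```

Here "she sleeps" is reconstructed with probability 2/3 and "the dog sleeps" with probability 1/3, mirroring how more frequent corpus fragments make their analyses more probable.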
Psychological Science | 2011
Stefan L. Frank; Rens Bod
Although it is generally accepted that hierarchical phrase structures are instrumental in describing human language, their role in cognitive processing is still debated. We investigated the role of hierarchical structure in sentence processing by implementing a range of probabilistic language models, some of which depended on hierarchical structure, and others of which relied on sequential structure only. All models estimated the occurrence probabilities of syntactic categories in sentences for which reading-time data were available. Relating the models’ probability estimates to the data showed that the hierarchical-structure models did not account for variance in reading times over and above the amount of variance accounted for by all of the sequential-structure models. This suggests that a sentence’s hierarchical structure, unlike many other sources of information, does not noticeably affect the generation of expectations about upcoming words.
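A purely sequential model of the kind the study compares can be illustrated with a bigram model over syntactic categories; the training sequences below are hypothetical, and the paper's actual models were far richer than this sketch.

```python
import math
from collections import Counter

# Hypothetical part-of-speech sequences standing in for a training corpus.
train = [["DT", "NN", "VB"], ["DT", "NN", "VB", "RB"], ["PRP", "VB", "DT", "NN"]]

bigrams, unigrams = Counter(), Counter()
for sent in train:
    seq = ["<s>"] + sent
    unigrams.update(seq[:-1])
    bigrams.update(zip(seq[:-1], seq[1:]))

def surprisal(sent):
    """Per-category surprisal -log2 P(c_i | c_{i-1}); assumes seen bigrams."""
    seq = ["<s>"] + sent
    return [-math.log2(bigrams[(a, b)] / unigrams[a])
            for a, b in zip(seq[:-1], seq[1:])]
```

Such word-by-word probability estimates are the quantities that can then be regressed against reading times, exactly as the study does with its sequential- and hierarchical-structure models.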
Conference of the European Chapter of the Association for Computational Linguistics | 2003
Rens Bod
Two apparently opposing DOP models exist in the literature: one which computes the parse tree involving the most frequent subtrees from a treebank and one which computes the parse tree involving the fewest subtrees from a treebank. This paper proposes an integration of the two models which outperforms each of them separately. Together with a PCFG-reduction of DOP we obtain improved accuracy and efficiency on the Wall Street Journal treebank. Our results show an 11% relative reduction in error rate over previous models, and an average processing time of 3.6 seconds per WSJ sentence.
Journal of New Music Research | 2002
Rens Bod
We argue for a memory-based approach to music analysis which works with concrete musical experiences rather than with rules or principles. New pieces of music are analyzed by combining fragments from structures of previously encountered pieces. The occurrence frequencies of the fragments are used to determine the preferred analysis of a piece. We test some instances of this approach against a set of 1,000 manually annotated folksongs from the Essen Folksong Collection, yielding up to 85.9% phrase accuracy. A qualitative analysis of our results indicates that there are grouping phenomena that challenge the commonly accepted Gestalt principles of proximity, similarity and parallelism. Nor can these grouping phenomena be explained by other musical factors, such as meter and harmony. We argue that music perception may be much more memory-based than previously assumed.
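The frequency-based preference can be sketched as scoring candidate segmentations by the corpus frequency of their fragments; the fragments and counts below are invented, and real melodic fragments would carry far more structure than flat pitch tuples.

```python
import math
from collections import Counter

# Hypothetical phrase fragments from previously analyzed folksongs.
fragment_counts = Counter({
    ("C", "D", "E"): 5,
    ("F", "G"): 3,
    ("C", "D"): 2,
    ("E", "F", "G"): 1,
})

def score(analysis):
    """Log-frequency score of a segmentation into fragments (higher wins)."""
    return sum(math.log(fragment_counts[f]) for f in analysis)

def preferred(analyses):
    """Return the candidate analysis with the highest fragment-based score."""
    return max(analyses, key=score)
```

Segmenting the sequence C D E F G as (C D E)(F G) outscores (C D)(E F G) because its fragments occur more often in the memory of past pieces.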
Topics in Cognitive Science | 2009
Gideon Borensztajn; Willem H. Zuidema; Rens Bod
We develop an approach to automatically identify the most probable multiword constructions used in children's utterances, given syntactically annotated utterances from the Brown corpus of CHILDES. The constructions found cover many interesting linguistic phenomena from the language acquisition literature and show a progression from very concrete toward abstract constructions. We show quantitatively that for all children in the Brown corpus, grammatical abstraction, defined as the relative number of variable slots in the productive units of their grammar, increases globally with age.
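The abstraction measure itself is simple to state in code. The two toy grammars below are hypothetical, with "X" marking a variable slot; the metric is the proportion of slots among all leaves of a grammar's productive units.

```python
# Hypothetical multiword constructions at two ages; "X" marks a variable slot.
grammar_age2 = [["more", "juice"], ["want", "X"]]
grammar_age4 = [["want", "X"], ["X", "gonna", "X"], ["can", "I", "X"]]

def abstraction(grammar):
    """Fraction of variable slots among all leaves of the productive units."""
    leaves = [leaf for unit in grammar for leaf in unit]
    return leaves.count("X") / len(leaves)
```

Under this measure the later grammar is more abstract (0.5 versus 0.25), illustrating the progression from concrete toward abstract constructions.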
Conference of the European Chapter of the Association for Computational Linguistics | 1993
Rens Bod
In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistical model in which the probability of a parse is equal to the sum of the probabilities of all its derivations. In (Scha, 1990) an informal introduction to DOP is given, while (Bod, 1992a) provides a formalization of the theory. In this paper we compare DOP with other stochastic grammars in the context of Formal Language Theory. It is proved that it is not possible to create for every DOP model a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses. We show that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques. The model was tested on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy.
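The Monte Carlo idea can be sketched with an invented toy ambiguity: three derivations, with illustrative probabilities, yielding two competing parses of the same string. In full DOP a derivation is sampled by repeatedly drawing corpus subtrees; here the derivation distribution is simply given.

```python
import random
from collections import Counter

# Invented derivations: derivation -> (parse it yields, probability).
derivation_probs = {
    "d1": ("parse_A", 0.30),
    "d2": ("parse_A", 0.25),
    "d3": ("parse_B", 0.45),
}

def sample_parse(rng):
    """Sample one derivation in proportion to its probability; return its parse."""
    r, acc = rng.random(), 0.0
    for parse, p in derivation_probs.values():
        acc += p
        if r < acc:
            return parse
    return parse  # guard against floating-point rounding

def monte_carlo_best_parse(n=10000, seed=0):
    """Estimate the most probable parse as the most frequently sampled one."""
    rng = random.Random(seed)
    counts = Counter(sample_parse(rng) for _ in range(n))
    return counts.most_common(1)[0][0]
```

Note that the single most probable derivation (d3) yields parse_B, while summing over derivations makes parse_A the more probable parse (0.55 versus 0.45); this is precisely why sampling derivations, rather than searching for the best single derivation, estimates the maximum probability parse.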
Meeting of the Association for Computational Linguistics | 2001
Rens Bod
We aim at finding the minimal set of fragments which achieves maximal parse accuracy in Data Oriented Parsing. Experiments with the Penn Wall Street Journal treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank (a precision of 90.8% and a recall of 90.6%). We isolate some dependency relations which previous models neglect but which contribute to higher parse accuracy.
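The precision and recall figures reported here are labeled bracketing scores; a minimal sketch of how such PARSEVAL-style metrics are computed, with constituents represented as (label, start, end) triples over word positions, is:

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) of predicted constituents against gold.

    Both arguments are collections of (label, start, end) triples.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    return correct / len(predicted), correct / len(gold)
```

A predicted constituent counts as correct only if its label and both span boundaries match a gold constituent exactly.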
Meeting of the Association for Computational Linguistics | 1998
Rens Bod; Ronald M. Kaplan
We develop a Data-Oriented Parsing (DOP) model based on the syntactic representations of Lexical-Functional Grammar (LFG). We start by summarizing the original DOP model for tree representations and then show how it can be extended with corresponding functional structures. The resulting LFG-DOP model triggers a new, corpus-based notion of grammaticality, and its probability models exhibit interesting behavior with respect to specificity and the interpretation of ill-formed strings.
Meeting of the Association for Computational Linguistics | 1997
Remko Bonnema; Rens Bod; Remko Scha
In data-oriented language processing, an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method. A data-oriented semantic interpretation algorithm was tested on two semantically annotated corpora: the English ATIS corpus and the Dutch OVIS corpus. Experiments show an increase in semantic accuracy if larger corpus fragments are taken into consideration.
Proceedings of the Royal Society B: Biological Sciences, 279(1747), pp. 4522-4531 | 2012
Stefan L. Frank; Rens Bod; Morten H. Christiansen
It is generally assumed that hierarchical phrase structure plays a central role in human language. However, considerations of simplicity and evolutionary continuity suggest that hierarchical structure should not be invoked too hastily. Indeed, recent neurophysiological, behavioural and computational studies show that sequential sentence structure has considerable explanatory power and that hierarchical processing is often not involved. In this paper, we review evidence from the recent literature supporting the hypothesis that sequential structure may be fundamental to the comprehension, production and acquisition of human language. Moreover, we provide a preliminary sketch outlining a non-hierarchical model of language use and discuss its implications and testable predictions. If linguistic phenomena can be explained by sequential rather than hierarchical structure, this will have considerable impact in a wide range of fields, such as linguistics, ethology, cognitive neuroscience, psychology and computer science.