Gosse Bouma | Researchain

Publication

Featured research published by Gosse Bouma.

Natural Language and Linguistic Theory | 2001

Satisfying constraints on extraction and adjunction

Gosse Bouma; Robert Malouf; Ivan A. Sag

In this paper, we present a unified feature-based theory of complement, adjunct, and subject extraction, in which there is no need either for valence-reducing lexical rules or for phonologically null traces. Our analysis rests on the assumption that the mapping between argument structure and valence is defined by realization constraints that are satisfied by all lexical heads. Arguments can be realized as local dependents, in which case they are selected via the head's valence features. Alternatively, arguments may be realized in a long-distance dependency construction, in which case they are selected via the head's SLASH feature. Furthermore, we argue that English post-verbal adjuncts, like complements, are syntactic dependents selected by the verb, which yields a uniform analysis of complement and adjunct extraction. Finally, we show that our analysis provides an alternative treatment of subject extraction, and we offer a new account of the that-trace effect.
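
The core idea, that each argument is realized either locally or as a long-distance dependency without traces or lexical rules, can be illustrated with a toy sketch. This is not the authors' formalization: it assumes that the first element of the argument structure is the subject, that every other argument goes either to COMPS or to SLASH, and it leaves adjuncts and subject extraction out. The function name `realizations` and the category labels are hypothetical.

```python
from itertools import product


def realizations(arg_st):
    """Enumerate ways a head can realize its arguments (toy sketch).

    Assumption: arg_st[0] is the subject and stays on SUBJ; each remaining
    argument goes either to COMPS (local dependent) or to SLASH (gap).
    Adjuncts and subject extraction are not modelled.
    """
    subj, rest = arg_st[:1], arg_st[1:]
    results = []
    # One choice per non-subject argument: realize it locally or as a gap.
    for choices in product(("local", "gap"), repeat=len(rest)):
        comps = [arg for arg, c in zip(rest, choices) if c == "local"]
        slash = [arg for arg, c in zip(rest, choices) if c == "gap"]
        results.append({"SUBJ": subj, "COMPS": comps, "SLASH": slash})
    return results


# Hypothetical argument structure for a verb like "give".
for r in realizations(["NP[subj]", "NP[obj]", "PP[to]"]):
    print(r)
```

With two non-subject arguments the sketch yields four realizations, ranging from both complements local to both extracted, and every argument ends up on exactly one of COMPS or SLASH.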

Computational Linguistics in the Netherlands | 2002

The Alpino Dependency Treebank

Leonoor van der Beek; Gosse Bouma; Robert Malouf; Gertjan van Noord

In this paper we present the Alpino Dependency Treebank and the tools that we have developed to facilitate the annotation process. Annotation typically starts with parsing a sentence with the Alpino parser, a wide-coverage parser of Dutch text. The number of parses that is generated is reduced through interactive lexical analysis and constituent marking. A tool for online addition of lexical information facilitates the parsing of sentences with unknown words. The best parse is then selected efficiently with the parse selection tool. At present, the Alpino Dependency Treebank consists of about 6,000 sentences of newspaper text annotated with dependency trees. The corpus can be used for linguistic exploration as well as for training and evaluation.

Computational Linguistics in the Netherlands | 2001

Alpino: Wide-coverage Computational Analysis of Dutch

Gosse Bouma; Gertjan van Noord; Robert Malouf

Alpino is a wide-coverage computational analyzer of Dutch that aims at accurate, full parsing of unrestricted text. We describe the head-driven lexicalized grammar and the lexical component, which has been derived from existing resources. The grammar produces dependency structures, thus providing a reasonably abstract and theory-neutral level of linguistic representation. Robustness and disambiguation are important aspects of wide-coverage parsing. The dependency relations encoded in the dependency structures have been used to develop and evaluate both hand-coded and statistical disambiguation methods.
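
Evaluating a parser against dependency structures amounts to comparing sets of relations. The sketch below shows one generic way to do this, precision, recall and F1 over (head, relation, dependent) triples; it is an assumption for illustration, not necessarily the exact metric used for Alpino. The relation labels follow Alpino's style (`su`, `obj1`, `mod`), and bare words stand in for word positions, which a real evaluation would use to tell repeated words apart.

```python
def dependency_scores(gold, predicted):
    """Precision, recall and F1 over (head, relation, dependent) triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # triples the parser got exactly right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# "Jan ziet Marie vandaag" (Jan sees Marie today); the parser
# attaches the modifier "vandaag" to the wrong head.
gold = {("ziet", "su", "Jan"), ("ziet", "obj1", "Marie"), ("ziet", "mod", "vandaag")}
pred = {("ziet", "su", "Jan"), ("ziet", "obj1", "Marie"), ("Marie", "mod", "vandaag")}
print(dependency_scores(gold, pred))
```

One misattached modifier costs one triple out of three, so all three scores come out at about 0.67.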

Natural Language Engineering | 1999

Robust grammatical analysis for spoken dialogue systems

Gertjan van Noord; Gosse Bouma; Rob Koeling; Mark-Jan Nederhof

We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar and a model for robust parsing that combines linguistic and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate processing of spoken input.

Cross-Language Evaluation Forum | 2005

Question answering for Dutch using dependency relations

Gosse Bouma; Jori Mur; Gertjan van Noord; Lonneke van der Plas; Jörg Tiedemann

Joost is a question answering system for Dutch that makes extensive use of dependency relations. It answers questions either by table look-up or by searching for answers in paragraphs returned by information retrieval (IR). Syntactic similarity is used to identify and rank potential answers. The tables were constructed by mining the CLEF corpus, which has been fully syntactically analyzed.
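
Ranking answers by syntactic similarity can be illustrated with a deliberately simple sketch, which is not Joost's actual scoring. It assumes that question and candidate sentences are already parsed into (head, relation, dependent) triples, that the word filling the same head/relation slot as the wh-word is the answer, and that candidates are scored by how many of the remaining question triples they share. The function name `rank_answers` and its parameters are hypothetical.

```python
def rank_answers(question, candidates, wh="wie"):
    """Rank candidate answers by dependency overlap with the question.

    question:   set of (head, relation, dependent) triples; the wh-word
                appears as the dependent of the relation being asked about.
    candidates: list of triple sets, one per candidate sentence.
    Returns (score, answer) pairs, highest score first.
    """
    # Head/relation slots that the wh-word fills, e.g. ("schreef", "su").
    slots = {(h, r) for h, r, d in question if d == wh}
    # Remaining question triples that a good answer sentence should share.
    content = {t for t in question if t[2] != wh}
    ranked = []
    for triples in candidates:
        overlap = len(content & triples)
        for h, r, d in triples:
            if (h, r) in slots:
                ranked.append((overlap, d))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# "Wie schreef Max Havelaar?" (Who wrote Max Havelaar?)
question = {("schreef", "su", "wie"), ("schreef", "obj1", "Max Havelaar")}
candidates = [
    {("schreef", "su", "Jan"), ("schreef", "obj1", "een brief")},
    {("schreef", "su", "Multatuli"), ("schreef", "obj1", "Max Havelaar")},
]
print(rank_answers(question, candidates))  # → [(1, 'Multatuli'), (0, 'Jan')]
```

Both sentences have a subject of "schreef", but only the second also shares the object, so "Multatuli" ranks first.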

Web Information Systems Engineering | 2007

Mapping metadata for SWHi: aligning schemas with library metadata for a historical ontology

J. Zhang; I. Fahmi; Henk Ellermann; Gosse Bouma

What possibilities do Semantic Web technologies offer organizations that traditionally have large amounts of structured data, such as metadata, available? A library is one such organization. We mapped a digital library's descriptive (bibliographic) metadata for a large historical document collection, encoded in MARC21, to a historical ontology using an out-of-the-box ontology, existing topic hierarchies on the World Wide Web, and other resources. We also created and explored useful relations for such an ontology. We show that mapping the metadata to an ontology adds information and makes the existing information more easily accessible to users. The paper discusses various issues that arose during the mapping process. The result of mapping the metadata to RDF/OWL is a populated ontology, ready to be deployed.
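
A first step in such a mapping is a crosswalk from MARC21 fields to ontology properties. The sketch below is a hypothetical illustration: it uses Dublin Core properties as a stand-in for the historical ontology, covers only three standard MARC21 fields (245 title, 100 personal-name main entry, 650 topical subject), and emits plain (subject, property, value) tuples instead of real RDF. The record and the identifier `ex:record1` are invented for the example.

```python
# Hypothetical crosswalk from MARC21 (tag, subfield) pairs to target properties.
# Dublin Core is used here only as a stand-in for the historical ontology.
CROSSWALK = {
    ("245", "a"): "dc:title",    # 245 $a: title proper
    ("100", "a"): "dc:creator",  # 100 $a: main entry, personal name
    ("650", "a"): "dc:subject",  # 650 $a: topical subject term
}


def marc_to_triples(record_id, fields, crosswalk=CROSSWALK):
    """Turn (tag, subfield, value) tuples from one record into triples."""
    triples = []
    for tag, code, value in fields:
        prop = crosswalk.get((tag, code))
        if prop is None:
            continue  # field not covered by the crosswalk
        # Strip trailing ISBD punctuation such as " /" or "." from the value.
        triples.append((record_id, prop, value.strip(" /:;.,")))
    return triples


record = [
    ("245", "a", "Reis naar Batavia /"),
    ("100", "a", "Jansen, Pieter,"),
    ("650", "a", "Voyages and travels."),
    ("260", "b", "Uitgever X,"),
]
for triple in marc_to_triples("ex:record1", record):
    print(triple)
```

Fields without a crosswalk entry (here the 260 publisher field) are simply skipped; in practice, deciding how to handle such fields is one of the mapping issues the paper discusses.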

Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3) | 2009

Cross-lingual Alignment and Completion of Wikipedia Templates

Gosse Bouma; S. Duarte; Zahurul Islam

For many languages, the size of Wikipedia is an order of magnitude smaller than the English Wikipedia. We present a method for cross-lingual alignment of template and infobox attributes in Wikipedia. The alignment is used to add and complete templates and infoboxes in one language with information derived from Wikipedia in another language. We show that alignment between English and Dutch Wikipedia is accurate and that the result can be used to expand the number of template attribute-value pairs in Dutch Wikipedia by 50%. Furthermore, the alignment provides valuable information for normalization of template and attribute names and can be used to detect potential inconsistencies.
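
One way to picture such an alignment is to count how often an English attribute and a Dutch attribute carry the same value across article pairs about the same entity, and then use the alignment to fill gaps. This is a sketch under stated assumptions, not the paper's exact method: values are assumed to be already normalized (numbers, dates), the one-to-one greedy selection and the `min_count` threshold are illustrative choices, and the attribute names and data are hypothetical.

```python
from collections import Counter


def align_attributes(pairs, min_count=2):
    """Align English and Dutch infobox attributes by shared values.

    pairs: list of (en_infobox, nl_infobox) dicts describing the same entity,
           e.g. articles connected by an interlanguage link.
    Returns a dict mapping English attribute names to Dutch ones.
    """
    counts = Counter()
    for en, nl in pairs:
        for a, v in en.items():
            for b, w in nl.items():
                if v == w:  # assumes values are already normalized
                    counts[(a, b)] += 1
    alignment, used = {}, set()
    # Greedily take the most frequent pairings first, one-to-one.
    for (a, b), c in counts.most_common():
        if c >= min_count and a not in alignment and b not in used:
            alignment[a] = b
            used.add(b)
    return alignment


def complete(nl, en, alignment):
    """Fill Dutch attributes that are missing, using aligned English values."""
    filled = dict(nl)
    for a, b in alignment.items():
        if a in en and b not in filled:
            filled[b] = en[a]
    return filled


# Hypothetical infobox pairs for two towns.
pairs = [
    ({"population_total": "1000", "elevation_m": "5"}, {"inwoners": "1000", "hoogte": "5"}),
    ({"population_total": "2500", "elevation_m": "12"}, {"inwoners": "2500", "hoogte": "12"}),
]
alignment = align_attributes(pairs)
print(alignment)  # → {'population_total': 'inwoners', 'elevation_m': 'hoogte'}
print(complete({"inwoners": "400"}, {"population_total": "400", "elevation_m": "3"}, alignment))
```

The second call adds the missing `hoogte` value to the Dutch infobox; values that need translation rather than copying (names, countries) would require an extra step not shown here.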

Essential Speech and Language Technology for Dutch | 2013

Large Scale Syntactic Annotation of Written Dutch: Lassy

Gertjan van Noord; Gosse Bouma; Frank Van Eynde; Daniël de Kok; Jelmer van der Linde; Ineke Schuurman; Erik F. Tjong Kim Sang; Vincent Vandeghinste

This chapter presents the Lassy Small and Lassy Large treebanks, as well as related tools and applications. Lassy Small is a corpus of written Dutch texts (1,000,000 words) which has been syntactically annotated with manual verification and correction. Lassy Large is a much larger corpus (over 500,000,000 words) which has been syntactically annotated fully automatically. In addition, various browse and search tools for syntactically annotated corpora have been developed and made available. Their potential for applications in corpus linguistics and information extraction has been illustrated and evaluated in a series of case studies.

Meeting of the Association for Computational Linguistics | 1990

Defaults in Unification Grammar

Gosse Bouma

Incorporation of defaults in grammar formalisms is important for reasons of linguistic adequacy and grammar organization. In this paper we present an algorithm for handling default information in unification grammar. The algorithm specifies a logical operation on feature structures, merging with the non-default structure only those parts of the default feature structure which are not constrained by the non-default structure. We present various linguistic applications of default unification.
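
The behavior described here, where default information is added only where the non-default structure leaves room for it, can be sketched on feature structures encoded as nested Python dicts. This is a simplification and not the paper's algorithm: it ignores reentrancy (structure sharing) and treats every atomic value in the strict structure as overriding the default. The function name `default_unify` and the verb example are hypothetical.

```python
def default_unify(strict, default):
    """Merge default information into a strict feature structure.

    Feature structures are nested dicts with atomic values; reentrancy is
    not modelled. Strict information always wins: a default value is added
    only for features the strict structure leaves unconstrained.
    """
    result = dict(strict)
    for feat, dval in default.items():
        if feat not in strict:
            result[feat] = dval  # unconstrained: take the default
        elif isinstance(strict[feat], dict) and isinstance(dval, dict):
            result[feat] = default_unify(strict[feat], dval)  # recurse
        # otherwise the strict value overrides the default
    return result


# Hypothetical defaults for regular verbs, overridden by an irregular entry.
regular_verb = {"cat": "v", "morph": {"past": "+ed", "pres3sg": "+s"}}
go = {"cat": "v", "stem": "go", "morph": {"past": "went"}}
print(default_unify(go, regular_verb))
# → {'cat': 'v', 'stem': 'go', 'morph': {'past': 'went', 'pres3sg': '+s'}}
```

The irregular past tense blocks the default `+ed`, while the regular third-person singular ending is still inherited, which is the kind of grammar-organization benefit the abstract mentions.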

Computational Linguistics in the Netherlands | 2002

Accurate stemming of Dutch for text classification

Tanja Gaustad; Gosse Bouma

This paper investigates the use of stemming for the classification of Dutch (email) texts. We introduce a stemmer that combines dictionary lookup (implemented efficiently as a finite state automaton) with a rule-based backup strategy, and show that it outperforms the Dutch Porter stemmer in terms of accuracy while not being substantially slower. For text classification, the most important property of a stemmer is the number of words it (correctly) reduces to the same stem. Here, too, the dictionary-based system outperforms Porter. However, evaluating a Bayesian text classification system with no stemming, the Porter stemmer, or the dictionary-based stemmer on an email classification task and a newspaper topic classification task does not reveal significant differences in accuracy. We conclude with an analysis of why this is the case.
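
The combination of dictionary lookup with a rule-based backup can be sketched as follows. This is an illustration under assumptions, not the authors' stemmer: a Python dict stands in for the finite state automaton, and both the mini-lexicon and the suffix rules are hypothetical.

```python
# Hypothetical mini-lexicon mapping word forms to stems. The paper compiles
# its dictionary into a finite state automaton; a Python dict stands in here.
LEXICON = {"liep": "loop", "lopen": "loop", "kinderen": "kind"}

# Hypothetical backup rules: Dutch suffixes, tried longest first.
SUFFIXES = ("tjes", "tje", "jes", "je", "en")


def stem(word, lexicon=LEXICON, suffixes=SUFFIXES, min_stem=3):
    """Dictionary lookup first; strip a suffix only for unknown words."""
    word = word.lower()
    if word in lexicon:
        return lexicon[word]
    for suffix in suffixes:
        # Keep at least min_stem characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["Kinderen", "liep", "lopen", "fietsen", "boekje", "kat"]:
    print(w, "->", stem(w))
```

The lexicon lets irregular forms such as "liep" and "lopen" conflate to the same stem, which suffix rules alone would not achieve (they would reduce "lopen" to "lop"); unknown regular forms such as "fietsen" and "boekje" fall through to the rules.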
