Publication


Featured research published by Nelleke Oostdijk.


Peters, P.; Collins, P.; Smith, A. (ed.), New frontiers of corpus research. Papers from the twenty first international conference on English language research on computerized corpora, Sydney 2000 | 2002

The design of the Spoken Dutch Corpus

Nelleke Oostdijk

In June 1998 the Spoken Dutch Corpus project was started, a five-year project aimed at the compilation and annotation of a 10-million-word corpus of contemporary standard Dutch as spoken in the Netherlands and Flanders. This paper describes the corpus as it is currently under construction. It discusses more specifically the various considerations that have guided its design.


Speech Communication | 2005

On temporal aspects of turn taking in conversational dialogues

Louis ten Bosch; Nelleke Oostdijk; Lou Boves

In this short communication we show how shallow annotations in large speech corpora can be used to derive data about the temporal aspects of turn taking. Within the limitations of such a speech corpus, we show that the average durations of between-turn pauses made by speakers in a dyad are statistically related, and our data suggest the existence of gender effects in the temporal aspects of turn taking. Also, clear differences in turn taking behaviour between face-to-face and telephone dialogues can be detected using shallow analyses. We discuss the most important limitations imposed by the shallowness of the annotations in large corpora, and the possibility for enriching those annotations in a semi-automatic iterative manner.


Text, Speech and Dialogue | 2004

Durational Aspects of Turn-Taking in Spontaneous Face-to-Face and Telephone Dialogues

Louis ten Bosch; Nelleke Oostdijk; Jan de Ruiter

On the basis of two-speaker spontaneous conversations, it is shown that the distributions of both pauses and speech-overlaps of telephone and face-to-face dialogues have different statistical properties. Pauses in a face-to-face dialogue last up to 4 times longer than pauses in telephone conversations in functionally comparable conditions. There is a high correlation (0.88 or larger) between the average pause duration for the two speakers across face-to-face dialogues and telephone dialogues. The data provided form a first quantitative analysis of the complex turn-taking mechanism evidenced in the dialogues available in the 9-million-word Spoken Dutch Corpus.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Evaluating discourse-based answer extraction for why-question answering

Suzan Verberne; Lou Boves; Nelleke Oostdijk; P.A.J.M. Coppen

30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007)


Spyns, P.; Odijk, J. (ed.), Essential Speech and Language Technology for Dutch | 2013

The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch

Nelleke Oostdijk; Martin Reynaert; Veronique Hoste; Ineke Schuurman

The construction of a large and richly annotated corpus of written Dutch was identified as one of the priorities of the STEVIN programme. Such a corpus, sampling texts from conventional and new media, is invaluable for scientific research and application development. The present chapter describes how in two consecutive STEVIN-funded projects, viz. D-Coi and SoNaR, the Dutch reference corpus was developed. The construction of the corpus has been guided by (inter)national standards and best practices. At the same time through the achievements and the experiences gained in the D-Coi and SoNaR projects, a contribution was made to their further advancement and dissemination.


Computational Linguistics | 2010

What Is Not in the Bag of Words for Why-QA?

Suzan Verberne; Lou Boves; Nelleke Oostdijk; P.A.J.M. Coppen

While developing an approach to why-QA, we extended a passage retrieval system that uses off-the-shelf retrieval technology with a re-ranking step incorporating structural information. We get significantly higher scores in terms of MRR@150 (from 0.25 to 0.34) and success@10. The 23% improvement that we reach in terms of MRR is comparable to the improvement reached on different QA tasks by other researchers in the field, although our re-ranking approach is based on relatively lightweight overlap measures incorporating syntactic constituents, cue words, and document structure.
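The two evaluation measures named in this abstract, mean reciprocal rank and success at a rank cutoff, can be illustrated with a minimal sketch. The function names and the toy relevance judgments below are assumptions made for illustration only; they are not taken from the paper, and the paper's own evaluation setup may differ in detail.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool], k: int) -> float:
    """Return 1/rank of the first relevant passage within the top k, else 0."""
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs: Sequence[Sequence[bool]], k: int) -> float:
    """Mean reciprocal rank over all questions, cut off at rank k."""
    return sum(reciprocal_rank(r, k) for r in runs) / len(runs)


def success_at_k(runs: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of questions with at least one relevant passage in the top k."""
    return sum(any(r[:k]) for r in runs) / len(runs)


# Hypothetical ranked result lists, one per question (True = relevant passage).
runs = [
    [False, True, False],         # first relevant passage at rank 2
    [True, False, False],         # first relevant passage at rank 1
    [False, False, False],        # no relevant passage retrieved
    [False, False, False, True],  # first relevant passage at rank 4
]

print(mrr_at_k(runs, 150))    # 0.4375
print(success_at_k(runs, 3))  # 0.5
```

With a cutoff of 3, the fourth question counts as a failure because its only relevant passage sits at rank 4, which is why success is lower than it would be at a cutoff of 10.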


Journal of Germanic Linguistics | 2004

Finite Comment Clauses in Dutch: A Corpus-based approach

Carla Schelfhout; Peter-Arno Coppen; Nelleke Oostdijk

The present paper presents the results of a corpus-based study into the form and distribution of finite comment clauses in Dutch. More specifically, it was investigated where in the sentence such clauses can occur. For the analysis of the data, a topological descriptive model was used. While in the literature an extraction analysis has been suggested in order to account for finite comment clauses in English and German, our findings lead us to challenge this type of analysis and argue that a parenthetical analysis is to be preferred. Thanks are due to Antal van den Bosch and Hans van Halteren for their help in tagging the corpus and to Toni Rietveld for his statistical advice.


International Conference on Computational Linguistics | 2013

N-Gram-Based recognition of threatening tweets

Nelleke Oostdijk; Hans van Halteren

In this paper, we investigate to what degree it is possible to recognize threats in Dutch tweets. We attempt threat recognition on the basis of only the single tweet (without further context) and using only very simple recognition features, namely n-grams. We present two different methods of n-gram-based recognition, one based on manually constructed n-gram patterns and the other on machine learned patterns. Our evaluation is not restricted to precision and recall scores, but also looks into the difference in yield of the two methods, considering either combination or means that may help refine both methods individually.
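As a rough illustration of what recognition with manually constructed n-gram patterns might look like, the sketch below checks whether any word n-gram of a single message appears in a hand-made pattern set. The tokenization (lowercasing and whitespace splitting), the function names, and the English example patterns are all assumptions for illustration; the paper's actual patterns, language, and matching procedure are not reproduced here.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int) -> List[NGram]:
    """Lowercase, split on whitespace, and return all word n-grams."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def matches_any_pattern(text: str, patterns: Set[NGram]) -> bool:
    """True if any n-gram of the text is in the manually built pattern set."""
    lengths = {len(p) for p in patterns}
    return any(g in patterns for n in lengths for g in word_ngrams(text, n))


# Hypothetical, hand-made patterns (not from the paper).
PATTERNS: Set[NGram] = {("i", "will", "hurt"), ("watch", "your", "back")}

print(matches_any_pattern("Better watch your back tomorrow", PATTERNS))  # True
print(matches_any_pattern("I will hurry home", PATTERNS))                # False
```

A machine-learned variant would replace the fixed pattern set with n-gram features weighted by a classifier, which is a different technique from the lookup shown here.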


Advances in Social Networks Analysis and Mining | 2013

Shallow parsing for recognizing threats in Dutch tweets

Nelleke Oostdijk; Hans van Halteren

In this paper, we investigate the recognition of threats in Dutch tweets. As tweets often display irregular grammatical form and deviant orthography, analysis by standard means is problematic. Therefore, we have implemented a new shallow parsing mechanism which is driven by handcrafted rules. Experimental results are encouraging, with an F-measure of about 40% on a random sample of Dutch tweets. Moreover, the error analysis shows some clear avenues for further improvement.
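For readers unfamiliar with the F-measure reported in this abstract, the following sketch shows how the balanced F-measure (F1) is commonly derived from precision and recall. The counts used are invented purely to produce a round example; they are not the paper's data, and the abstract does not state which variant of the F-measure was used.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision, recall, and balanced F-measure from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 40 correctly flagged tweets, 60 false alarms, 60 misses.
precision, recall, f1 = precision_recall_f1(tp=40, fp=60, fn=60)
print(precision, recall, f1)
```

Because F1 is a harmonic mean, a system with precision and recall both near 0.4 scores about 0.4, while a system that trades one sharply against the other scores lower than their arithmetic average.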


Patent Information Retrieval | 2010

Genre and domain in patent texts

Nelleke Oostdijk; Eva D'hondt; Hans van Halteren; Suzan Verberne

In this paper we investigate the variation in language use within the very broad patent domain. We find that language use (represented by syntactic phrases) not only differs from one patent class to the next, but is also a characteristic that sets apart the four sections of a patent (viz. Title, Abstract, Description and Claims). This lends support to the claim that these sections can be viewed as different text genres. For the development of a syntactic parser that is trained on patent texts, we quantify the domain and genre differences in terms of the amounts of text needed to train domain-dependent versions of the parser. Our quantified and exemplified findings on the domain variation in patent data are of interest for the patent retrieval and analysis communities.

Collaboration


Dive into Nelleke Oostdijk's collaborations.

Top Co-Authors

Lou Boves
Radboud University Nijmegen

Suzan Verberne
Radboud University Nijmegen

H. van Halteren
Radboud University Nijmegen

P.A.J.M. Coppen
Radboud University Nijmegen

A. Hürriyetoğlu
Radboud University Nijmegen

Carla Schelfhout
Radboud University Nijmegen

D.L. Theijssen
Radboud University Nijmegen

Hans van Halteren
Radboud University Nijmegen

Peter-Arno Coppen
Radboud University Nijmegen