Erik F. Tjong Kim Sang

Publication

Featured research published by Erik F. Tjong Kim Sang.

North American Chapter of the Association for Computational Linguistics | 2002

Introduction to the CoNLL-2002 shared task: language-independent named entity recognition

Erik F. Tjong Kim Sang; Fien De Meulder

We describe the CoNLL-2002 shared task: language-independent named entity recognition. We give background information on the data sets and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.

Conference on Computational Natural Language Learning | 2000

Introduction to the CoNLL-2000 shared task: chunking

Erik F. Tjong Kim Sang; Sabine Buchholz

We describe the CoNLL-2000 shared task: dividing text into syntactically related non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.

Conference of the European Chapter of the Association for Computational Linguistics | 1999

Representing text chunks

Erik F. Tjong Kim Sang; Jorn Veenstra

Dividing sentences into chunks of words is a useful preprocessing step for parsing, information extraction and information retrieval. Ramshaw and Marcus (1995) introduced a convenient data representation for chunking by converting it to a tagging task. In this paper we examine seven different data representations for the problem of recognizing noun phrase chunks. We show that the choice of data representation has a minor influence on chunking performance. However, equipped with the most suitable data representation, our memory-based learning chunker was able to improve the best published chunking results for a standard data set.
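To illustrate what converting chunking to a tagging task can look like, here is a minimal Python sketch. It assumes two commonly used I/O/B tagging variants, labeled IOB2 and IOB1 here; the function names, the span format and the example sentence are hypothetical, and the abstract does not say which seven representations the paper compares.

```python
def spans_to_iob2(n_tokens, spans):
    """Encode chunk spans as IOB2 tags (assumed variant: every chunk starts with B).

    spans: list of (start, end) token offsets, end exclusive, non-overlapping.
    """
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def iob2_to_iob1(tags):
    """Convert IOB2 to IOB1 (assumed variant: B only where a chunk directly follows another chunk)."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "B" and (i == 0 or tags[i - 1] == "O"):
            converted.append("I")
        else:
            converted.append(tag)
    return converted


# Hypothetical example: two noun phrase chunks, "He" and "the current account deficit".
tokens = ["He", "reckons", "the", "current", "account", "deficit", "will", "narrow"]
np_spans = [(0, 1), (2, 6)]

iob2 = spans_to_iob2(len(tokens), np_spans)
iob1 = iob2_to_iob1(iob2)
for token, tag2, tag1 in zip(tokens, iob2, iob1):
    print(f"{token}\t{tag2}\t{tag1}")
```

Once chunks are encoded this way, any per-token classifier can be trained on the tags, which is what makes the choice of representation a tunable part of the system.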

Conference on Computational Natural Language Learning | 2000

Text chunking by system combination

Erik F. Tjong Kim Sang

We apply a system-internal combination of memory-based learning classifiers to the CoNLL-2000 shared task: finding base chunks. Apart from testing different combination methods, we also examine whether dividing the chunking process into a boundary recognition phase and a type identification phase aids performance.

International Conference on Computational Linguistics | 2000

Applying system combination to base noun phrase identification

Erik F. Tjong Kim Sang; Walter Daelemans; Hervé Déjean; Rob Koeling; Yuval Krymolowski; Vasin Punyakanok; Dan Roth

We use seven machine learning algorithms for one task: identifying base noun phrases. The results have been processed by different system combination methods and all of these outperformed the best individual result. We have applied the seven learners with the best combinator, a majority vote of the top five systems, to a standard data set and managed to improve the best published result for this data set.
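A majority vote over several systems can be sketched as follows. This is an illustrative Python example, not the combinator from the paper: it assumes each system emits one tag per token, votes independently at every token position, and breaks ties in favor of the system listed first. The function name `majority_vote` and the toy outputs are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token tag sequences from several systems by majority vote.

    predictions: list of tag sequences, one per system, all the same length.
    Ties go to the tag proposed by the earliest system in the list (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one system output")
    length = len(predictions[0])
    if any(len(sequence) != length for sequence in predictions):
        raise ValueError("all system outputs must have the same length")

    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Walk the votes in system order so ties resolve deterministically.
        combined.append(next(tag for tag in votes if counts[tag] == top))
    return combined


# Hypothetical outputs of three systems for a three-token sentence.
system_outputs = [
    ["I", "O", "I"],
    ["I", "I", "I"],
    ["O", "O", "I"],
]
print(majority_vote(system_outputs))  # ['I', 'O', 'I']
```

Voting token by token can yield tag sequences that no single system produced, so a practical combinator may need an extra consistency check on the result.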

International Conference on Computational Linguistics | 2002

Memory-based named entity recognition

Erik F. Tjong Kim Sang

We apply a memory-based learner to the CoNLL-2002 shared task: language-independent named entity recognition. We use three additional techniques for improving the base performance of the learner: cascading, feature selection and system combination. The overall system is trained with two types of features: words and substrings of words which are relevant for this particular task. It is tested on the two languages that were available for this shared task: Spanish and Dutch.

Conference on Computational Natural Language Learning | 2001

Learning computational grammars

John Nerbonne; Anja Belz; Nicola Cancedda; Hervé Déjean; James Hammerton; Rob Koeling; Stasinos Konstantopoulos; Miles Osborne; Franck Thollard; Erik F. Tjong Kim Sang

This paper reports on the LEARNING COMPUTATIONAL GRAMMARS (LCG) project, a postdoc network devoted to studying the application of machine learning techniques to grammars suitable for computational use. We were interested in a more systematic survey to understand the relevance of many factors to the success of learning, esp. the availability of annotated data, the kind of dependencies in the data, and the availability of knowledge bases (grammars). We focused on syntax, esp. noun phrase (NP) syntax.

Lecture Notes in Computer Science | 2001

Learning the logic of simple phonotactics

Erik F. Tjong Kim Sang; John Nerbonne

We report on experiments which demonstrate that by abductive inference it is possible to learn enough simple phonotactics to distinguish words from non-words for a simplified set of Dutch, the monosyllables. The monosyllables are distinguished in input so that segmentation is not problematic. Frequency information is withheld as is negative data. The methods are all tested using ten-fold cross-validation as well as a fixed number of randomly generated strings. Orthographic and phonetic representations are compared. The work presented in this chapter is part of a larger project comparing different machine learning techniques on linguistic data.

Conference on Computational Natural Language Learning | 2001

Combining a self-organising map with memory-based learning

James Hammerton; Erik F. Tjong Kim Sang

Memory-based learning (MBL) has enjoyed considerable success in corpus-based natural language processing (NLP) tasks and is thus a reliable method of getting a high level of performance when building corpus-based NLP systems. However, there is a bottleneck in MBL whereby any novel testing item has to be compared against all the training items in the memory base. For this reason there has been some interest in various forms of memory editing, whereby some method of selecting a subset of the memory base is employed to reduce the number of comparisons. This paper investigates the use of a modified self-organising map (SOM) to select a subset of the memory items for comparison. This method involves reducing the number of comparisons to a value proportional to the square root of the number of training items. The method is tested on the identification of base noun-phrases in the Wall Street Journal corpus, using sections 15 to 18 for training and section 20 for testing.
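The idea of comparing a test item against only a subset of the memory base can be illustrated with a simple two-level lookup. The Python sketch below does not implement a self-organising map: as a stand-in, it picks roughly sqrt(N) training items at random as prototypes, groups the memory base by nearest prototype, and at test time searches only the group of the closest prototype. All names, the Hamming distance over symbolic features and the toy data are assumptions made for illustration.

```python
import math
import random


def hamming(a, b):
    """Number of feature positions on which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def build_buckets(items, seed=0):
    """Group (features, label) items around about sqrt(N) randomly chosen prototypes.

    Random prototypes are a stand-in for the SOM used in the paper.
    """
    rng = random.Random(seed)
    k = max(1, math.isqrt(len(items)))
    prototypes = [features for features, _ in rng.sample(items, k)]
    buckets = [[] for _ in prototypes]
    for features, label in items:
        nearest = min(range(k), key=lambda j: hamming(features, prototypes[j]))
        buckets[nearest].append((features, label))
    return prototypes, buckets


def classify(query, prototypes, buckets):
    """Return the label of the nearest item in the bucket of the nearest prototype."""
    nearest = min(range(len(prototypes)), key=lambda j: hamming(query, prototypes[j]))
    if not buckets[nearest]:
        raise ValueError("nearest bucket is empty")
    _, label = min(buckets[nearest], key=lambda item: hamming(query, item[0]))
    return label


# Hypothetical memory base: (previous tag, current tag, next tag) -> chunk tag.
memory = [
    (("DT", "NN", "VBZ"), "I"),
    (("NN", "VBZ", "DT"), "O"),
    (("VBZ", "DT", "JJ"), "I"),
    (("DT", "JJ", "NN"), "I"),
    (("JJ", "NN", "IN"), "I"),
    (("NN", "IN", "DT"), "O"),
    (("IN", "DT", "NN"), "I"),
    (("NN", "VBD", "RB"), "O"),
    (("VBD", "RB", "IN"), "O"),
]
prototypes, buckets = build_buckets(memory)
print(classify(("DT", "NN", "VBZ"), prototypes, buckets))
```

With k prototypes, a query costs k prototype comparisons plus one comparison per item in the selected bucket. Random prototypes do not guarantee balanced buckets, so the square-root cost holds only when the groups come out roughly equal in size.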

Journal of Machine Learning Research | 2002