Jan Strunk
Ruhr University Bochum
Publications
Featured research published by Jan Strunk.
Computational Linguistics | 2006
Tibor Kiss; Jan Strunk
In this article, we present a language-independent, unsupervised approach to sentence boundary detection. It is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified. Instead of relying on orthographic clues, the proposed system is able to detect abbreviations with high accuracy using three criteria that only require information about the candidate type itself and are independent of context: Abbreviations can be defined as a very tight collocation consisting of a truncated word and a final period, abbreviations are usually short, and abbreviations sometimes contain internal periods. We also show the potential of collocational evidence for two other important subtasks of sentence boundary disambiguation, namely, the detection of initials and ordinal numbers. The proposed system has been tested extensively on eleven different languages and on different text genres. It achieves good results without any further amendments or language-specific resources. We evaluate its performance against three different baselines and compare it to other systems for sentence boundary detection proposed in the literature.
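The abstract's central assumption, that many sentence boundary ambiguities disappear once abbreviations are known, can be illustrated with a minimal Python sketch. This is not the system described in the article: the function name `split_sentences`, the whitespace tokenization, and the hand-written abbreviation set are all assumptions made for illustration, and the sketch deliberately ignores the case where an abbreviation also ends a sentence.

```python
def split_sentences(text, abbreviations):
    """Naive splitter: a token ending in '.', '?' or '!' closes a sentence
    unless it is a known abbreviation followed by a period.

    `abbreviations` holds lowercase abbreviation types WITHOUT the final
    period (hypothetical input; the article detects these automatically).
    """
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "?", "!")):
            # A period that belongs to a known abbreviation is not a boundary.
            if token.endswith(".") and token[:-1].lower() in abbreviations:
                continue
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing text that has no closing punctuation.
        sentences.append(" ".join(current))
    return sentences


example = "Dr. Smith arrived at 5 p.m. sharp. He left early."
print(split_sentences(example, {"dr", "p.m"}))
```

With an empty abbreviation set the same text would be split after "Dr." and "p.m.", which is the kind of ambiguity the abstract says abbreviation detection removes.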
International Conference on Computational Linguistics | 2002
Tibor Kiss; Jan Strunk
We describe a language-independent, flexible, and accurate method for the detection of abbreviations in text corpora. It is based on the idea that an abbreviation can be viewed as a collocation of a word and a following period, and can therefore be identified with methods for collocation detection such as the log likelihood ratio. Although the log likelihood ratio is known to have good recall, its precision is poor. We employ scaling factors that strongly improve precision. Experiments with English and German corpora show that abbreviations can be detected with high accuracy.
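As a rough illustration of treating an abbreviation as a collocation, the sketch below computes a standard binomial log likelihood ratio for the pair (word, following period) from raw counts. The function and parameter names are assumptions, the counts in the usage example are invented, and the scaling factors mentioned in the abstract are not reproduced here because the abstract does not specify them.

```python
import math


def _log_likelihood(k, n, x):
    """log of x**k * (1 - x)**(n - k), treating 0 * log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(x)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - x)
    return total


def log_likelihood_ratio(count_word, count_period, count_word_period, n_tokens):
    """-2 log lambda for 'word' being followed by a period.

    count_word:        occurrences of the word (with or without a period)
    count_period:      occurrences of the period token in the corpus
    count_word_period: occurrences of the word directly followed by a period
    n_tokens:          total number of tokens in the corpus
    """
    if not (0 < count_word < n_tokens and 0 < count_period < n_tokens):
        raise ValueError("counts must be positive and smaller than n_tokens")
    if not 0 <= count_word_period <= min(count_word, count_period):
        raise ValueError("joint count is inconsistent with marginal counts")

    rest_period = count_period - count_word_period  # periods after other words
    rest_tokens = n_tokens - count_word             # tokens that are not the word

    p = count_period / n_tokens          # null hypothesis: period is independent
    p1 = count_word_period / count_word  # P(period | word)
    p2 = rest_period / rest_tokens       # P(period | not word)

    log_lambda = (
        _log_likelihood(count_word_period, count_word, p)
        + _log_likelihood(rest_period, rest_tokens, p)
        - _log_likelihood(count_word_period, count_word, p1)
        - _log_likelihood(rest_period, rest_tokens, p2)
    )
    return -2.0 * log_lambda


# Invented counts: a type that always precedes a period scores far higher
# than one that does so about as often as chance would predict.
always = log_likelihood_ratio(50, 5000, 50, 100000)
rarely = log_likelihood_ratio(50, 5000, 2, 100000)
print(always > rarely)
```

The score only measures how strongly the two events are associated, so a type that precedes a period far less often than chance also receives a high value; this is one reason an unscaled ratio can have poor precision.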
International Conference on Computational Linguistics | 2006
Jan Strunk; Carlos Nascimento Silla; Celso A. A. Kaestner
In this paper, we describe a new unsupervised sentence boundary detection system and compare its performance against systems from the literature that have been used to segment English and Portuguese documents into sentences. The results achieved by the new approach were as good as those of the previous systems, which is notable given that the method does not require any additional training resources.
Archive | 2000
Tibor Kiss; Jan Strunk
International Conference on Computational Linguistics | 2010
Tibor Kiss; Katja Kesselmeier; Antje Müller; Claudia Roch; Tobias Stadtfeld; Jan Strunk
Language Resources and Evaluation | 2014
Jan Strunk; Florian Schiel; Frank Seifart
Linguistic Annotation Workshop | 2010
Antje Müller; Olaf Hülscher; Claudia Roch; Katja Kesselmeier; Tobias Stadtfeld; Jan Strunk; Tibor Kiss
Archive | 2005
Jan Strunk
Archive | 2004
Jan Strunk
Archive | 2013
Jan Strunk; Neal Snider