Publication


Featured research published by Jan Strunk.


Computational Linguistics | 2006

Unsupervised Multilingual Sentence Boundary Detection

Tibor Kiss; Jan Strunk

In this article, we present a language-independent, unsupervised approach to sentence boundary detection. It is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified. Instead of relying on orthographic clues, the proposed system is able to detect abbreviations with high accuracy using three criteria that require information only about the candidate type itself and are independent of context: abbreviations can be defined as a very tight collocation consisting of a truncated word and a final period; abbreviations are usually short; and abbreviations sometimes contain internal periods. We also show the potential of collocational evidence for two other important subtasks of sentence boundary disambiguation, namely, the detection of initials and ordinal numbers. The proposed system has been tested extensively on eleven different languages and on different text genres. It achieves good results without any further amendments or language-specific resources. We evaluate its performance against three different baselines and compare it to other systems for sentence boundary detection proposed in the literature.
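The three type-level criteria can be made concrete with a short Python sketch. It only collects, for each candidate type seen with a final period, the raw counts with and without that period (the input a collocation measure would work from), the type's length without periods, and its number of internal periods. How the article weighs these into an abbreviation decision is not reproduced here; the whitespace tokenization and the names `TypeFeatures` and `type_features` are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TypeFeatures:
    with_period: int       # criterion 1 input: occurrences followed by a final period
    without_period: int    # criterion 1 input: occurrences without a final period
    length: int            # criterion 2: characters excluding periods ("short")
    internal_periods: int  # criterion 3: periods inside the type, e.g. 1 for "e.g"


def type_features(tokens: list[str]) -> dict[str, TypeFeatures]:
    """Collect context-independent features for each type seen with a final period.

    Assumes whitespace tokenization, so a final period stays attached to its word.
    Types are lowercased so that "Dr." and "dr." count as the same candidate.
    """
    with_period = Counter(t[:-1].lower() for t in tokens if t.endswith(".") and len(t) > 1)
    without_period = Counter(t.lower() for t in tokens if not t.endswith("."))
    return {
        typ: TypeFeatures(
            with_period=count,
            without_period=without_period[typ],
            length=len(typ.replace(".", "")),
            internal_periods=typ.count("."),
        )
        for typ, count in with_period.items()
    }


if __name__ == "__main__":
    text = "Dr. Smith met Dr. Jones at the end. The end is near. See e.g. chapter two."
    for typ, feats in type_features(text.split()).items():
        print(typ, feats)
```

In the example text, `dr` is seen twice with a final period and never without one, while `end` also occurs without a period; that contrast is what the collocation criterion is meant to capture.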


International Conference on Computational Linguistics | 2002

Scaled log likelihood ratios for the detection of abbreviations in text corpora

Tibor Kiss; Jan Strunk

We describe a language-independent, flexible, and accurate method for the detection of abbreviations in text corpora. It is based on the idea that an abbreviation can be viewed as a collocation and can be identified by using methods for collocation detection such as the log likelihood ratio. Although the log likelihood ratio is known to have good recall, its precision is poor. We employ scaling factors which lead to a strong improvement of precision. Experiments with English and German corpora show that abbreviations can be detected with high accuracy.
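The collocation view can be illustrated with Dunning's log likelihood ratio for the pair (word, final period). The sketch below is the standard, unscaled binomial form of that statistic; the scaling factors the paper introduces to improve precision are deliberately left out, and the function names and count arguments are illustrative assumptions.

```python
import math


def _xlogy(a: float, b: float) -> float:
    """Return a * log(b), using the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def _log_binomial_likelihood(k: int, n: int, x: float) -> float:
    """Log of x**k * (1 - x)**(n - k), the binomial likelihood without the coefficient."""
    return _xlogy(k, x) + _xlogy(n - k, 1.0 - x)


def log_likelihood_ratio(c_w: int, c_p: int, c_wp: int, n: int) -> float:
    """Dunning's -2 log(lambda) for the association of word w with a following period.

    c_w:  occurrences of w
    c_p:  occurrences of the period
    c_wp: occurrences of w immediately followed by the period
    n:    total number of tokens; assumes 0 < c_w < n and 0 < c_p < n
    """
    p = c_p / n                    # H0: P(period | w) = P(period | not w) = p
    p1 = c_wp / c_w                # H1: P(period | w)
    p2 = (c_p - c_wp) / (n - c_w)  # H1: P(period | not w)
    log_lambda = (
        _log_binomial_likelihood(c_wp, c_w, p)
        + _log_binomial_likelihood(c_p - c_wp, n - c_w, p)
        - _log_binomial_likelihood(c_wp, c_w, p1)
        - _log_binomial_likelihood(c_p - c_wp, n - c_w, p2)
    )
    return -2.0 * log_lambda


if __name__ == "__main__":
    # A word always followed by a period scores far above one whose
    # period rate matches the corpus average.
    print(log_likelihood_ratio(c_w=10, c_p=100, c_wp=10, n=1000))
    print(log_likelihood_ratio(c_w=100, c_p=100, c_wp=10, n=1000))
```

The statistic is large for strong negative association as well as positive association, so a detector built on it would also need to check that the word occurs with the period more often than expected.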


International Conference on Computational Linguistics | 2006

A comparative evaluation of a new unsupervised sentence boundary detection approach on documents in English and Portuguese

Jan Strunk; Carlos Nascimento Silla; Celso A. A. Kaestner

In this paper, we describe a new unsupervised sentence boundary detection system and present a comparative study evaluating its performance against different systems found in the literature that have been used to perform the task of automatic text segmentation into sentences for English and Portuguese documents. The results achieved by this new approach were as good as those of the previous systems, especially considering that the method does not require any additional training resources.


Archive | 2000

Viewing sentence boundary detection as collocation identification

Tibor Kiss; Jan Strunk


International Conference on Computational Linguistics | 2010

A Logistic Regression Model of Determiner Omission in PPs

Tibor Kiss; Katja Kesselmeier; Antje Müller; Claudia Roch; Tobias Stadtfeld; Jan Strunk


Language Resources and Evaluation | 2014

Untrained Forced Alignment of Transcriptions and Audio for Language Documentation Corpora using WebMAUS

Jan Strunk; Florian Schiel; Frank Seifart


Linguistic Annotation Workshop | 2010

An Annotation Schema for Preposition Senses in German

Antje Müller; Olaf Hülscher; Claudia Roch; Katja Kesselmeier; Tobias Stadtfeld; Jan Strunk; Tibor Kiss


Archive | 2005

The role of animacy in the nominal possessive constructions of Modern Low Saxon

Jan Strunk


Archive | 2004

Possessive Constructions in Modern Low Saxon

Jan Strunk


Archive | 2013

Subclausal locality constraints on relative clause extraposition

Jan Strunk; Neal Snider

Collaboration


Dive into Jan Strunk's collaborations.

Top Co-Authors

Tibor Kiss

Ruhr University Bochum

Celso A. A. Kaestner

Pontifícia Universidade Católica do Paraná
