Publication


Featured research published by Stefanie Dipper.


Linguistic Annotation Workshop | 2007

Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus

Kepa Joseba Rodríguez; Stefanie Dipper; Michael Götze; Massimo Poesio; Giuseppe Riccardi; Christian Raymond; Joanna Rabiega-Wiśniewska

The LUNA corpus is a multilingual, multi-domain spoken dialogue corpus, currently under development, that will be used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services. The LUNA corpus will be annotated at multiple levels to include annotations of syntactic, semantic, and discourse information; specialized annotation tools will be used for the annotation at each of these levels. In order to synchronize these multiple layers of annotation, the PAULA standoff exchange format will be used. In this paper, we present the corpus and its PAULA-based architecture.
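
The abstract leaves the details of PAULA to the paper. As a rough sketch of the general standoff principle only, the Python code below stores the primary text once, defines tokens by character offsets, and lets each tool contribute its own layer that points at shared token IDs. The class, method, and label names (`StandoffDocument`, `annotate`, `labels_at`, `destination`, etc.) are hypothetical and do not reflect the actual PAULA format, which is XML-based.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    id: str
    start: int  # character offset into the primary text (inclusive)
    end: int    # character offset into the primary text (exclusive)


@dataclass
class StandoffDocument:
    """Hypothetical standoff model: the primary text is stored once, and every
    annotation layer points at shared token IDs instead of embedding markup."""
    text: str
    tokens: dict[str, Token] = field(default_factory=dict)
    # layer name -> list of (token IDs, label); each layer may come from a different tool
    layers: dict[str, list[tuple[list[str], str]]] = field(default_factory=dict)

    def add_token(self, tok_id: str, start: int, end: int) -> None:
        self.tokens[tok_id] = Token(tok_id, start, end)

    def annotate(self, layer: str, tok_ids: list[str], label: str) -> None:
        # Reject annotations that point at tokens the base layer does not define.
        missing = [t for t in tok_ids if t not in self.tokens]
        if missing:
            raise KeyError(f"unknown token IDs: {missing}")
        self.layers.setdefault(layer, []).append((list(tok_ids), label))

    def span_text(self, tok_ids: list[str]) -> str:
        # Recover the covered text from the offsets of the outermost tokens.
        toks = [self.tokens[t] for t in tok_ids]
        return self.text[min(t.start for t in toks):max(t.end for t in toks)]

    def labels_at(self, tok_id: str) -> list[tuple[str, str]]:
        # Merge the layers: collect (layer, label) for every annotation covering tok_id.
        return [(layer, label)
                for layer, anns in self.layers.items()
                for ids, label in anns if tok_id in ids]


# Usage: three hypothetical "tools" annotate the same tokens independently.
doc = StandoffDocument("book a flight to Rome")
for tok_id, (s, e) in {"t1": (0, 4), "t2": (5, 6), "t3": (7, 13),
                       "t4": (14, 16), "t5": (17, 21)}.items():
    doc.add_token(tok_id, s, e)
doc.annotate("pos", ["t1"], "VERB")
doc.annotate("pos", ["t5"], "PROPN")
doc.annotate("semantics", ["t5"], "destination")
doc.annotate("dialogue_act", ["t1", "t2", "t3", "t4", "t5"], "request")
print(doc.span_text(["t3", "t4", "t5"]))
print(doc.labels_at("t5"))
```

Because no layer stores a copy of the text or of another layer's markup, the tools only need to agree on the token IDs; spans in different layers may overlap or cross freely.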


Language Resources and Evaluation | 2012

Annotating abstract anaphora

Stefanie Dipper; Heike Zinsmeister

In this paper, we present first results from annotating abstract (discourse-deictic) anaphora in German. Our annotation guidelines provide linguistic tests for identifying the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus. To date, 100 texts have been annotated according to these guidelines. The annotations show that anaphoric personal and demonstrative pronouns differ with respect to the distance to their antecedents. A semantic analysis reveals that, contrary to suggestions put forward in the literature, referents of anaphors do not tend to be more abstract than the referents of their antecedents.


Language and Technology Conference | 2011

Applying Rule-Based Normalization to Different Types of Historical Texts—An Evaluation

Marcel Bollmann; Florian Petran; Stefanie Dipper

This paper deals with the normalization of language data from Early New High German. We describe an unsupervised, rule-based approach that maps historical wordforms to modern wordforms. Rules are specified as context-aware rewrite rules that apply to sequences of characters. They are derived from two aligned versions of the Luther Bible and weighted according to their frequency. Applying the normalization rules to texts by Luther yields 91% exact matches, clearly outperforming the baseline (65%). Combining the approach with a word substitution list raises this to 93%. Applied to more diverse language data from roughly the same period, performance drops to 43% exact matches (baseline: 35%), and to 46% with the combined method. The results show that rules derived from a very different type of text can support normalization to a certain extent.
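
As a rough illustration of what context-aware, frequency-weighted character rewrite rules can look like, here is a minimal Python sketch. The `Rule` fields, the greedy left-to-right application that picks the highest-weighted matching rule, and the four example rules are assumptions made for illustration; the paper's actual rule format, its derivation from the aligned Luther texts, and its application strategy may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rewrite rule: left + src + right  ->  left + tgt + right."""
    left: str    # required left context ("" matches anything, "#" is the word start)
    src: str     # historical character sequence to replace (non-empty, no "#")
    tgt: str     # modern replacement (may be empty, i.e. a deletion)
    right: str   # required right context ("" matches anything, "#" is the word end)
    weight: int  # e.g. how often the rule was observed in aligned training data


def normalize(word: str, rules: list[Rule]) -> str:
    """Scan the word left to right; at each position apply the highest-weighted
    rule whose source and contexts match, otherwise copy the character."""
    padded = "#" + word + "#"  # boundary markers let rules refer to word edges
    out: list[str] = []
    i = 1  # skip the leading boundary marker
    while i < len(padded) - 1:
        best = None
        for r in rules:
            if (r.src
                    and padded.startswith(r.src, i)
                    and padded[:i].endswith(r.left)
                    and padded.startswith(r.right, i + len(r.src))):
                if best is None or r.weight > best.weight:
                    best = r
        if best is None:
            out.append(padded[i])  # no rule applies: keep the character
            i += 1
        else:
            out.append(best.tgt)
            i += len(best.src)
    return "".join(out)


# Hypothetical rules for illustration; not derived from the Luther texts.
rules = [
    Rule("#", "v", "u", "n", 5),  # word-initial v before n: "vnd" -> "und"
    Rule("", "v", "v", "", 2),    # otherwise keep v (lower-weighted identity rule)
    Rule("", "th", "t", "", 3),   # "thun" -> "tun"
    Rule("", "y", "i", "", 4),    # "seyn" -> "sein"
]
for old in ["vnd", "von", "thun", "seyn"]:
    print(old, "->", normalize(old, rules))
```

Keeping a lower-weighted identity rule for `v` means the context-specific rule wins only where its context matches, so "von" is left unchanged while "vnd" becomes "und".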


Linguistic Annotation Workshop | 2009

Annotating Discourse Anaphora

Stefanie Dipper; Heike Zinsmeister

In this paper, we present preliminary work on corpus-based anaphora resolution of discourse deixis in German. Our annotation guidelines provide linguistic tests for locating the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus.


SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities | 2014

CorA: A web-based annotation tool for historical and other non-standard language data

Marcel Bollmann; Florian Petran; Stefanie Dipper; Julia Krasselt

We present CorA, a web-based annotation tool for manual annotation of historical and other non-standard language data. It allows for editing the primary data and modifying token boundaries during the annotation process. Further, it supports immediate retraining of taggers on newly annotated data.


Discourse Anaphora and Anaphor Resolution Colloquium | 2011

Abstract anaphors in German and English

Stefanie Dipper; Christine Rieger; Melanie Seiss; Heike Zinsmeister

Abstract anaphors refer to abstract referents such as facts or events. Automatic resolution of this kind of anaphora still poses a problem for language processing systems. This paper presents a corpus-based comparative study of German and English abstract anaphors and their antecedents, aimed at gaining further insights into the linguistic properties of different anaphor types and their distributions. To this end, parallel texts from the Europarl corpus have been annotated with functional and morpho-syntactic information. We outline the annotation process and show how we start out with a small set of well-defined markables in German. We successively expand this set in a cross-linguistic bootstrapping approach: we collect English translation equivalents and use them to track down further forms of German anaphors, which in turn reveal further English forms, and so on.


NLPXML '06 Proceedings of the 5th Workshop on NLP and XML: Multi-Dimensional Markup in Natural Language Processing | 2006

ANNIS: complex multilevel annotations in a linguistic database

Michael Götze; Stefanie Dipper

We present ANNIS, a linguistic database that aims at facilitating the process of exploiting richly annotated language data by naive users. We describe the role of the database in our research project and the project requirements, with a special focus on aspects of multilevel annotation. We then illustrate the usability of the database with examples. We also address current challenges and next steps.


Language and Technology Conference | 2009

OTTO: a tool for diplomatic transcription of historical texts

Stefanie Dipper; Martin Schnurrenberger

In this paper, we present OTTO, a web-based transcription tool designed for diplomatic transcription of historical language data. The tool supports fast and accurate typing through user-defined special characters while simultaneously providing a view of the manuscript that is as close to the original as possible. It also allows for the annotation of rich, user-defined header information. Users can log in and operate OTTO from anywhere through a standard web browser.


Meeting of the Association for Computational Linguistics | 2016

Annotating Spelling Errors in German Texts Produced by Primary School Children

Ronja Laarmann-Quante; Lukas Knichel; Stefanie Dipper; Carina Betken

We present a new multi-layered annotation scheme for orthographic errors in freely written German texts produced by primary school children. The scheme is closely linked to the German graphematic system and defines categories for both general structural word properties and error-related properties. Furthermore, it features multiple layers of information which can be used to evaluate an error. The categories can also be used to investigate properties of correctly spelled words and to compare them to the erroneous spellings. For data representation, we propose the XML format LearnerXML.


Meeting of the Association for Computational Linguistics | 2016

Evaluating Inter-Annotator Agreement on Historical Spelling Normalization

Marcel Bollmann; Stefanie Dipper; Florian Petran

This paper deals with means of evaluating inter-annotator agreement for a normalization task. This task differs from common annotation tasks in two important aspects: (i) the set of labels (the normalized wordforms) is open, and (ii) annotations can match to different degrees. We propose a new method to measure inter-annotator agreement for the normalization task. It integrates common chance-corrected agreement measures, such as Fleiss’s κ or Krippendorff’s α. The novelty of our proposed method lies in the way the annotated wordforms are treated. First, they are evaluated character-wise; second, certain characters are mapped to more general categories.
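
The abstract names the ingredients of the method (character-wise evaluation, mapping characters to more general categories, and a chance-corrected coefficient) without giving the details. The following Python sketch combines them under simplifying assumptions: two annotators instead of many, so Cohen's κ stands in for Fleiss's κ or Krippendorff's α; character alignment via `difflib.SequenceMatcher`; a gap symbol counted as an ordinary label; and a hypothetical `general_category` function that merely ignores case. None of this should be read as the authors' exact procedure.

```python
from collections import Counter
from difflib import SequenceMatcher

GAP = "-"  # stands for a character that appears in only one of the two annotations


def general_category(ch: str) -> str:
    """Hypothetical mapping of characters to more general categories.
    This sketch only ignores case; the paper's actual mapping is not reproduced."""
    return ch.lower()


def align(a: str, b: str) -> list[tuple[str, str]]:
    """Align two normalizations character by character, padding unmatched material with GAP."""
    pairs: list[tuple[str, str]] = []
    for _tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        left, right = a[i1:i2], b[j1:j2]
        width = max(len(left), len(right))
        # 'equal', 'replace', 'delete' and 'insert' all reduce to zipping the padded spans
        pairs.extend(zip(left.ljust(width, GAP), right.ljust(width, GAP)))
    return pairs


def cohen_kappa(pairs: list[tuple[str, str]]) -> float:
    """Cohen's kappa for two annotators over the aligned character pairs."""
    if not pairs:
        raise ValueError("no items to compare")
    n = len(pairs)
    observed = sum(x == y for x, y in pairs) / n
    freq_a = Counter(x for x, _ in pairs)
    freq_b = Counter(y for _, y in pairs)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


def normalization_agreement(ann_a: list[str], ann_b: list[str]) -> float:
    """Character-wise, chance-corrected agreement between two lists of normalized wordforms."""
    pairs: list[tuple[str, str]] = []
    for word_a, word_b in zip(ann_a, ann_b):
        mapped_a = "".join(general_category(c) for c in word_a)
        mapped_b = "".join(general_category(c) for c in word_b)
        pairs.extend(align(mapped_a, mapped_b))
    return cohen_kappa(pairs)


# Hypothetical normalizations of the same three historical tokens
annotator_1 = ["und", "tun", "Sein"]
annotator_2 = ["und", "thun", "sein"]
print(round(normalization_agreement(annotator_1, annotator_2), 3))
```

Here the case difference in "Sein"/"sein" is absorbed by the category mapping, and "tun"/"thun" disagree on a single aligned position instead of counting as a wholly different label. With three or more annotators, `cohen_kappa` would be replaced by a multi-annotator coefficient such as Fleiss's κ or Krippendorff's α; the alignment and mapping steps stay the same.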

Collaboration


Top Co-Authors

Jonas Kuhn

University of Stuttgart

Manfred Stede

Humboldt University of Berlin

Miriam Butt

University of Konstanz

Anke Lüdeling

Humboldt University of Berlin
