Heike Zinsmeister | Researchain

Publication

Featured research published by Heike Zinsmeister.

Language Resources and Evaluation | 2012

Annotating abstract anaphora

Stefanie Dipper; Heike Zinsmeister

In this paper, we present first results from annotating abstract (discourse-deictic) anaphora in German. Our annotation guidelines provide linguistic tests for identifying the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus. To date, 100 texts have been annotated according to these guidelines. The annotations show that anaphoric personal and demonstrative pronouns differ with respect to the distance to their antecedents. A semantic analysis reveals that, contrary to suggestions put forward in the literature, referents of anaphors do not tend to be more abstract than the referents of their antecedents.

Linguistic Annotation Workshop | 2009

Annotating Discourse Anaphora

Stefanie Dipper; Heike Zinsmeister

In this paper, we present preliminary work on corpus-based anaphora resolution of discourse deixis in German. Our annotation guidelines provide linguistic tests for locating the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus.

SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities | 2016

Dealing with word-internal modification and spelling variation in data-driven lemmatization

Fabian Barteld; Ingrid Schröder; Heike Zinsmeister

This paper describes our contribution to two challenges in data-driven lemmatization. We approach lemmatization in the framework of a two-stage process, where first lemma candidates are generated and afterwards a ranker chooses the most probable lemma from these candidates. The first challenge is that languages with rich morphology like Modern German can feature morphological changes of different kinds, in particular word-internal modification. This makes the generation of the correct lemma a harder task than just removing suffixes (stemming). The second challenge that we address is spelling variation as it appears in non-standard texts. We experiment with different generators that are specifically tailored to deal with these two challenges. We show in an oracle setting that there is a possible increase in lemmatization accuracy of 14% with our methods to generate lemma candidates on Middle Low German, a group of historical dialects of German (1200–1650 AD). Using a log-linear model to choose the correct lemma from the set, we obtain an actual increase of 5.56%.
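The two-stage process described in this abstract (generate lemma candidates, then rank them) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the rewrite rules, the two features, the weights, the toy lexicon, and the function names `generate_candidates`, `rank`, `lemmatize`, and `oracle_hit` are all hypothetical.

```python
import math

# Hypothetical suffix rewrite rules (suffix -> replacement); not from the paper.
SUFFIX_RULES = [("en", ""), ("en", "e"), ("e", ""), ("es", ""), ("", "")]
# Hypothetical word-internal substitutions (here: undoing umlaut).
INTERNAL_RULES = [("ä", "a"), ("ö", "o"), ("ü", "u")]
# Hypothetical feature weights for the log-linear ranker.
WEIGHTS = {"in_lexicon": 3.0, "length_diff": -0.5}


def generate_candidates(form: str) -> set[str]:
    """Stage 1: apply internal and suffix rules to propose lemma candidates."""
    stems = {form}
    for old, new in INTERNAL_RULES:
        if old in form:
            stems.add(form.replace(old, new))
    candidates = set()
    for stem in stems:
        for suffix, replacement in SUFFIX_RULES:
            if stem.endswith(suffix):
                candidates.add(stem[: len(stem) - len(suffix)] + replacement)
    return candidates


def features(form: str, candidate: str, lexicon: set[str]) -> dict[str, float]:
    """Toy feature vector for one (form, candidate) pair."""
    return {
        "in_lexicon": 1.0 if candidate in lexicon else 0.0,
        "length_diff": float(abs(len(form) - len(candidate))),
    }


def rank(form: str, candidates: set[str], lexicon: set[str]) -> tuple[str, dict[str, float]]:
    """Stage 2: log-linear model, P(c | form) proportional to exp(w . f(form, c))."""
    scores = {
        c: sum(WEIGHTS[name] * value for name, value in features(form, c, lexicon).items())
        for c in candidates
    }
    z = sum(math.exp(s) for s in scores.values())
    probs = {c: math.exp(s) / z for c, s in scores.items()}
    return max(probs, key=probs.get), probs


def lemmatize(form: str, lexicon: set[str]) -> str:
    best, _ = rank(form, generate_candidates(form), lexicon)
    return best


def oracle_hit(form: str, gold_lemma: str) -> bool:
    """Oracle setting: does the candidate set contain the gold lemma at all?"""
    return gold_lemma in generate_candidates(form)
```

In this sketch, the oracle check measures only the generator (an upper bound on what any ranker could achieve), while `lemmatize` measures generator and ranker together. That is one way to read the gap between the oracle and actual figures reported in the abstract.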

discourse anaphora and anaphor resolution colloquium | 2011

Abstract anaphors in German and English

Stefanie Dipper; Christine Rieger; Melanie Seiss; Heike Zinsmeister

Abstract anaphors refer to abstract referents such as facts or events. Automatic resolution of this kind of anaphora still poses a problem for language processing systems. The present paper presents a corpus-based comparative study on German and English abstract anaphors and their antecedents to gain further insights into the linguistic properties of different anaphor types and their distributions. To this end, parallel texts from the Europarl corpus have been annotated with functional and morpho-syntactic information. We outline the annotation process and show how we start out with a small set of well-defined markables in German. We successively expand this set in a cross-linguistic bootstrapping approach by collecting translation equivalents from English and using them to track down further forms of German anaphors, and, in the next turn, in English, etc.
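The cross-linguistic bootstrapping idea in this abstract can be sketched as a simple set-expansion loop. This is only an illustration under strong assumptions: the study worked with annotated parallel Europarl texts, whereas the sketch reduces the data to hypothetical (German form, English form) pairs, and the function name `bootstrap_markables` and the sample pairs are invented for the example.

```python
def bootstrap_markables(
    aligned_pairs: list[tuple[str, str]],
    german_seeds: set[str],
    max_rounds: int = 10,
) -> tuple[set[str], set[str]]:
    """Alternately expand German and English form sets via translation equivalents.

    aligned_pairs: hypothetical (german_form, english_form) translation pairs.
    german_seeds:  the small, well-defined starting set of German markables.
    """
    german = set(german_seeds)
    english: set[str] = set()
    for _ in range(max_rounds):
        # Collect English equivalents of the German forms known so far.
        new_english = {en for de, en in aligned_pairs if de in german} - english
        english |= new_english
        # Use the English forms to track down further German forms.
        new_german = {de for de, en in aligned_pairs if en in english} - german
        german |= new_german
        if not new_english and not new_german:
            break  # Fixed point reached: nothing new in either language.
    return german, english


# Hypothetical toy alignment data.
pairs = [
    ("dies", "this"),
    ("das", "this"),
    ("das", "that"),
    ("es", "that"),
    ("haus", "house"),
]
german_forms, english_forms = bootstrap_markables(pairs, {"dies"})
```

Starting from the single seed `dies`, the loop reaches `das` through the shared equivalent `this`, and then `es` through `that`; the unrelated pair (`haus`, `house`) is never reached.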

Linguistic Annotation Workshop | 2014

Annotating descriptively incomplete language phenomena

Fabian Barteld; Sarah Ihden; Ingrid Schröder; Heike Zinsmeister

When annotating non-standard languages, descriptively incomplete language phenomena (EAGLES, 1996) are often encountered. In this paper, we present examples of ambiguous forms taken from a historical corpus and offer a classification of such descriptively incomplete language phenomena and its rationale. We then discuss various approaches to the annotation of these phenomena, arguing that multiple annotations provide the most appropriate encoding strategy for the annotator. Finally, we show how multiple annotations can be encoded in existing standards such as PAULA and GrAF.

Archive | 2012

Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z)

Heike Telljohann; Erhard W. Hinrichs; Sandra Kübler; Heike Zinsmeister; Kathrin Beck

Archive | 2003

New perspectives on case theory

Ellen Brandner; Heike Zinsmeister; Artemis Alexiadou; Miriam Butt; Tracy Holloway King; Eric Haeberli; Jóhannes Gísli Jónsson; Marcus Kracht; Diane Nelson; Halldór Ármann Sigurðsson; Ralf Vogel; Ellen Woolford; Dieter Wunderlich

Archive | 2010