Annie Zaenen | Researchain

Publication

Featured research published by Annie Zaenen.

International Conference on Computational Linguistics | 1992

Two-level morphology with composition

Lauri Karttunen; Ronald M. Kaplan; Annie Zaenen

Two-Level Morphology with Composition. Lauri Karttunen, Ronald M. Kaplan, and Annie Zaenen. Xerox Palo Alto Research Center; Center for the Study of Language and Information, Stanford University.

1. Limitations of Kimmo systems

The advent of two-level morphology (Koskenniemi [1], Karttunen [2], Antworth [3], Ritchie et al. [4]) has made it relatively easy to develop adequate morphological (or at least morphographical) descriptions for natural languages, clearly superior to earlier cut-and-paste approaches to morphology. Most of the existing Kimmo systems developed within this paradigm consist of

• linked lexicons stored as annotated letter trees
• morphological information on the leaf nodes of trees
• transducers that encode morphological alternations

An analysis of an inflected word form is produced by mapping the input form to a sequence of lexical forms through the transducers and by composing some output from the annotations on the leaf nodes of the lexical paths that were traversed. Comprehensive morphological descriptions of this type have been developed for several languages including Finnish, Swedish, Russian, English, Swahili, and Arabic. Although they have several good features, these Kimmo systems also have some limitations. The ones we want to address in this paper are the following:

(1) Lexical representations tend to be arbitrary. Because it is difficult to write and test two-level systems that map between pairs of radically dissimilar forms, lexical representations in existing two-level analyzers tend to stay close to the surface forms. This is not a problem for morphologically simple languages like English because, for most words, inflected forms are very similar to the canonical dictionary entry. Except for a small number of irregular verbs and nouns, it is not difficult to create a two-level description for English in which lexical forms coincide with the canonical citation forms found in a dictionary.
However, current analyzers for morphologically more complex languages (Finnish and Russian, for example) are not as satisfying in this respect. In these systems, lexical forms typically contain diacritic markers and special symbols; they are not real words in the language. For example, in Finnish the lexical counterpart of otin 'I took' might be rendered as otTa1I1n, where T, a1, and I1 are an arbitrary encoding of morphological alternations that determine the allomorphs of the stem and the past tense morpheme. The canonical citation form ottaa 'to take' is composed from annotations on the leaf nodes of the letter trees that are linked to match the input. It is not in any direct way related to the lexical form produced by the transducers.

(2) Morphological categories are not directly encoded as part of the lexical form. Instead of morphemes like Plural or Past, we typically see suffix strings like +s and +ed, which do not by themselves indicate what morpheme they express. Different realizations of the same morphological category are often represented as different even on the lexical side.

These characteristics lead to some undesirable consequences:

Actes de COLING-92, Nantes, 23-28 août 1992 / Proc. of COLING-92, Nantes, Aug. 23-28, 1992, p. 141
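The Kimmo-style architecture described in this excerpt (linked lexicons stored as letter trees, with annotations where an entry ends) can be illustrated with a small sketch. This is not the authors' implementation: the names `Node`, `insert`, and `analyze`, the lexicon names, and the toy English entries are all hypothetical, and the transducer step is replaced by an identity mapping, so surface and lexical forms are assumed to be the same string.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a letter tree (trie). All names here are hypothetical."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    # (annotation, continuation lexicon or None) for entries ending at this node
    entries: List[Tuple[str, Optional[str]]] = field(default_factory=list)


def insert(root: Node, form: str, annotation: str,
           continuation: Optional[str] = None) -> None:
    """Store a lexical form letter by letter and annotate its final node."""
    node = root
    for ch in form:
        node = node.children.setdefault(ch, Node())
    node.entries.append((annotation, continuation))


def analyze(lexicons: Dict[str, Node], start: str, word: str) -> List[List[str]]:
    """Return the annotation sequences of every path that consumes the word.

    Assumption: the transducers are the identity, so the surface word is
    matched directly against the lexical forms in the letter trees.
    """
    results: List[List[str]] = []

    def walk(lex_name: str, pos: int, annotations: List[str]) -> None:
        node, i = lexicons[lex_name], pos
        while True:
            for annotation, continuation in node.entries:
                collected = annotations + [annotation]
                if continuation is None:
                    if i == len(word):      # whole word consumed: an analysis
                        results.append(collected)
                else:                        # follow the link to the next lexicon
                    walk(continuation, i, collected)
            if i == len(word) or word[i] not in node.children:
                return
            node = node.children[word[i]]
            i += 1

    walk(start, 0, [])
    return results


# Toy lexicons: a root lexicon linked to a suffix lexicon.
lexicons = {"Root": Node(), "Suffix": Node()}
insert(lexicons["Root"], "walk", "walk (verb)", continuation="Suffix")
insert(lexicons["Suffix"], "", "base form")
insert(lexicons["Suffix"], "ed", "+ed")
insert(lexicons["Suffix"], "s", "+s")

print(analyze(lexicons, "Root", "walked"))
```

Note that the suffix annotations in this toy lexicon are bare strings such as `+ed`, which is exactly the limitation raised in point (2): nothing in the lexical form itself says which morpheme the suffix expresses.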

Meeting of the Association for Computational Linguistics | 2004

Animacy encoding in English: why and how

Annie Zaenen; Jean Carletta; Gregory Garretson; Joan Bresnan; Andrew Koontz-Garboden; Tatiana Nikitina; M. Catherine O'Connor; Tom Wasow

We report on two recent medium-scale initiatives annotating present-day English corpora for animacy distinctions. We discuss the relevance of animacy for computational linguistics, specifically generation; the annotation categories used in the two studies; and the interannotator reliability for one of the studies.

International Conference on Computational Linguistics | 1990

Modeling syntactic constraints on anaphoric binding

Mary Dalrymple; John T. Maxwell; Annie Zaenen

Syntactic constraints on antecedent-anaphor relations can be stated within the theory of Lexical Functional Grammar (henceforth LFG) through the use of functional uncertainty (Kaplan and Maxwell 1988; Halvorsen and Kaplan 1988; Kaplan and Zaenen 1989). In the following, we summarize the general characteristics of syntactic constraints on anaphoric binding. Next, we describe a variation of functional uncertainty called inside-out functional uncertainty and show how it can be used to model anaphoric binding. Finally, we discuss some binding constraints claimed to hold in natural language to exemplify the mechanism. We limit our attention throughout to coreference possibilities between definite antecedents and anaphoric elements and ignore interactions with quantifiers. We also limit our discussion to intrasentential relations.

Handbook of Linguistic Annotation | 2017

Designing Annotation Schemes: From Theory to Model

James Pustejovsky; Harry Bunt; Annie Zaenen

In this chapter, we describe the method and process of transforming the theoretical formulations of a linguistic phenomenon, based on empirical observations, into a model that can be used for the development of a language annotation specification. We outline this procedure generally, and then examine the steps in detail by specific example. We look at how this methodology has been implemented in the creation of TimeML (and ISO-TimeML), a broad-based standard for annotating temporal information in natural language texts. Because of the scope of this effort and the richness of the theoretical work in the area, the development of TimeML illustrates very clearly the methodology of the early stages of the MATTER annotation cycle, where initial models and schemas cycle through progressively mature versions of the resulting specification. Furthermore, the subsequent effort to convert TimeML into an ISO compliant standard, ISO-TimeML, demonstrates the utility of the CASCADES model in distinguishing between the concrete syntax of the schema and abstract syntax of the model behind it.

Third International Conference on Discourse Anaphora and Anaphor Resolution (DAARC2000) | 1999