Publications

Featured research published by Amir Zeldes.


Literary and Linguistic Computing | 2016

ANNIS3: A new architecture for generic corpus query and visualization

Thomas Krause; Amir Zeldes

This article is concerned with the data structures, properties of query languages, and visualization facilities required for the generic representation of richly annotated, heterogeneous linguistic corpora. We propose that above and beyond a general graph-based data model, which is becoming increasingly popular in many complex annotation formats, a well-defined concept of multiple, potentially conflicting segmentation layers must be introduced to deal with different sources and applications of corpus data flexibly. We also propose a generic solution for specialized corpus visualizations in a Web interface using annotation-triggered style sheets, which leverage the power of modern browsers and CSS for multiple and highly customizable views of primary data. We offer an implementation and evaluation of our architecture in ANNIS3, an open-source browser-based architecture for corpus search and visualization. We present three case studies to test the coverage of the system, encompassing core linguistic and digital humanities use-cases including richly annotated newspaper treebanks, multilingual diplomatic and normalized manuscript materials edited in TEI, and analysis of multimodal recordings of spoken language.


Archive | 2012

Productivity in Argument Selection: From Morphology to Syntax

Amir Zeldes

This book centers on the idea that some verbs and other argument structure constructions have an inherently different propensity to realize lexically unfamiliar arguments, independently of lexical semantic meaning. This notion is explored both qualitatively using selected examples, and quantitatively using large amounts of corpus data, in both cases primarily from English and German.


Language Resources and Evaluation | 2017

The GUM corpus: creating multilayer resources in the classroom

Amir Zeldes

This paper presents the methodology, design principles and detailed evaluation of a new freely available multilayer corpus, collected and edited via classroom annotation using collaborative software. After briefly discussing corpus design for open, extensible corpora, five classroom annotation projects are presented, covering structural markup in TEI XML, multiple part of speech tagging, constituent and dependency parsing, information structural and coreference annotation, and Rhetorical Structure Theory analysis. Layers are inspected for annotation quality and together they coalesce to form a richly annotated corpus that can be used to study the interactions between different levels of linguistic description. The evaluation gives an indication of the expected quality of a corpus created by students with relatively little training. A multifactorial example study on lexical NP coreference likelihood is also presented, which illustrates some applications of the corpus. The results of this project show that high quality, richly annotated resources can be created effectively as part of a linguistics curriculum, opening new possibilities not just for research, but also for corpora in linguistics pedagogy.


Digital Scholarship in the Humanities | 2015

Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities

Amir Zeldes; Caroline T. Schroeder

This article motivates and details the first implementation of a freely available part-of-speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language and a descendant of the language written in the hieroglyphs of ancient Egypt. Unlike for classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evaluate our tag set in an inter-annotator agreement experiment and examine some of the difficulties in tagging Coptic data. Using an existing digital lexicon and a small training corpus taken from several genres of literary Sahidic Coptic from the first half of the first millennium, we evaluate the performance of a stochastic tagger applying a fine-grained and a coarse-grained tag set, both within and outside the domain of literary texts. Our results show that a relatively high accuracy of 94–95% correct automatic tag assignment can be reached for literary texts, with substantially worse performance on documentary papyrus data. We also present some preliminary applications of natural language processing to the study of genre, style, and authorship attribution in Coptic and discuss future directions in applying computational linguistics methods to the analysis of Coptic texts.


Archive | 2012

Deutsche Komposita zwischen Syntax und Morphologie: Ein korpusbasierter Ansatz (German Compounds between Syntax and Morphology: A Corpus-Based Approach)

Livio Gaeta; Amir Zeldes

Compounding is a highly productive word-formation process in German, at least in the domain of nominal compounding. Far less clear, however, is where the boundaries with other word-formation patterns should be drawn. It is also unclear what effects this propensity for compounding has on the language system as a whole (e.g. at the prosodic-phonological, morphological, syntactic, and information-structural levels). Relevant cases include the emergence of affixoids and the competition between nominal compounds and formally corresponding noun phrases with the same meaning (Grüntee / grüner Tee, 'green tea'). The topic of the working group is therefore both the problems of structurally analyzing and delimiting compounds and compounding as a typological feature of German. The focus is thus on questions concerning, among others, the following aspects:


Language Resources and Evaluation | 2015

Serialising the ISO SynAF Syntactic Object Model

Laurent Romary; Amir Zeldes; Florian Zipser

This paper introduces an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework SynAF. Based on widespread best practices, we adapt a popular XML format for syntactic annotation, TigerXML, with additional features to support a variety of syntactic phenomena, including constituent and dependency structures, binding, and different node types such as compounds or empty elements. We also define interfaces to other formats and standards, including the Morpho-syntactic Annotation Framework MAF and the ISOcat Data Category Registry. Finally, a case study of the German treebank TüBa-D/Z is presented, showcasing the handling of constituent structures, topological fields and coreference annotation in tandem.


Linguistic Typology | 2013

Is Modern Hebrew Standard Average European? The view from European

Amir Zeldes

Unlike previous work emphasizing European influences on Modern Hebrew as compared to the Biblical language advocated by the Hebrew revival movement, this article sets out to examine typological features of Modern Hebrew in its own right. Starting from literature on Standard Average European, I argue that Modern Hebrew differs from the European type in most features defined independently of literature on Hebrew. Notwithstanding many European influences, especially in phonology and semantics, and considerable differences from Biblical Hebrew, I will show that key structural similarities with European languages are remarkably few, and in most cases not due to the revival process.


Proceedings of the Workshop on Computational Approaches to Linguistic Creativity | 2009

Quantifying Constructional Productivity with Unseen Slot Members

Amir Zeldes

This paper is concerned with the possibility of quantifying and comparing the productivity of similar yet distinct syntactic constructions, predicting the likelihood of encountering unseen lexemes in their unfilled slots. Two examples are explored: variants of comparative correlative constructions (CCs, e.g. "the faster the better"), which are potentially very productive but in practice lexically restricted; and ambiguously attached prepositional phrases with the preposition "with", which can host both large and restricted inventories of arguments under different conditions. It will be shown that different slots in different constructions are not equally likely to be occupied productively by unseen lexemes, and suggested that in some cases this can help disambiguate the underlying syntactic and semantic structure.


North American Chapter of the Association for Computational Linguistics | 2016

rstWeb - A Browser-based Annotation Interface for Rhetorical Structure Theory and Discourse Relations

Amir Zeldes

This paper presents rstWeb, a new browser-based interface for Rhetorical Structure Theory and other discourse relation annotations. Expanding on previous tools for RST, rstWeb allows annotators to work online using only a browser. Project administrators can easily collect multiple annotations of the same documents on a central server, keep track of annotation processes, and assign tasks and annotation schemes to users. A local version using an embedded web framework is also available, running offline in a desktop browser on localhost.


North American Chapter of the Association for Computational Linguistics | 2016

When Annotation Schemes Change Rules Help: A Configurable Approach to Coreference Resolution beyond OntoNotes

Amir Zeldes; Shuo Zhang

This paper approaches the challenge of adapting coreference resolution to different coreference phenomena and mention-border definitions when there is no access to large training data in the desired target scheme. We take a configurable, rule-based approach centered on dependency syntax input, which we test by examining coreference types not covered in benchmark corpora such as OntoNotes. These include cataphora, compound modifier coreference, generic anaphors, predicate markables, i-within-i, and metonymy. We test our system, called xrenner, using different configurations on two very different datasets: Wall Street Journal material from OntoNotes and four types of wiki data from the GUM corpus. Our system compares favorably with two leading rule-based and stochastic approaches in handling the different annotation formats.

Collaboration


Top Co-Authors

Florian Zipser, Humboldt University of Berlin
Anke Lüdeling, Humboldt University of Berlin
Thomas Krause, Humboldt University of Berlin
Hagen Hirschmann, Humboldt University of Berlin
Christian Chiarcos, Goethe University Frankfurt
Marc Reznicek, Complutense University of Madrid
Andreas Haida, Humboldt University of Berlin