Valia Kordoni
Saarland University
Publication
Featured research published by Valia Kordoni.
Natural Language Engineering | 2011
Bonnie Webber; Markus Egg; Valia Kordoni
An increasing number of researchers and practitioners in Natural Language Engineering face the prospect of having to work with entire texts, rather than individual sentences. While it is clear that text must have useful structure, its nature may be less clear, making it more difficult to exploit in applications. This survey of work on discourse structure thus provides a primer on the bases on which discourse is structured, along with some of their formal properties. It then lays out the current state of the art with respect to algorithms for recognizing these different structures, and how these algorithms are currently being used in Language Technology applications. After identifying resources that should prove useful in improving algorithm performance across a range of languages, we conclude by speculating on future discourse structure-enabled technology.
Computational Linguistics | 2009
Timothy Baldwin; Valia Kordoni; Aline Villavicencio
Prepositions, as well as prepositional phrases (PPs) and markers of various sorts, have a mixed history in computational linguistics (CL) and in related fields such as artificial intelligence, information retrieval (IR), and computational psycholinguistics: on the one hand they have been championed as being vital to precise language understanding (e.g., in information extraction), and on the other they have been ignored on the grounds of being syntactically promiscuous and semantically vacuous, and relegated to the ignominious rank of “stop word” (e.g., in text classification and IR). Although NLP in general has benefitted from advances in those areas where prepositions have received attention, there are still many issues to be addressed. For example, in machine translation, generating a preposition (or “case marker” in languages such as Japanese) incorrectly in the target language can lead to critical semantic divergences from the source language string. Equivalently, in information retrieval and information extraction, it would seem desirable to be able to predict that “book on NLP” and “book about NLP” mean largely the same thing, but that “paranoid about drugs” and “paranoid on drugs” suggest very different things. Prepositions are often among the most frequent words in a language. For example, based on the British National Corpus (BNC; Burnard 2000), four of the ten most frequent words in English are prepositions (of, to, in, and for). In terms of both parsing and generation, therefore, accurate models of preposition usage are essential to avoid repeatedly making errors. Despite their frequency, however, they are notoriously difficult to master, even for humans (Chodorow, Tetreault, and Han 2007). For example, Lindstromberg (2001) estimates that less than 10% of upper-level English as a Second
Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties | 2006
Yi Zhang; Valia Kordoni; Aline Villavicencio; Marco Idiart
However large a hand-crafted wide-coverage grammar is, there will always be words and constructions that it does not include and that cause parse failures. Due to their heterogeneous and flexible nature, Multiword Expressions (MWEs) provide an endless source of parse failures. As the number of such expressions in a speaker's lexicon is comparable to the number of single-word units (Jackendoff, 1997), one major challenge for robust natural language processing systems is to deal with MWEs. In this paper we propose to semi-automatically detect MWE candidates in texts using error mining techniques, and to validate them using a combination of the World Wide Web as a corpus and statistical measures. For the remaining candidates, possible lexico-syntactic types are predicted, and the candidates are subsequently added to the grammar as new lexical entries. This approach significantly increases the grammar's coverage of these expressions.
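The two steps named above, error mining and statistical validation, can be illustrated with a minimal sketch. It assumes an error mining score in the spirit of a "suspicion" ratio (the share of an n-gram's occurrences that fall in sentences the grammar failed to parse) and uses pointwise mutual information (PMI) as one example association measure. The function names, the `min_count` threshold, and the choice of PMI are illustrative assumptions, not the paper's exact method; the paper draws its validation counts from the Web, whereas here counts are simply passed in.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Gram = Tuple[str, ...]

def ngrams(tokens: List[str], n: int) -> Iterable[Gram]:
    """Yield contiguous n-grams from a token list."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def suspicion_scores(parsed: List[List[str]], failed: List[List[str]],
                     n: int = 2, min_count: int = 2) -> Dict[Gram, float]:
    """Hypothetical error mining step: for each n-gram seen at least
    min_count times, return the fraction of its occurrences that are in
    sentences the grammar could not parse."""
    total: Counter = Counter()
    in_failed: Counter = Counter()
    for sent in parsed:
        total.update(ngrams(sent, n))
    for sent in failed:
        grams = list(ngrams(sent, n))
        total.update(grams)
        in_failed.update(grams)
    return {g: in_failed[g] / total[g] for g in total if total[g] >= min_count}

def pmi(pair_count: int, count_w1: int, count_w2: int, corpus_size: int) -> float:
    """Pointwise mutual information (base 2) of a word pair, from raw counts."""
    return math.log2((pair_count * corpus_size) / (count_w1 * count_w2))
```

For example, with two parsed sentences ("the dog barked", "the cat slept") and two failures ("he kicked the bucket", "she kicked the bucket"), only ("kicked", "the") and ("the", "bucket") pass the frequency threshold, and both score 1.0 because every occurrence is in a failed sentence. High-suspicion candidates would then be kept or discarded according to an association score such as `pmi`, computed from whatever corpus counts are available.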
Archive | 2000
Erhard W. Hinrichs; Julia Bartels; Yasuhiro Kawata; Valia Kordoni; Heike Telljohann
The Tübingen treebanks for spoken German, English and Japanese provide linguistic annotations for the Verbmobil dialog corpus of spontaneous speech in the scenarios of appointment negotiations, travel arrangements and personal computer maintenance. The annotation schemes of the Tübingen treebanks have been developed taking into account the specific characteristics of spoken language dialogs: repetitions, hesitations, “false starts”, etc.
Linguistic Annotation Workshop | 2009
Valia Kordoni; Yi Zhang
This paper presents an ongoing effort to annotate the Wall Street Journal sections of the Penn Treebank with the help of a hand-written, large-scale, wide-coverage grammar of English. In doing so, we not only focus on the various stages of the semi-automated annotation process we have adopted, but also show that rich linguistic annotations, which can incorporate semantics as well as syntax, make the treebank a truly sharable, reusable and multi-functional linguistic resource.
Meeting of the Association for Computational Linguistics | 2007
Yi Zhang; Valia Kordoni; Erin Fitzgerald
This paper presents an approach to partial parse selection for robust deep processing. The work is based on a bottom-up chart parser for HPSG parsing. Following the definition of partial parses in Kasper et al. (1999), different partial parse selection methods are presented and evaluated on multiple metrics, from both syntactic and semantic viewpoints. Applying partial parsing to the processing of spontaneous speech texts shows that the method is promising.
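To make the idea of selecting among partial parses concrete, here is a minimal sketch assuming that a partial parse is a sequence of non-overlapping passive chart edges that together cover the whole input, and assuming one simple selection criterion: prefer the covering sequence with the fewest fragments, found as a shortest path over chart positions. The `Edge` tuple layout and the function name are hypothetical, and this criterion is only one possibility; it is not presented as the specific selection models evaluated in the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical edge format: (start, end, label), spanning tokens [start, end).
Edge = Tuple[int, int, str]

def select_partial_parse(edges: List[Edge], n: int) -> Optional[List[Edge]]:
    """Return non-overlapping passive edges covering tokens 0..n with the
    fewest fragments (shortest path over chart positions), or None if no
    sequence of edges covers the input."""
    INF = float("inf")
    best = [INF] * (n + 1)                      # best[i]: fewest edges covering [0, i)
    back: List[Optional[Edge]] = [None] * (n + 1)
    best[0] = 0
    by_start: Dict[int, List[Edge]] = {}
    for e in edges:
        by_start.setdefault(e[0], []).append(e)
    # Positions are visited left to right, so best[i] is final when i is reached.
    for i in range(n):
        if best[i] == INF:
            continue
        for e in by_start.get(i, []):
            start, end, _ = e
            if start < end <= n and best[i] + 1 < best[end]:
                best[end] = best[i] + 1
                back[end] = e
    if best[n] == INF:
        return None
    # Follow back-pointers from the end of the input to recover the fragments.
    path: List[Edge] = []
    pos = n
    while pos > 0:
        e = back[pos]
        assert e is not None
        path.append(e)
        pos = e[0]
    return list(reversed(path))
```

With lexical edges for "the dog barked loudly" plus an NP edge over tokens 0 to 2 and a VP edge over tokens 2 to 4, the sketch selects the two-fragment analysis [NP, VP] over the four single-word edges. Other criteria, such as weighting edges by a probabilistic model, would slot into the same dynamic program by replacing the `+ 1` cost.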
International Joint Conference on Natural Language Processing | 2004
Valia Kordoni; Julia Neu
We present a deep computational Modern Greek grammar. The grammar is written in HPSG and is being developed in a multilingual context with MRS semantics, contributing to an open-source collection of software and linguistic resources with wide usage in research, education, and application building.
Archive | 2000
Erhard W. Hinrichs; Sandra Kübler; Valia Kordoni; Frank Henrik Müller
Chunk parsing (see Abney, 1991, and Abney, 1996) offers a particularly promising approach to robust, partial parsing aimed at broad data coverage. A chunk parser is well suited to spontaneous speech applications, since it can deal robustly with fragmentary or ill-formed input.
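Why chunking copes with fragmentary input can be seen in a deliberately simplified sketch. Abney's chunkers use cascades of finite-state recognizers. The code below collapses that to a single hand-written noun-chunk pattern (optional determiner, any adjectives, one or more nouns) over Penn Treebank-style POS tags. The tag names, the pattern and the function name are illustrative assumptions, not the Tübingen chunk parser itself.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, POS tag), Penn Treebank-style tags assumed

def np_chunks(tagged: List[Tagged]) -> List[List[Tagged]]:
    """Greedy left-to-right noun chunker for the pattern DT? JJ* NN+.
    Tokens that cannot start a chunk are skipped, so disfluencies or
    ill-formed stretches never cause an overall failure."""
    chunks: List[List[Tagged]] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                 # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":    # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:                                # at least one noun: emit a chunk
            chunks.append(tagged[i:k])
            i = k
        else:                                    # no chunk starts here: skip a token
            i += 1
    return chunks
```

On a disfluent utterance such as "uh the big meeting is on monday", the filler "uh" and the verb are skipped, and two chunks, "the big meeting" and "monday", are still returned. A full sentence parser might instead reject the whole utterance.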
ACM Transactions on Speech and Language Processing | 2013
Carlos Ramisch; Aline Villavicencio; Valia Kordoni
It is 2013, and multiword expressions have been around for a while in the computational linguistics research community. Since the first ACL workshop on MWEs 12 years ago in Sapporo, Japan, much has been discussed, proposed, experimented with, evaluated and argued about MWEs. And yet they still merit a whole special issue of the ACM Transactions on Speech and Language Processing. But what is it about multiword expressions that keeps them in fashion? Who are the people and institutions that perform and publish groundbreaking fundamental and applied research in this field? What is the place and relevance of our lively research community in the bigger picture of computational linguistics? Where do we come from as a community, and, most importantly, where are we heading? In this introductory article, we share our point of view on the answers to these questions and introduce the articles that make up the current special issue.
Empirical Methods in Natural Language Processing | 2014
Kostadin Cholakov; Valia Kordoni
This article describes a linguistically informed method for integrating phrasal verbs into statistical machine translation (SMT) systems. In a case study involving English-to-Bulgarian SMT, we show that our method not only improves translation quality but also outperforms similar methods previously applied to the same task. We attribute this to the fact that, in contrast to previous work on the subject, we employ detailed linguistic information. We found that features describing phrasal verbs as idiomatic or compositional contribute most to the improved translation quality achieved by our method.