Eric Wehrli
University of Geneva
Publications
Featured research published by Eric Wehrli.
Meeting of the Association for Computational Linguistics | 2006
Violeta Seretan; Eric Wehrli
This paper focuses on the use of advanced text-analysis techniques as support for collocation extraction. A hybrid system is presented that combines statistical methods and multilingual parsing to extract accurate collocational information from English, French, Spanish and Italian corpora. The advantage of relying on full parsing over a traditional window method (which ignores syntactic information) is first theoretically motivated, then empirically validated by a comparative evaluation experiment.
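To make the contrast concrete, here is a minimal sketch (not the authors' system) comparing window-based and syntax-based candidate extraction on a toy sentence; the dependency triples, stop-word list, and relation labels are invented for illustration.

```python
from itertools import combinations

# Toy sentence: "the committee reached a difficult decision yesterday"
tokens = ["the", "committee", "reached", "a", "difficult", "decision", "yesterday"]

# Hypothetical dependency triples a parser might return (head, dependent, relation).
dependencies = [
    ("reached", "committee", "subject"),
    ("reached", "decision", "object"),
    ("decision", "difficult", "adjective"),
]

def window_candidates(tokens, size=5):
    """Window method: pair every content word with its neighbours, blind to syntax."""
    stop = {"the", "a", "yesterday"}
    content = [(i, t) for i, t in enumerate(tokens) if t not in stop]
    pairs = set()
    for (i, w1), (j, w2) in combinations(content, 2):
        if abs(i - j) <= size:
            pairs.add((w1, w2))
    return pairs

def syntactic_candidates(deps, keep={"object", "adjective"}):
    """Syntax-based method: keep only pairs in a collocationally relevant relation."""
    return {(h, d) for h, d, rel in deps if rel in keep}

print(window_candidates(tokens))           # includes noise such as ('committee', 'difficult')
print(syntactic_candidates(dependencies))  # {('reached', 'decision'), ('decision', 'difficult')}
```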
Workshop on Statistical Machine Translation | 2009
Eric Wehrli; Luka Nerima; Yves Scherrer
This paper describes the MulTra project, which aims at the development of an efficient multilingual translation technology based on an abstract, generic linguistic model and on object-oriented software design. In particular, we address the issue of the rapid growth of both the transfer modules and the bilingual databases. For the latter, we show that a significant part of the bilingual lexical databases can be derived automatically through transitivity, with corpus validation.
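The transitivity idea can be illustrated with a small sketch: composing a hypothetical English–French lexicon with a French–Italian one yields candidate English–Italian pairs, and ambiguous pivot words show why corpus validation is still needed afterwards. The entries below are invented, not the MulTra databases.

```python
# Hypothetical English–French and French–Italian entries (illustrative only).
en_fr = {"house": ["maison"], "bank": ["banque", "rive"]}
fr_it = {"maison": ["casa"], "banque": ["banca"], "rive": ["riva"]}

def derive_by_transitivity(ab, bc):
    """Compose two bilingual lexicons: A->B and B->C give candidate A->C pairs."""
    candidates = {}
    for a, b_words in ab.items():
        targets = []
        for b in b_words:
            targets.extend(bc.get(b, []))
        if targets:
            candidates[a] = sorted(set(targets))
    return candidates

# Ambiguous pivots ("bank" -> "banque"/"rive") produce noisy candidates such as
# {"bank": ["banca", "riva"]}, which is why corpus validation is needed.
print(derive_by_transitivity(en_fr, fr_it))
```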
Language Resources and Evaluation | 2009
Violeta Seretan; Eric Wehrli
An impressive amount of work has been devoted over the past few decades to collocation extraction. The state of the art shows a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocation extraction system based on the full parsing of source corpora, which supports four languages: English, French, Spanish, and Italian. The performance of the system is compared against that of the standard mobile-window method. The evaluation experiment investigates several levels of the significance lists, uses a fine-grained annotation schema, and covers all the languages supported. Consistent results were obtained for these languages: parsing, even if imperfect, leads to a significant improvement in the quality of results, in terms of collocational precision (between 16.4 and 29.7%, depending on the language; 20.1% overall), MWE precision (between 19.9 and 35.8%; 26.1% overall), and grammatical precision (between 47.3 and 67.4%; 55.6% overall). This positive result is particularly important in view of the subsequent integration of extraction results into other NLP applications.
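As a rough illustration of how such precision figures can be computed over an annotated, ranked candidate list, the following sketch uses invented annotations and a simplified three-way labelling; it does not reproduce the paper's annotation schema or data.

```python
# Hypothetical manual annotations for a ranked candidate list (best-scored
# first); the labels are illustrative, not the original annotation schema.
ranked = [
    {"pair": ("heavy", "rain"),  "label": "collocation"},
    {"pair": ("take", "place"),  "label": "mwe"},
    {"pair": ("drink", "water"), "label": "grammatical"},
    {"pair": ("of", "the"),      "label": "ungrammatical"},
    {"pair": ("strong", "tea"),  "label": "collocation"},
]

def precision_at(ranked, n, accepted):
    """Share of the top-n candidates whose label falls in the accepted set."""
    top = ranked[:n]
    return sum(item["label"] in accepted for item in top) / len(top)

# Three increasingly permissive notions of precision over the top of the list.
print(precision_at(ranked, 5, {"collocation"}))                        # collocational
print(precision_at(ranked, 5, {"collocation", "mwe"}))                 # MWE
print(precision_at(ranked, 5, {"collocation", "mwe", "grammatical"}))  # grammatical
```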
Conference of the European Chapter of the Association for Computational Linguistics | 2003
Luka Nerima; Violeta Seretan; Eric Wehrli
This paper describes a terminology extraction system capable of handling multi-word expressions, using a powerful syntactic parser. The system includes a concordancing tool that enables the user to display the context of a collocation, i.e. the sentence or the whole document in which the collocation occurs. Since the corpora are multilingual, the system also offers an alignment mechanism for the corresponding translated documents.
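A concordancing lookup of the kind described can be sketched as follows, assuming occurrences of a collocation are recorded as (document, sentence index) pairs; the documents and occurrence records are invented examples, not the system's actual data model.

```python
# Minimal concordance lookup over a toy corpus.
documents = {
    "doc1": ["The committee reached a decision.",
             "The decision was widely reported."],
    "doc2": ["A difficult decision was finally reached."],
}

occurrences = {
    ("reach", "decision"): [("doc1", 0), ("doc2", 0)],
}

def concordance(collocation, occurrences, documents):
    """Return the sentences (with their source document) containing the collocation."""
    return [(doc, documents[doc][idx]) for doc, idx in occurrences.get(collocation, [])]

for doc, sentence in concordance(("reach", "decision"), occurrences, documents):
    print(f"{doc}: {sentence}")
```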
Proceedings of the Workshop on Multilingual Language Resources and Interoperability | 2006
Violeta Seretan; Eric Wehrli
Although traditionally seen as a language-independent task, collocation extraction nowadays relies more and more on the linguistic preprocessing of texts (e.g., lemmatization, POS tagging, chunking or parsing) prior to the application of statistical measures. This paper provides a language-oriented review of the existing extraction work. It points out several language-specific issues related to extraction and proposes a strategy for coping with them. It then describes a hybrid extraction system based on a multilingual parser. Finally, it presents a case study on the performance of an association measure across a number of languages.
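For readers unfamiliar with association measures, the sketch below computes the log-likelihood ratio, a common choice in collocation work, from a 2x2 contingency table; the counts are invented, and this is a generic illustration rather than the specific measure evaluated in the case study.

```python
import math

def log_likelihood_ratio(c12, c1, c2, n):
    """Log-likelihood ratio for a word pair.

    c12: co-occurrence count of (w1, w2)
    c1, c2: marginal counts of w1 and w2
    n: total number of observed candidate pairs
    """
    # Observed 2x2 contingency table.
    o = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    # Expected counts under the independence assumption.
    e = [c1 * c2 / n, c1 * (n - c2) / n, (n - c1) * c2 / n, (n - c1) * (n - c2) / n]
    return 2 * sum(obs * math.log(obs / exp) for obs, exp in zip(o, e) if obs > 0)

# Invented counts: the pair occurs 30 times, its words 50 and 60 times,
# out of 10,000 extracted pairs. Higher scores indicate stronger association.
print(round(log_likelihood_ratio(30, 50, 60, 10_000), 2))
```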
Proceedings of the 10th Workshop on Multiword Expressions (MWE) | 2014
Eric Wehrli
Although multiword expressions (MWEs) have received an increasing amount of attention in the NLP community over the last two decades, few papers have been dedicated to the specific problem of the interaction between MWEs and parsing. In this paper, we discuss how the collocation identification task has been integrated in our rule-based parser and show how collocation knowledge has a positive impact on the parsing process. A manual evaluation was conducted over a corpus of 4000 sentences, comparing the output of the parser with and without the collocation component. The results of the evaluation clearly support our claim.
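One simple way to picture how collocation knowledge can guide a parsing decision is the sketch below, in which a known collocation makes the parser try one attachment site before another; the ranking function and mini-lexicon are invented and do not reproduce the Fips internals.

```python
# Hypothetical collocation lexicon (head, dependent) pairs.
known_collocations = {("break", "record"), ("heavy", "rain")}

def rank_attachments(candidate_heads, dependent, known):
    """Prefer attachment sites that form a known collocation with the dependent."""
    return sorted(candidate_heads,
                  key=lambda head: (head, dependent) in known,
                  reverse=True)

# Ambiguity: should "record" attach to the verb "break" or the noun "attempt"?
print(rank_attachments(["attempt", "break"], "record", known_collocations))
# -> ['break', 'attempt']: the collocational reading is tried first.
```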
Archive | 2015
Eric Wehrli; Luka Nerima
This paper reports on the Fips parser, a multilingual constituent parser that has been developed over the last two decades. After a brief historical overview of the numerous modifications and adaptations made to the system over the years, we provide a description of its main characteristics. The linguistic framework underlying Fips has been much influenced by Generative Grammar, but drastically simplified so that it can be implemented efficiently. The parsing procedure is a one-pass scan of the input text (no preprocessing, no postprocessing), using rules to build up constituent structures and (syntactic) interpretation procedures to determine the dependency relations between constituents (grammatical functions, etc.), including cases of long-distance dependencies. The final section offers a description of the rich lexical database developed for Fips. The lexical model assumes two distinct levels for lexical units: words, which are inflected forms of lexical units, and lexemes, which are more abstract units roughly corresponding to a particular reading of a word. Collocations are defined as an association of two lexical units (lexemes or collocations) in a specific grammatical relation such as adjective-noun or verb-object.
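The two-level lexical model (words, lexemes, and collocations over lexical units) can be sketched with a few data classes; the class and field names below are illustrative assumptions, not the actual Fips lexical database schema.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Lexeme:
    """Abstract lexical unit, roughly corresponding to one reading of a word."""
    lemma: str
    category: str           # e.g. "noun", "verb", "adjective"

@dataclass(frozen=True)
class Word:
    """Inflected form linked to the lexeme it realises."""
    form: str
    lexeme: Lexeme
    features: tuple = ()    # e.g. ("plural",), ("past",)

@dataclass(frozen=True)
class Collocation:
    """Association of two lexical units in a specific grammatical relation."""
    left: Union[Lexeme, "Collocation"]
    right: Union[Lexeme, "Collocation"]
    relation: str           # e.g. "adjective-noun", "verb-object"

record = Lexeme("record", "noun")
world = Lexeme("world", "noun")
break_ = Lexeme("break", "verb")

records = Word("records", record, ("plural",))
world_record = Collocation(world, record, "noun-noun")
# A collocation can itself be a member of a larger collocation.
break_world_record = Collocation(break_, world_record, "verb-object")

print(records.lexeme.lemma, break_world_record.relation)
```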
International Conference on Computational Linguistics | 1990
Eric Wehrli
STS is a small experimental sentence translation system developed to demonstrate the efficiency of our lexicalist model of translation. Based on a GB-inspired parser, lexical transfer and lexical projection, STS provides accurate, real-time English translations for a small but non-trivial subset of French sentences.
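A lexicalist transfer step of the general kind mentioned here can be sketched as a mapping from source lexemes to target lexemes while preserving grammatical relations; the mini-lexicon and flat parse representation below are invented, not the STS implementation.

```python
# Invented French–English lexeme mappings.
fr_en_lexicon = {
    ("chat", "noun"): ("cat", "noun"),
    ("dormir", "verb"): ("sleep", "verb"),
    ("le", "det"): ("the", "det"),
}

def transfer(source_parse):
    """Replace each source lexeme with its target equivalent, keeping the relations."""
    return [(fr_en_lexicon[(lemma, cat)], relation)
            for (lemma, cat), relation in source_parse]

# "le chat dort" as a flat list of (lexeme, grammatical function) pairs.
parse = [(("dormir", "verb"), "predicate"),
         (("chat", "noun"), "subject"),
         (("le", "det"), "determiner")]
print(transfer(parse))
```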
International Multiconference on Computer Science and Information Technology | 2009
Eric Wehrli; Luka Nerima; Violeta Seretan; Yves Scherrer
Twic and TwicPen are reading aid systems for readers of material in foreign languages. Although they include a sentence translation engine, both systems are primarily conceived to provide word and expression translations to readers with a basic knowledge of the language they read. Twic has been designed for on-line material and consists of a plug-in for internet browsers communicating with our server. TwicPen offers similar assistance for readers of printed material: it consists of a hand-held scanner connected to a laptop (or desktop) computer running our parsing and translation software. Both systems provide readers with a limited number of translations selected on the basis of a linguistic analysis of the whole scanned text fragment (a phrase, part of a sentence, etc.). The use of a morphological and syntactic parser makes it possible (i) to disambiguate to a large extent the word selected by the user (and hence to drastically reduce the noise in the response), and (ii) to handle expressions (compounds, collocations, idioms), often a major source of difficulty for non-native readers. The systems are available for the following language pairs: English-French, French-English, German-French, German-English, Italian-French, Spanish-French. Several other pairs are under development.
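The expression-handling idea can be illustrated as follows: if the selected word belongs to a known multiword expression present in the scanned fragment, the expression is translated rather than the isolated word. The dictionary entries and lookup function are invented examples, not the Twic/TwicPen code.

```python
# Invented dictionary entries for illustration.
word_translations = {"up": ["en haut", "vers le haut"], "give": ["donner"]}
expression_translations = {("give", "up"): ["abandonner", "renoncer"]}

def lookup(selected, fragment_lemmas):
    """Return expression translations when the context licenses them, else word ones."""
    for expr, translations in expression_translations.items():
        if selected in expr and all(w in fragment_lemmas for w in expr):
            return translations
    return word_translations.get(selected, [])

print(lookup("up", ["they", "give", "up", "too", "easily"]))  # ['abandonner', 'renoncer']
print(lookup("up", ["look", "up", "there"]))                  # ['en haut', 'vers le haut']
```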
Conference of the European Chapter of the Association for Computational Linguistics | 1985
Eric Wehrli
This paper is concerned with the specification and implementation of a particular concept of a word-based lexicon to be used in large natural language processing systems such as machine translation systems, and compares it with the morpheme-based conception of the lexicon traditionally assumed in computational linguistics. It will be argued that, although less concise, a relational word-based lexicon is superior to a morpheme-based lexicon from a theoretical, computational and practical viewpoint.
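The word-based conception can be sketched as a lexicon that lists full inflected forms and relates each to a shared abstract entry, so that analysis is a direct lookup rather than runtime morphological decomposition; the French verb forms and feature labels below are merely illustrative.

```python
# Word-based lexicon: full inflected forms related to a shared abstract entry.
word_based = {
    "dort":      {"lexeme": "dormir", "features": ("present", "3sg")},
    "dormait":   {"lexeme": "dormir", "features": ("imperfect", "3sg")},
    "dormirons": {"lexeme": "dormir", "features": ("future", "1pl")},
}

def analyse(form):
    """Direct lookup: no runtime morphological decomposition is needed."""
    entry = word_based.get(form)
    return (entry["lexeme"], entry["features"]) if entry else None

print(analyse("dormait"))   # ('dormir', ('imperfect', '3sg'))
```

A morpheme-based lexicon would instead store a stem plus affixes and reconstruct these forms at analysis time; the lookup table above trades conciseness for simpler, more direct access, which is the trade-off the abstract argues for.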