Radu Ion
Romanian Academy
Publications
Featured research published by Radu Ion.
International Conference on Computational Linguistics | 2004
Dan Tufiş; Radu Ion; Nancy Ide
The paper presents a method for word sense disambiguation based on parallel corpora. The method exploits recent advances in word alignment and word clustering, building on the automatic extraction of translation equivalents and supported by available aligned wordnets for the languages in the corpus. The wordnets are aligned to the Princeton WordNet according to the principles established by EuroWordNet. The evaluation of the WSD system implementing the method described herein showed very encouraging results. The same system, used in validation mode, can check for and spot alignment errors in multilingually aligned wordnets such as BalkaNet and EuroWordNet.
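The core idea of alignment-based WSD can be sketched as follows. Assuming each word's candidate senses are represented as interlingual-index (ILI) codes shared across the aligned wordnets (the function name and back-off choice here are illustrative, not the paper's exact procedure), a word-aligned translation pair narrows the sense set to the codes both words share:

```python
def wsd_by_alignment(src_ili, tgt_ili):
    """Sense selection for a word-aligned pair via aligned wordnets.

    src_ili / tgt_ili: sets of interlingual-index codes for the source
    word's senses and for its translation's senses (both wordnets being
    aligned to the Princeton WordNet, as in EuroWordNet/BalkaNet).
    The senses the two words share through the interlingual index are
    the plausible ones; a singleton intersection means the pair is
    fully disambiguated.
    """
    common = src_ili & tgt_ili
    # Back off to all source senses when the translation shares none.
    return common if common else src_ili
```

An empty intersection can also serve as a validation signal: it flags a possible alignment error in one of the wordnets, which is how the abstract's "validation mode" can be understood.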
Computers and the Humanities | 2004
Dan Tufis; Ana Maria Barbu; Radu Ion
The paper describes our recent developments in the automatic extraction of translation equivalents from parallel corpora. We describe three increasingly complex algorithms: a simple baseline iterative method and two more elaborate non-iterative versions. While the baseline algorithm is described mainly for illustrative purposes, the non-iterative algorithms outline the use of different working hypotheses, which may be motivated by different kinds of applications and, to some extent, by the languages concerned. The first two algorithms rely on cross-lingual POS preservation, while for the third POS invariance is not an extraction condition. The evaluation of the algorithms was conducted on three different corpora and several pairs of languages.
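A minimal sketch of a baseline iterative extractor of the kind the abstract mentions (the scoring function and the one-to-one greedy loop are common choices for this task, not necessarily the paper's exact algorithm): score co-occurring word pairs across aligned sentences, repeatedly accept the best pair, and remove both words from further competition.

```python
from collections import Counter

def extract_equivalents(bitext):
    """Greedy iterative extraction of translation-equivalent pairs.

    bitext: list of (source_tokens, target_tokens) aligned sentence
    pairs. Candidate pairs are scored with the Dice coefficient; at
    each iteration the best-scoring pair is accepted and both words
    are removed from the candidate set (a one-to-one assumption).
    """
    cooc, src_freq, tgt_freq = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        for s in set(src):
            src_freq[s] += 1
        for t in set(tgt):
            tgt_freq[t] += 1
        for s in set(src):
            for t in set(tgt):
                cooc[(s, t)] += 1

    def dice(pair):
        s, t = pair
        return 2.0 * cooc[pair] / (src_freq[s] + tgt_freq[t])

    pairs, candidates = [], dict(cooc)
    while candidates:
        best = max(candidates, key=dice)
        pairs.append(best)
        s, t = best
        candidates = {(a, b): c for (a, b), c in candidates.items()
                      if a != s and b != t}
    return pairs
```

On a toy two-sentence Romanian-English bitext, the loop pairs "casa" with "house" first (it co-occurs in both sentences), after which the remaining words fall out by elimination.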
Cross Language Evaluation Forum | 2009
Radu Ion; Dan Ştefănescu; Alexandru Ceauşu; Dan Tufis; Elena Irimia; Verginica Barbu Mititelu
This paper reports on the construction and testing of a new Question Answering (QA) system, implemented as a workflow that builds on several web services developed at the Research Institute for Artificial Intelligence (RACAI). The evaluation of the system was done independently by the organizers of the Romanian-Romanian task of the ResPubliQA 2009 exercise, where it was rated the best-performing system, with the highest improvement over a baseline state-of-the-art IR system attributable to NLP technology. We describe a principled way of combining different relevance measures to obtain a general relevance score (with respect to the user's question) that serves as the sort key for the returned paragraphs. The system was trained on a specific corpus, but its functionality is independent of the linguistic register of the training data. The trained QA system that participated in the ResPubliQA shared task is available as a web application at http://www2.racai.ro/sir-resdec/.
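One simple way to combine heterogeneous relevance measures into a single sort key, as the abstract describes at a high level, is to normalise each measure across the candidate set and take a weighted sum. This sketch is an assumption about the general scheme (min-max normalisation and a linear combination), not the paper's exact formula:

```python
def rank_paragraphs(paragraphs, measures, weights):
    """Rank paragraphs by a combined relevance score.

    paragraphs: candidate paragraph strings.
    measures: functions mapping a paragraph to a raw relevance score
    (e.g. lexical overlap with the question, a retrieval-engine score).
    Each measure is min-max normalised to [0, 1] across the candidate
    set so scales are comparable, then the weighted sum serves as the
    sort key for the returned paragraphs.
    """
    columns = []
    for m in measures:
        raw = [m(p) for p in paragraphs]
        lo, hi = min(raw), max(raw)
        span = (hi - lo) or 1.0  # avoid division by zero on ties
        columns.append([(r - lo) / span for r in raw])
    combined = [sum(w * col[i] for w, col in zip(weights, columns))
                for i in range(len(paragraphs))]
    order = sorted(range(len(paragraphs)),
                   key=lambda i: combined[i], reverse=True)
    return [paragraphs[i] for i in order]
```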
Meeting of the Association for Computational Linguistics | 2005
Dan Tufis; Radu Ion; Alexandru Ceauşu; Dan Stefanescu
We briefly describe a word alignment system that combines two different methods for identifying bitext correspondences. The first is a hypothesis-testing approach (Gale and Church, 1991; Melamed, 2001; Tufis, 2002), while the second is closer to a model-estimation approach (Brown et al., 1993; Och and Ney, 2000). We show that combining the two aligners significantly improves the results compared to each individual aligner.
Meeting of the Association for Computational Linguistics | 2007
Radu Ion; Dan Tufis
This article introduces an unsupervised word sense disambiguation algorithm inspired by the lexical attraction models of Yuret (1998). It is based on the assumption that the meanings of the words that form a sentence can best be assigned by constructing an interpretation of the whole sentence. This interpretation is facilitated by a dependency-like context specification of each content word within the sentence. Thus, finding the context words of a target word is a matter of finding a pseudo-syntactic dependency analysis of the sentence, called a linkage.
North American Chapter of the Association for Computational Linguistics | 2003
Dan Tufis; Ana-Maria Barbu; Radu Ion
We give a rather informal presentation of a prototype system for word alignment based on our previous translation equivalence approach, discuss the problems encountered in the shared task on word alignment of a parallel Romanian-English text, present the preliminary evaluation results, and suggest further ways of improving alignment accuracy.
Language and Technology Conference | 2009
Radu Ion; Dan Ştefănescu
This paper presents an unsupervised word sense disambiguation (WSD) algorithm that makes use of the lexical chains concept [6] to quantify the degree of semantic relatedness between two words. Essentially, the WSD algorithm tries to maximize this semantic measure over a graph of the content words in a given sentence in order to perform the disambiguation.
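The maximization step can be illustrated with a brute-force sketch: pick one sense per content word so that the total pairwise relatedness over the sentence graph is maximal. The relatedness function is left abstract here (in the paper it is derived from lexical chains); the exhaustive search is illustration only, workable for short sentences.

```python
from itertools import product

def disambiguate(candidates, relatedness):
    """Choose one sense per word, maximizing total pairwise relatedness.

    candidates: dict mapping each content word to its list of candidate
    senses. relatedness: function taking two senses and returning a
    semantic relatedness score. Searches the full cross-product of
    sense lists, summing relatedness over all sense pairs in an
    assignment (the edges of a complete graph over the content words).
    """
    words = list(candidates)
    best, best_score = None, float("-inf")
    for assignment in product(*(candidates[w] for w in words)):
        score = sum(relatedness(a, b)
                    for i, a in enumerate(assignment)
                    for b in assignment[i + 1:])
        if score > best_score:
            best, best_score = assignment, score
    return dict(zip(words, best))
```

A real implementation would replace the exhaustive search with a greedy or graph-based optimisation, since the cross-product grows exponentially in sentence length.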
Archive | 2013
Tiberiu Boroș; Dan Ștefănescu; Radu Ion
Given the unrestricted context of text-to-speech (TTS) synthesis and the current multilingual environment, TTS is often hampered by the presence of out-of-vocabulary (OOV) words. OOV words have many precipitating factors, from the use of technical terms, proper nouns, and rare words not covered by the lexicon, to foreign words that are only partially morphologically adapted; the latter, in fact, is a problem often confronted by non-English TTS synthesis systems. Furthermore, in order to derive natural speech from arbitrary text, all words that make up an utterance must undergo a series of complex processes: diacritic restoration, part-of-speech tagging, expansion to pronounceable form, syllabification, lexical stress prediction, and letter-to-sound conversion. OOV words require automatic, trainable methods that can perform such tasks, usually based on a limited lexical context. The exception to this rule is those cases where part of speech and surrounding words are used as discriminative features, such as in homograph disambiguation and abbreviation expansion. In this chapter we introduce the basic architecture of a generic natural language processing module for TTS synthesis, propose data-driven solutions to the various tasks, and compare our results on OOV words and prosody modeling with current state-of-the-art TTS synthesis systems.
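The sequence of tasks above is naturally organised as a chain of token-level transformations. A minimal sketch of such a front-end pipeline, with toy stand-in steps rather than the chapter's trained models:

```python
def tts_frontend(text, steps):
    """Chain the text-processing steps of a TTS front end.

    steps: ordered list of functions, each taking and returning a list
    of token dicts; this mirrors the task sequence in the chapter
    (diacritic restoration, POS tagging, expansion to pronounceable
    form, syllabification, stress prediction, letter-to-sound).
    """
    tokens = [{"text": t} for t in text.split()]
    for step in steps:
        tokens = step(tokens)
    return tokens

def toy_syllabify(tokens):
    """Stand-in step: annotate each token with a naive syllable count."""
    vowels = set("aeiou")
    return [{**t, "syllables": sum(c in vowels for c in t["text"])}
            for t in tokens]
```

Each real step (e.g. a data-driven letter-to-sound model) would plug in with the same token-list-in, token-list-out signature, which is what makes the module generic.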
Cross Language Evaluation Forum | 2006
Georgiana Puşcaşu; Adrian Iftene; Ionuţ Pistol; Diana Trandabăţ; Dan Tufis; Alin Ceauşu; Dan Ştefănescu; Radu Ion; Iustin Dornescu; Alex Moruz; Dan Cristea
This paper describes the development of a Question Answering (QA) system and its evaluation results in the Romanian-English cross-lingual track organized as part of the CLEF 2006 campaign. The development stages of the cross-lingual Question Answering system are described incrementally throughout the paper, while pinpointing the problems that occurred and the way they were addressed. The system adheres to the classical architecture for QA systems, beginning with question processing, followed, after term translation, by information retrieval and answer extraction. Besides the common QA difficulties, the track posed some specific problems, such as the lack of a reliable translation engine from Romanian into English and the need to evaluate each module individually for better insight into the system's failures.
International Journal of Speech Technology | 2009
Radu Ion; Dan Tufis
This article describes two different word sense disambiguation (WSD) systems: one applicable to parallel corpora and requiring aligned wordnets, and another, knowledge-poorer one that is nevertheless more relevant for real applications, relying on unsupervised learning methods and only monolingual data (text and a wordnet). Comparing the performance of word sense disambiguation systems is a very difficult evaluation task when different sense inventories are used, and even more difficult when the sense distinctions are not of the same granularity. However, since we used the same sense inventory, the performance of the two WSD systems can be objectively compared, and we provide evidence that multilingual WSD is more precise than monolingual WSD.