Marta Ruiz Costa-Jussà
Media Research Center
Publication
Featured research published by Marta Ruiz Costa-Jussà.
International Conference on Asian Language Processing | 2010
Rejwanul Haque; Sudip Kumar Naskar; Andy Way; Marta Ruiz Costa-Jussà; Rafael E. Banchs
Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source-language context into the PBSMT model to improve target phrase selection. Among the various types of lexical and syntactic features, lexical syntactic descriptions in the form of supertags, which preserve long-range word-to-word dependencies in a sentence, have proven to be effective. These rich contextual features are able to disambiguate a source phrase on the basis of the local syntactic behaviour of that phrase. In addition to local contextual information, global contextual information such as the grammatical structure of a sentence, sentence length and n-gram word sequences could provide additional important information to enhance this phrase-sense disambiguation. In this work, we explore various sentence-similarity features by measuring the similarity between a source sentence to be translated and the source side of the bilingual training sentences, and we integrate them directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task by applying sentence-similarity features both individually and in combination with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when adding a sentence-similarity feature together with a supertag-based feature.
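As a rough illustration of a sentence-similarity feature of the kind described above, the sketch below scores an input sentence against the source side of a training corpus. The abstract does not say which similarity measures were used, so the Dice coefficient over word n-grams, the whitespace tokenization, and the function names `dice_similarity` and `best_similarity` are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient over the word n-gram multisets of two sentences.

    Assumption: lowercased whitespace tokenization stands in for a real tokenizer.
    """
    ga = ngrams(a.lower().split(), n)
    gb = ngrams(b.lower().split(), n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        # Both sentences are shorter than n tokens.
        return 0.0
    overlap = sum((ga & gb).values())  # multiset intersection
    return 2.0 * overlap / total


def best_similarity(source_sentence, training_sources, n=2):
    """Highest similarity between the input and any training source sentence."""
    return max(
        (dice_similarity(source_sentence, s, n) for s in training_sources),
        default=0.0,
    )
```

A score like this could then be attached to candidate phrase pairs as one additional feature; how the paper itself integrates the feature into the PBSMT model is not spelled out in the abstract.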
Archive | 2011
Marta Ruiz Costa-Jussà; Rafael E. Banchs
In this chapter, we focus on the specific problem of sentence alignment given two comparable corpora. This task is essential to some specific applications such as parallel corpora compilation (Utiyama & Tanimura, 2007) and cross-language plagiarism detection (Potthast et al., 2009). We address this problem by means of a cross-language information retrieval (CLIR) system. CLIR deals with the problem of finding relevant documents in a language different from the one used in the query. Strategies range from ontology-based approaches (Soerfel, 2002) to statistical tools. Latent Semantic Analysis can be used to obtain a list of parallel words (Codina et al., 2008). Multidimensional Scaling projections (Banchs & Costa-jussa, 2009) can also be used to find similar documents in a cross-lingual environment. Other techniques are based on machine translation, where the search is performed over translated texts (Kishida, 2005). Within this framework, two basic components should be distinguished: a translation model, and a retrieval model that may work as in the monolingual case. Translation can be applied either to the query or to the document. In the case of document translation, statistical machine translation systems can be used to translate document collections into the original query language. In the case of query translation, several challenges must be considered: deciding how a term might be written in another language, which of the possible translations should be retained, and how to weight the importance of translation alternatives when more than one translation is retained. Here, we use the query translation approach: a segment of text in a given source language is used as a query for recovering a similar or equivalent segment of text in a different target language. Given that we are using complete sentences, which provide a certain context for the terms to be translated, we are not subject to the disadvantages mentioned above.
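The retrieval half of the query translation approach can be sketched as follows. This is a minimal example under stated assumptions, not the system used in the chapter: the queries are assumed to have already been translated into the target language by some MT system (not shown), retrieval uses plain TF-IDF cosine similarity, the target collection is assumed to be non-empty, and the names `tfidf_vectors`, `cosine` and `align` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Sparse TF-IDF vectors for a sentence collection, plus the IDF table."""
    docs = [Counter(s.lower().split()) for s in sentences]
    df = Counter(term for d in docs for term in d)  # document frequencies
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def align(translated_queries, target_sentences):
    """For each translated query, return the index of the best target sentence.

    Assumption: target_sentences is non-empty. Ties resolve to the lowest index.
    """
    target_vecs, idf = tfidf_vectors(target_sentences)
    alignment = []
    for query in translated_queries:
        counts = Counter(query.lower().split())
        # Terms unseen in the target collection get zero weight.
        q_vec = {t: tf * idf.get(t, 0.0) for t, tf in counts.items()}
        scores = [cosine(q_vec, tv) for tv in target_vecs]
        alignment.append(max(range(len(scores)), key=scores.__getitem__))
    return alignment
```

Because the retrieval step only sees the translated text, swapping the MT system behind `translated_queries` is the only change needed to compare translation engines on the same alignment task.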
In particular, when using the query translation approach, we investigate whether using a rule-based or a statistical machine translation system influences the final quality of the sentence alignment. Additionally, we test whether standard automatic MT metrics are correlated with the standard metrics of sentence alignment. Rule-based machine translation (RBMT) systems were the first commercial machine translation systems. Much more complex than word-for-word translation, these systems develop linguistic rules that allow words to be placed in different positions, to take different meanings depending on context, etc. RBMT technology applies a set of linguistic rules in three
14th Annual Conference of the European Association for Machine Translation | 2010
Mireia Farrús Cabeceran; Marta Ruiz Costa-Jussà; José Bernardo Mariño Acebal; José A. R. Fonollosa
IWSLT | 2005
Josep Maria Crego; Marta Ruiz Costa-Jussà; José B. Mariño; José A. R. Fonollosa
Language Resources and Evaluation | 2011
Marta Ruiz Costa-Jussà; José A. R. Fonollosa
Notebook Papers of CLEF 2010 Labs and Workshops, 22-23 September, Padua, Italy, September 2010 | 2010
Marta Ruiz Costa-Jussà; Rafael E. Banchs; Jens Grivolla; Joan Codina
IWSLT | 2005
Marta Ruiz Costa-Jussà; José A. R. Fonollosa
EAMT 2010: proceedings of the 14th annual conference of the European Association for Machine Translation | 2010
Marta Ruiz Costa-Jussà; Vidas Daudaravicius; Rafael E. Banchs
Proceedings of IWSLT 2010, Paris, France | 2010
Carlos A Henriquez; Marta Ruiz Costa-Jussà; Vidas Daudaravicius; Rafael E. Banchs; José B. Mariño
Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages | 2009
Marc Poch; Mireia Farrús Cabeceran; Marta Ruiz Costa-Jussà; José Bernardo Mariño Acebal; Adolfo Hernández; Carlos Alberto Henríquez Quintana; José A. R. Fonollosa