Marta Ruiz Costa-Jussà


Publication

Featured research published by Marta Ruiz Costa-Jussà.

International Conference on Asian Language Processing | 2010

Sentence Similarity-Based Source Context Modelling in PBSMT

Rejwanul Haque; Sudip Kumar Naskar; Andy Way; Marta Ruiz Costa-Jussà; Rafael E. Banchs

Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source-language context into the PBSMT model to improve target phrase selection. Among the various types of lexical and syntactic features, lexical syntactic descriptions in the form of supertags, which preserve long-range word-to-word dependencies in a sentence, have proven to be effective. These rich contextual features can disambiguate a source phrase on the basis of the local syntactic behaviour of that phrase. In addition to local contextual information, global contextual information such as the grammatical structure of a sentence, sentence length and n-gram word sequences could provide additional important information to enhance this phrase-sense disambiguation. In this work, we explore various sentence-similarity features by measuring the similarity between a source sentence to be translated and the source side of the bilingual training sentences, and we integrate these features directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task by applying sentence-similarity features both individually and in combination with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when adding a sentence-similarity feature together with a supertag-based feature.
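To make the idea of a sentence-similarity feature concrete, the following is a minimal sketch, not the authors' implementation. It assumes cosine similarity over word unigram and bigram counts as the similarity measure, and takes the highest score against the source side of the training sentences as the feature value. The function names, the `max_n` parameter and the choice of measure are all illustrative assumptions; the abstract does not specify which similarity features were used.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return all contiguous word n-grams of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_profile(sentence, max_n=2):
    """Count word n-grams (n = 1..max_n) of a whitespace-tokenised sentence."""
    tokens = sentence.lower().split()
    profile = Counter()
    for n in range(1, max_n + 1):
        profile.update(ngrams(tokens, n))
    return profile


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def similarity_feature(test_sentence, training_sources, max_n=2):
    """Hypothetical feature: best similarity between the sentence to be
    translated and the source side of the training sentences."""
    test_profile = ngram_profile(test_sentence, max_n)
    return max(
        (cosine(test_profile, ngram_profile(s, max_n)) for s in training_sources),
        default=0.0,
    )
```

In a real PBSMT setup, a score of this kind would be attached to candidate phrase pairs as an additional model feature; that integration step is outside the scope of this sketch.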

Archive | 2011

Sentence Alignment by Means of Cross-Language Information Retrieval

Marta Ruiz Costa-Jussà; Rafael E. Banchs

In this chapter, we focus on the specific problem of sentence alignment given two comparable corpora. This task is essential to some specific applications such as parallel corpora compilation (Utiyama & Tanimura, 2007) and cross-language plagiarism detection (Potthast et al., 2009). We address this problem by means of a cross-language information retrieval (CLIR) system. CLIR deals with the problem of finding relevant documents in a language different from the one used in the query. Different strategies are used, from ontology-based approaches (Soerfel, 2002) to statistical tools. Latent Semantic Analysis can be used to obtain a list of parallel words (Codina et al., 2008). Multidimensional Scaling projections (Banchs & Costa-jussà, 2009) can also be used to find similar documents in a cross-lingual environment. Other techniques are based on machine translation, where the search is performed over translated texts (Kishida, 2005). Within this framework, two basic components should be distinguished: a translation model, and a retrieval model that may work as in the monolingual case. Translation can be applied either to the query or to the documents. In the case of document translation, statistical machine translation systems can be used to translate document collections into the original query language. In the case of query translation, three challenges must be considered: deciding how a term might be written in another language, which of the possible translations should be retained, and how to weight the translation alternatives when more than one is retained. Here, we use the query translation approach: a segment of text in a given source language is used as a query to retrieve a similar or equivalent segment of text in a different target language. Because we use complete sentences, which provide context for the terms to be translated, we avoid the difficulties mentioned above.
In particular, when using the query translation approach, we investigate whether the choice of a rule-based or a statistical machine translation system influences the final quality of the sentence alignment. Additionally, we test whether standard automatic MT metrics are correlated with the standard metrics of sentence alignment. Rule-based machine translation (RBMT) systems were the first commercial machine translation systems. Much more complex than word-for-word translation, these systems develop linguistic rules that allow words to be placed in different positions, to take different meanings depending on context, etc. RBMT technology applies a set of linguistic rules in three ...
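The query translation approach described in this abstract can be illustrated with a minimal sketch, which is an assumption-laden toy and not the chapter's system. Here a hypothetical word-for-word lexicon (`TOY_LEXICON`) stands in for the rule-based or statistical MT system, and the retrieval model is assumed to be cosine similarity over bag-of-words counts; all names are illustrative.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon standing in for a real MT system (RBMT or SMT).
TOY_LEXICON = {"el": "the", "gato": "cat", "duerme": "sleeps",
               "perro": "dog", "come": "eats"}


def translate_query(sentence, lexicon):
    """Translate a source sentence word by word; unknown words pass through."""
    return [lexicon.get(tok, tok) for tok in sentence.lower().split()]


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def align(source_sentences, target_sentences, lexicon):
    """For each source sentence, translate it into a query and retrieve the
    most similar target sentence. Returns (source_idx, target_idx, score)."""
    target_bags = [Counter(t.lower().split()) for t in target_sentences]
    if not target_bags:
        return []
    alignments = []
    for i, src in enumerate(source_sentences):
        query = Counter(translate_query(src, lexicon))
        scores = [cosine(query, bag) for bag in target_bags]
        best = max(range(len(scores)), key=scores.__getitem__)
        alignments.append((i, best, scores[best]))
    return alignments


if __name__ == "__main__":
    spanish = ["el gato duerme", "el perro come"]
    english = ["the dog eats", "the cat sleeps"]
    for src_idx, tgt_idx, score in align(spanish, english, TOY_LEXICON):
        print(spanish[src_idx], "->", english[tgt_idx], round(score, 3))
```

Swapping the `translate_query` step for a different translation system while keeping the retrieval step fixed mirrors the comparison the abstract describes, where the effect of the MT system on alignment quality is the variable under study.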

14th Annual Conference of the European Association for Machine Translation | 2010

Linguistic-based evaluation criteria to identify statistical machine translation errors

Mireia Farrús Cabeceran; Marta Ruiz Costa-Jussà; José Bernardo Mariño Acebal; José A. R. Fonollosa

IWSLT | 2005

N-gram-based versus phrase-based statistical machine translation.

Josep Maria Crego; Marta Ruiz Costa-Jussà; José B. Mariño; José A. R. Fonollosa

Language Resources and Evaluation | 2011

Using linear interpolation and weighted reordering hypotheses in the Moses system

Marta Ruiz Costa-Jussà; José A. R. Fonollosa

Notebook Papers of CLEF 2010 Labs and Workshops, 22-23 September, Padua, Italy, September 2010 | 2010

Plagiarism detection using information retrieval and similarity measures based on image processing techniques

Marta Ruiz Costa-Jussà; Rafael E. Banchs; Jens Grivolla; Joan Codina

IWSLT | 2005

Tuning a phrase-based statistical translation system for the IWSLT 2005 Chinese to English and Arabic to English tasks

Marta Ruiz Costa-Jussà; José A. R. Fonollosa

EAMT 2010: proceedings of the 14th annual conference of the European Association for Machine Translation | 2010

Integration of statistical collocation segmentations in a phrase-based statistical machine translation system

Marta Ruiz Costa-Jussà; Vidas Daudaravicius; Rafael E. Banchs

Proceedings of IWSLT 2010, Paris, France | 2010

UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system

Carlos A Henriquez; Marta Ruiz Costa-Jussà; Vidas Daudaravicius; Rafael E. Banchs; José B. Mariño

Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages | 2009

The TALP on-line Spanish-Catalan machine-translation system

Marc Poch; Mireia Farrús Cabeceran; Marta Ruiz Costa-Jussà; José Bernardo Mariño Acebal; Adolfo Hernández; Carlos Alberto Henríquez Quintana; José A. R. Fonollosa
