José B. Mariño

Polytechnic University of Catalonia

Publication

Featured research published by José B. Mariño.

Machine Translation | 2006

Improving statistical MT by coupling reordering and decoding

Josep Maria Crego; José B. Mariño

In this paper we describe an elegant and efficient approach to coupling reordering and decoding in statistical machine translation, where the n-gram translation model is also employed as distortion model. The reordering search problem is tackled through a set of linguistically motivated rewrite rules, which are used to extend a monotonic search graph with reordering hypotheses. The extended graph is traversed in the global search when a fully informed decision can be taken. Further experiments show that the n-gram translation model can be successfully used as reordering model when estimated with reordered source words. Experiments are reported on the Europarl task (Spanish–English and English–Spanish). Results are presented regarding translation accuracy and computational efficiency, showing significant improvements in translation quality with respect to monotonic search for both translation directions at a very low computational cost.

Language Resources and Evaluation | 2005

Guidelines for Word Alignment Evaluation and Manual Alignment

Patrik Lambert; Adrià de Gispert; Rafael E. Banchs; José B. Mariño

The purpose of this paper is to provide guidelines for building a word alignment evaluation scheme. The notion of word alignment quality depends on the application: here we review standard scoring metrics for full text alignment and give explanations on how to use them better. We discuss strategies to build a reference corpus, and show that the ratio between ambiguous and unambiguous links in the reference has a great impact on scores measured with these metrics. In particular, automatically computed alignments with higher precision or higher recall can be favoured depending on the value of this ratio. Finally, we suggest a strategy to build a reference corpus particularly adapted to applications where recall plays a significant role, like in machine translation. The manually aligned corpus we built for the Spanish-English European Parliament corpus is also described. This corpus is freely available.
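
The effect of the ambiguous/unambiguous ratio can be seen in a small sketch. It assumes the "standard scoring metrics" are the usual precision, recall and alignment error rate (AER) defined over sure (unambiguous) and possible (ambiguous) reference links; the abstract does not name them, and the function and variable names below are illustrative only.

```python
def alignment_scores(hypothesis, sure, possible):
    """Score a word alignment against a reference with sure and possible links.

    Links are (source_index, target_index) pairs. Assumed definitions:
    precision = |A & P| / |A|, recall = |A & S| / |S|,
    AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), where P includes S.
    """
    A = set(hypothesis)
    S = set(sure)
    P = set(possible) | S  # every sure link also counts as possible
    if not A or not S:
        raise ValueError("hypothesis and sure reference must be non-empty")
    precision = len(A & P) / len(A)
    recall = len(A & S) / len(S)
    aer = 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    return precision, recall, aer


# Toy reference: two sure links and one extra possible link.
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 1)}
hypothesis_links = {(0, 0), (2, 1), (2, 2)}
precision, recall, aer = alignment_scores(hypothesis_links, sure_links, possible_links)
```

Because possible links can only raise precision and never lower recall, a reference with many ambiguous links rewards high-precision alignments, whereas a reference made mostly of sure links puts more weight on recall.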

IEEE Transactions on Audio, Speech, and Language Processing | 2008

System Combination for Machine Translation of Spoken and Written Language

Gregor Leusch; Rafael E. Banchs; Nicola Bertoldi; Daniel Déchelotte; Marcello Federico; Muntsin Kolss; Young-Suk Lee; José B. Mariño; Matthias Paulik; Salim Roukos; Holger Schwenk; Hermann Ney

This paper describes an approach for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The consensus translation is computed by weighted majority voting on a confusion network, similarly to the well-established ROVER approach of Fiscus for combining speech recognition hypotheses. To create the confusion network, pairwise word alignments of the original MT hypotheses are learned using an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole corpus of automatic translations rather than a single sentence is taken into account in order to achieve high alignment quality. The confusion network is rescored with a special language model, and the consensus translation is extracted as the best path. The proposed system combination approach was evaluated in the framework of the TC-STAR speech translation project. Up to six state-of-the-art statistical phrase-based translation systems from different project partners were combined in the experiments. Significant improvements in translation quality from Spanish to English and from English to Spanish in comparison with the best of the individual MT systems were achieved under official evaluation conditions.
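
The voting step can be illustrated with a minimal sketch. It assumes the hypotheses have already been aligned into a confusion network (one word per system per slot, with an empty string for "no word") and that each system has a single global weight. The alignment learning and the language-model rescoring described above are not modelled, and all names are illustrative.

```python
from collections import defaultdict

EPSILON = ""  # marks a slot where a system contributes no word


def consensus(network, weights):
    """Pick the highest-weighted word in each slot of a confusion network.

    network: list of slots; slot[i] is the word proposed by system i.
    weights: one vote weight per system.
    """
    output = []
    for slot in network:
        votes = defaultdict(float)
        for system, word in enumerate(slot):
            votes[word] += weights[system]
        best = max(votes, key=lambda w: votes[w])
        if best != EPSILON:  # an empty-word winner produces no output
            output.append(best)
    return output


# Three systems, four slots.
network = [
    ["the", "the", "a"],
    ["big", "large", "big"],
    ["", "old", ""],
    ["house", "house", "home"],
]
weights = [0.4, 0.3, 0.3]
print(" ".join(consensus(network, weights)))  # prints "the big house"
```

In this toy case the consensus differs from the output of the second and third systems, which is the situation where combination can outperform each individual system.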

The Prague Bulletin of Mathematical Linguistics | 2011

Ncode: an Open Source Bilingual N-gram SMT Toolkit

Josep Maria Crego; François Yvon; José B. Mariño

This paper describes Ncode, an open source statistical machine translation (SMT) toolkit for translation models estimated as n-gram language models of bilingual units (tuples). This toolkit includes tools for extracting tuples, estimating models and performing translation. It can be easily coupled to several other open source toolkits to yield a complete SMT pipeline. In this article, we review the main features of the toolkit and explain how to build a translation engine with Ncode. We also report a short comparison with the widely known Moses system. Results show that Ncode outperforms Moses in terms of memory requirements and translation speed. Ncode also achieves slightly higher accuracy results.

Workshop on Statistical Machine Translation | 2006

Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output

Maja Popović; Adrià de Gispert; Deepa Gupta; Patrik Lambert; Hermann Ney; José B. Mariño; Marcello Federico; Rafael E. Banchs

Evaluation of machine translation output is an important but difficult task. Over the last years, a variety of automatic evaluation measures have been studied; some of them, such as Word Error Rate (WER), Position-Independent Word Error Rate (PER), and the BLEU and NIST scores, have become widely used tools for comparing different systems as well as for evaluating improvements within one system. However, these measures do not give any details about the nature of translation errors. Therefore, some analysis of the generated output is needed in order to identify the main problems and to focus the research efforts. On the other hand, human evaluation is a time-consuming and expensive task. In this paper, we investigate methods for using morpho-syntactic information for automatic evaluation: the standard error measures WER and PER are calculated on distinct word classes and forms in order to get a better idea of the nature of translation errors and the possibilities for improvement.
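
As a rough illustration of the two error measures, the sketch below computes WER as the word-level edit distance divided by the reference length, and PER from the bag-of-words overlap. The PER formula is one commonly used definition and is an assumption here; the `restrict` helper is a hypothetical way of limiting the comparison to one word class, not necessarily the procedure used in the paper.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance normalised by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate: word order is ignored."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


def restrict(tagged_words, pos_class):
    """Keep only the words carrying a given part-of-speech tag (hypothetical helper)."""
    return [word for word, tag in tagged_words if tag == pos_class]


ref = "the cat sat on the mat".split()
hyp = "on the mat the cat sat".split()
print(wer(ref, hyp), per(ref, hyp))
```

For this pair the hypothesis contains exactly the reference words in a different order, so PER is zero while WER is high; a large gap between the two points to word-order errors rather than lexical ones. Applying `wer` or `per` to the output of `restrict` for, say, verbs gives a per-class view of the same kind.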

Speech Communication | 2008

On the impact of morphology in English to Spanish statistical MT

A. de Gispert; José B. Mariño

This paper presents a thorough study of the impact of morphology derivation on N-gram-based Statistical Machine Translation (SMT) models from English into a morphology-rich language such as Spanish. For this purpose, we define a framework under the assumption that a certain degree of morphology-related information is not only being ignored by current statistical translation models, but also has a negative impact on their estimation due to the data sparseness it causes. Moreover, we describe how this information can be decoupled from the standard bilingual N-gram models and introduced separately by means of a well-defined and better informed feature-based classification task. Results are presented for the European Parliament Plenary Sessions (EPPS) English-to-Spanish task, showing oracle scores that indicate to what extent SMT models can benefit from simplifying Spanish morphological surface forms for each Part-Of-Speech category. We show that verb form morphological richness greatly weakens the standard statistical models, and we carry out a posterior morphology classification by defining a simple set of features and applying machine learning techniques. In addition, we propose a simple technique to deal with Spanish enclitic pronouns. Both techniques are empirically evaluated, and final translation results show improvements over the baseline by just dealing with Spanish morphology. In principle, the study is also valid for translation from English into any other Romance language (Portuguese, Catalan, French, Galician, Italian, etc.). The proposed method can be applied to both monotonic and non-monotonic decoding scenarios, thus revealing the interaction between word-order decoding and the proposed morphology simplification techniques. Overall results achieve statistically significant improvement over baseline performance in this demanding task.

Speech Communication | 2000

The demiphone: an efficient contextual subword unit for continuous speech recognition

José B. Mariño; Albino Nogueiras; Pau Pachès-Leal; Antonio Bonafonte

In this paper, we introduce the demiphone as a context-dependent phonetic unit for continuous speech recognition. A phoneme is divided into two parts: a left demiphone that accounts for the left coarticulation and a right demiphone that copes with the right-hand side context. This unit discards the dependence between the effects of both side contexts, but it models the transition between phonemes as the triphone does. By concatenating a left demiphone and a right demiphone a triphone can be built, although the left-context and right-context coarticulations are modeled independently. The main appeal of this unit stems from its reduced number (with respect to the number of triphones) and its capability to model left and right contexts unseen together in the training material. Thus, the demiphone combines in a simple way the advantages of smoothed parameter estimation with the ability to generalize. In the present work, the demiphone is motivated and experimentally supported. Furthermore, demiphones are compared with triphones smoothed and generalized by decision-tree state-tying, accepted as the most powerful tool for coarticulation modeling at the present state of the art. The main conclusion of our work is that the demiphone simplifies the recognition system and yields better performance than the triphone, at least for small or moderate-size databases. This result may be explained by the ability of the demiphone to provide an excellent trade-off between detailed coarticulation modeling and proper parameter estimation.
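
The decomposition can be sketched as follows. The unit labels (`l-p` for a left demiphone, `p+r` for a right demiphone), the `#` boundary symbol and the function names are illustrative assumptions rather than the notation of the paper.

```python
def to_demiphones(phonemes, boundary="#"):
    """Split each phoneme into a left and a right demiphone.

    The left demiphone depends only on the preceding phoneme and the
    right demiphone only on the following one.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    units = []
    for i in range(1, len(padded) - 1):
        left, phone, right = padded[i - 1], padded[i], padded[i + 1]
        units.append(f"{left}-{phone}")   # left demiphone
        units.append(f"{phone}+{right}")  # right demiphone
    return units


def inventory_bounds(num_phonemes):
    """Upper bounds on unit counts, ignoring boundary contexts."""
    triphones = num_phonemes ** 3        # left context x phoneme x right context
    demiphones = 2 * num_phonemes ** 2   # (left x phoneme) + (phoneme x right)
    return triphones, demiphones


print(to_demiphones(["k", "a", "s", "a"]))
# prints ['#-k', 'k+a', 'k-a', 'a+s', 'a-s', 's+a', 's-a', 'a+#']
```

For a hypothetical inventory of 30 phonemes, `inventory_bounds(30)` gives at most 27000 triphones against 1800 demiphones, which illustrates why the demiphone inventory is so much smaller and why a left context and a right context that never co-occurred in training can still be modelled.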

Speech Communication | 1997

Speech recognition in a noisy car environment based on LP of the one-sided autocorrelation sequence and robust similarity measuring techniques

Javier Hernando; Climent Nadeu; José B. Mariño

The performance of existing speech recognition systems degrades rapidly in the presence of background noise. A novel representation of the speech signal, which is based on Linear Prediction of the One-Sided Autocorrelation sequence (OSALPC), has been shown to be attractive for noisy speech recognition because of both its high recognition performance with respect to conventional LPC in severe conditions of additive white noise and its computational simplicity. The aim of this work is twofold: (1) to show that OSALPC also achieves good performance in a case of real noisy speech (in a car environment), and (2) to explore its combination with several robust similarity measuring techniques, showing that its performance improves by using cepstral liftering, dynamic features and multilabeling.

International Conference on Spoken Language Processing | 1996

Language modeling using x-grams

Antonio Bonafonte; José B. Mariño

In this paper, an extension of n-grams, called x-grams, is proposed. In this extension, the memory of the model (n) is not fixed a priori. Instead, large memories are accepted first, and merging criteria are then applied to reduce the complexity and to ensure reliable estimations. The results show that the perplexity obtained with x-grams is smaller than that of n-grams. Furthermore, the complexity is smaller than that of trigrams and can come close to that of bigrams.

Spoken Language Technology Workshop | 2006

Reordering experiments for N-gram-based SMT

Josep Maria Crego; José B. Mariño

This paper addresses the problem of reordering in statistical machine translation (SMT). We describe an elegant and efficient approach to couple reordering (word order monotonization) and decoding, which does not need any additional model. We use linguistically motivated reordering rules to extend a monotonic search graph (with reordering hypotheses). The extended graph is traversed in decoding when a fully-informed decision can be taken (no preprocessing decision about reordering is taken). We also show how the N-gram translation model can be successfully used as reordering model when estimated with reordered source words (to harmonize the source and target word order). Experiments are reported on the Europarl task (Spanish-to-English and English-to-Spanish). Results are presented regarding translation accuracy and computational efficiency, showing significant improvements in translation quality for both translation directions at a very low computational cost.