Antonio L. Lagarda
Polytechnic University of Valencia
Publications
Featured research published by Antonio L. Lagarda.
Computational Linguistics | 2009
Sergio Barrachina; Oliver Bender; Francisco Casacuberta; Jorge Civera; Elsa Cubel; Shahram Khadivi; Antonio L. Lagarda; Hermann Ney; Jesús Tomás; Enrique Vidal; Juan Miguel Vilar
Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. A way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator activity is included in the loop: In each iteration, a prefix of the translation is validated (accepted or amended) by the human and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, the adaptation of MT systems to the interactive scenario affects mainly the search process, allowing a great reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) on two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three pairs of languages were involved (in both translation directions): English-Spanish, English-German, and English-French.
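The prefix/suffix loop described in this abstract can be sketched compactly. The code below is illustrative only: `suggest_suffix` and `read_user_prefix` are hypothetical placeholders for the MT engine's prefix-constrained search (alignment templates, phrase-based models, or stochastic finite-state transducers in the article) and the translator's validate/amend step, not an implementation from the paper.

```python
def interactive_translation(source, suggest_suffix, read_user_prefix):
    """Minimal sketch of the interactive CAT loop: the system completes
    the validated prefix, the user validates (accepts or amends) a new
    prefix, and the cycle repeats until the sentence is accepted."""
    prefix = ""
    while True:
        suffix = suggest_suffix(source, prefix)       # best completion of the validated prefix
        prefix, accepted = read_user_prefix(prefix + suffix)
        if accepted:                                  # translator accepts the full sentence
            return prefix
```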
North American Chapter of the Association for Computational Linguistics | 2009
Antonio L. Lagarda; Vicente Alabau; Francisco Casacuberta; Roberto Silva; Enrique Díaz-de-Liaño
Automatic post-editing (APE) systems aim at correcting the output of machine translation systems to produce better quality translations, i.e., translations that can be manually post-edited with an increase in productivity. In this work, we present an APE system that uses statistical models to enhance a commercial rule-based machine translation (RBMT) system. In addition, a procedure for effortless human evaluation has been established. We have tested the APE system with two corpora of different complexity. For the Parliament corpus, we show that the APE system significantly complements and improves the RBMT system. Results for the Protocols corpus, although less conclusive, are promising as well. Finally, several possible sources of errors have been identified, which will help develop future system enhancements.
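As a rough illustration of the idea, and not the authors' actual models, the toy sketch below learns word-level corrections from position-aligned pairs of RBMT output and human post-edits; the real system trains full statistical translation models on such a corpus.

```python
from collections import Counter, defaultdict

def learn_corrections(rbmt_outputs, post_edits):
    """Toy stand-in for statistical APE training: count word-level pairs
    observed in position-aligned (RBMT output, post-edit) sentences and
    keep the most frequent target word for each RBMT word."""
    counts = defaultdict(Counter)
    for mt, pe in zip(rbmt_outputs, post_edits):
        for m, p in zip(mt.split(), pe.split()):
            counts[m][p] += 1    # identity pairs keep usually-correct words intact
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}

def post_edit(rbmt_sentence, corrections):
    # Apply the learned corrections to a new first-pass RBMT draft.
    return " ".join(corrections.get(w, w) for w in rbmt_sentence.split())
```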
Parallel Computing | 2009
Francisco Casacuberta; Jorge Civera; Elsa Cubel; Antonio L. Lagarda; Guy Lapalme; Elliott Macklovitch; Enrique Vidal
Introduction
Translation from a source language into a target language has become a very important activity in recent years, both in official institutions (such as the United Nations and the EU, or the parliaments of multilingual countries like Canada and Spain) and in the private sector (for example, to translate user manuals or newspaper articles). Prestigious clients such as these cannot make do with approximate translations; for all kinds of reasons, ranging from legal obligations to good marketing practice, they require target-language texts of the highest quality. The task of producing such high-quality translations is a demanding and time-consuming one that is generally entrusted to expert human translators. The problem is that, with growing globalization, the demand for high-quality translation has been steadily increasing, to the point where there are just not enough qualified translators available today to satisfy it. This has dramatically raised the need for improved machine translation (MT) technologies.

The field of MT has undergone something of a revolution over the last 15 years, with the adoption of empirical, data-driven techniques originally inspired by the success of automatic speech recognition. Given the requisite corpora, it is now possible to develop new MT systems in a fraction of the time and with much less effort than was previously required under the formerly dominant rule-based paradigm. As for the quality of the translations produced by this new generation of MT systems, there has also been considerable progress; generally speaking, however, it remains well below that of human translation. No one would seriously consider directly using the output of even the best of these systems to translate a CV or a corporate Web site, for example, without submitting the machine translation to a careful human revision. As a result, those who require publication-quality translation are forced to make a difficult choice between systems that are fully automatic but whose output must be attentively post-edited, and computer-assisted translation systems (CAT tools for short) that allow for high quality but to the detriment of full automation.

Currently, the best-known CAT tools are translation memory (TM) systems. These systems recycle sentences that have previously been translated, either within the current document or earlier in other documents. This is very useful for highly repetitive texts, but not of much help for the vast majority of texts composed of original materials. Since TM systems were first introduced, very few other types of CAT tools have been forthcoming. Notable exceptions are the TransType system and its successor TransType2 (TT2). These systems represent a novel reworking of the old idea of interactive machine translation (IMT). Initial efforts on TransType are described in detail in Foster; suffice it to say here that the system's principal novelty lies in the fact that the human-machine interaction focuses on the drafting of the target text, rather than on the disambiguation of the source text, as in all former IMT systems. In the TT2 project, this idea was further developed. A full-fledged MT engine was embedded in an interactive editing environment and used to generate suggested completions of each target sentence being translated. These completions may be accepted or amended by the translator; but once validated, they are exploited by the MT engine to produce further, hopefully improved suggestions.
This is in marked contrast with traditional MT, where typically the system is first used to produce a complete draft translation of a source text, which is then post-edited (corrected) offline by a human translator. TT2's interactive approach offers a significant advantage over traditional post-editing. In the latter paradigm, there is no way for the system, which is off-line, to benefit from the user's corrections; in TransType, just the opposite is true. As soon as the user begins to revise an incorrect segment, the system immediately responds to that new information by proposing an alternative completion to the target segment, one which is compatible with the prefix that the user has input. Another notable feature of the work described in this article is the importance accorded to a formal treatment of human-machine interaction, something that is seldom considered in the now-prevalent framework of statistical pattern recognition.
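One crude way to picture the "compatible with the prefix" requirement is to filter a precomputed n-best list, as in the hypothetical helper below. Note this is a simplification: TT2 instead re-runs the MT engine's search constrained to the prefix, which can produce completions not present in any fixed candidate list.

```python
def compatible_completions(nbest, prefix):
    """Keep only candidate translations that extend the user-validated
    prefix, and return the remaining suffixes as suggestions."""
    return [cand[len(prefix):] for cand in nbest if cand.startswith(prefix)]
```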
Lecture Notes in Computer Science | 2004
Jorge Civera; Juan Miguel Vilar; Elsa Cubel; Antonio L. Lagarda; Sergio Barrachina; Francisco Casacuberta; Enrique Vidal; David Picó; Jorge González
Current methodologies for automatic translation cannot be expected to produce high-quality translations. An alternative approach is to use them as an aid to manual translation. We focus on one possible way to help human translators: interactively providing completions for the parts of the sentences already translated. We explain how finite-state transducers can be used for this task and show experiments in which the keystrokes needed to translate printer manuals were reduced to nearly 25% of the original.
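The keystroke savings can be pictured with a simple simulation. The character-level protocol below is one plausible way to measure such savings, not necessarily the exact metric used in the paper; `suggest` is any completion engine (an FST-based one in the paper) passed in as a parameter.

```python
def keystroke_ratio(reference, suggest):
    """Simulate interactive typing of a nonempty `reference` sentence:
    the system completes the current prefix via suggest(prefix); the user
    accepts matching characters for free and types one character where
    the suggestion diverges. Returns typed keystrokes as a fraction of
    the reference length (1.0 means no savings at all)."""
    prefix, typed = "", 0
    while prefix != reference:
        completion = prefix + suggest(prefix)
        i = len(prefix)
        while i < min(len(reference), len(completion)) and completion[i] == reference[i]:
            i += 1                          # matching characters cost nothing
        if i == len(reference):
            break                           # the suggestion finished the sentence
        prefix = reference[:i + 1]          # one keystroke to fix the divergence
        typed += 1
    return typed / len(reference)
```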
Computer Speech & Language | 2015
Antonio L. Lagarda; Daniel Ortiz-Martínez; Vicent Alabau; Francisco Casacuberta
Highlights:
- We present a method to customize machine translation systems when in-domain data is not available.
- To that end, we perform online-learning automatic post-editing of the output of ready-to-use generic machine translation systems.
- The results show that the method is very effective on rule-based machine translation systems.
- On statistical machine translation systems, the method performs well if no in-domain data was used in training.
- Finally, if there is not enough repetition, our method is of limited use.

Globalization has dramatically increased the need to translate information from one language to another. Frequently, such translation needs must be satisfied under very tight time constraints. Machine translation (MT) techniques can constitute a solution to this overly complex problem. However, the documents to be translated in real scenarios are often limited to a specific domain, such as a particular type of medical or legal text. This situation seriously hinders the applicability of MT, since it is usually expensive to build a reliable translation system, no matter what technology is used, due to the linguistic resources required to build it, such as dictionaries, translation memories, or parallel texts. In order to solve this problem, we propose the application of automatic post-editing in an online learning framework. Our technique allows the human expert to translate in a specific domain by using a base translation system designed to work in a general domain, whose output is corrected (or adapted to the specific domain) by means of an automatic post-editing module. This module learns to make its corrections from user feedback in real time by means of online learning techniques. We have validated our system using different translation technologies to implement the base translation system, as well as several texts involving different domains and languages. In most cases, our results show significant improvements in terms of BLEU (up to 16 points) with respect to the baseline systems. The proposed technique works effectively when the n-grams of the document to be translated present a certain rate of repetition, a situation which is common according to the document-internal repetition property.
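A heavily simplified sketch of the online loop is shown below: a correction cache over n-grams stands in for the statistical APE models that the real system updates, and the naive position alignment in `learn` is an assumption made for brevity. Its usefulness, like the paper's method, depends on document-internal repetition.

```python
class OnlineAPE:
    """Toy online-learning post-editor: cache n-gram corrections observed
    in user feedback and reuse them on later sentences."""

    def __init__(self, n=2):
        self.n = n
        self.fixes = {}                              # learned n-gram corrections

    def post_edit(self, mt_sentence):
        words, out, i = mt_sentence.split(), [], 0
        while i < len(words):
            gram = tuple(words[i:i + self.n])
            if gram in self.fixes:
                out.extend(self.fixes[gram])         # reuse a learned correction
                i += self.n
            else:
                out.append(words[i])
                i += 1
        return " ".join(out)

    def learn(self, mt_sentence, user_post_edit):
        mt, pe = mt_sentence.split(), user_post_edit.split()
        # Naive position alignment; a real system would align properly.
        for i in range(min(len(mt), len(pe)) - self.n + 1):
            if mt[i:i + self.n] != pe[i:i + self.n]:
                self.fixes[tuple(mt[i:i + self.n])] = pe[i:i + self.n]
```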
Pattern Recognition Letters | 2014
Vicent Alabau; Carlos D. Martínez-Hinarejos; Verónica Romero; Antonio L. Lagarda
The transcription of historical documents is one of the most interesting tasks to which Handwritten Text Recognition can be applied, given its value for humanities research. One alternative for transcribing ancient manuscripts is speech dictation, using Automatic Speech Recognition techniques. The two alternatives employ similar models (Hidden Markov Models and n-grams) and decoding processes (Viterbi decoding), which allows the two modalities to be combined with little difficulty. In this work, we explore the possibility of using the recognition results of one modality to restrict the decoding process of the other, and we apply this process iteratively. Results of these multimodal iterative alternatives are significantly better than the baseline unimodal systems and better than the non-iterative alternatives.
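The iteration is easy to state abstractly. In the sketch below, the two decoders and the way one hypothesis restricts the other modality's search are hypothetical placeholders; the paper constrains the Viterbi decoding of shared HMM/n-gram models rather than using a generic `constraint` argument.

```python
def iterative_combination(htr_decode, asr_decode, page_image, dictation, rounds=3):
    """Alternately re-decode each modality with its search restricted by
    the other modality's current hypothesis. htr_decode/asr_decode are
    assumed callables decode(input, constraint) returning a transcription;
    constraint=None means an unrestricted first pass."""
    htr_hyp = htr_decode(page_image, constraint=None)
    asr_hyp = asr_decode(dictation, constraint=htr_hyp)
    for _ in range(rounds - 1):                      # iterate the cross-restriction
        htr_hyp = htr_decode(page_image, constraint=asr_hyp)
        asr_hyp = asr_decode(dictation, constraint=htr_hyp)
    return htr_hyp, asr_hyp
```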
Finite-State Methods and Natural Language Processing | 2005
Jorge Civera; Juan Miguel Vilar; Elsa Cubel; Antonio L. Lagarda; Sergio Barrachina; Francisco Casacuberta; Enrique Vidal
Computer-Assisted Translation (CAT) is an alternative approach to Machine Translation that integrates human expertise into the automatic translation process. In this framework, a human translator interacts with a translation system that dynamically offers a list of translations that best complete the part of the sentence already translated. Stochastic finite-state transducer technology is proposed to support this CAT system. The system was assessed on two real tasks of different complexity in several languages.
Empirical Methods in Natural Language Processing | 2004
Jorge Civera; Elsa Cubel; Antonio L. Lagarda; David Picó; Jorge González; Enrique Vidal; Francisco Casacuberta; Juan Miguel Vilar; Sergio Barrachina
European Conference on Artificial Intelligence | 2004
Elsa Cubel; Jorge Civera; Juan Miguel Vilar; Antonio L. Lagarda; Francisco Casacuberta; Enrique Vidal; David Picó; Jorge González; Luis Javier Rodríguez
Archive | 2003
Elsa Cubel; Jorge González; Antonio L. Lagarda; Francisco Casacuberta; Alfons Juan; Enrique Vidal