
Publication


Featured research published by Jorge Civera.


Computational Linguistics | 2009

Statistical approaches to computer-assisted translation

Sergio Barrachina; Oliver Bender; Francisco Casacuberta; Jorge Civera; Elsa Cubel; Shahram Khadivi; Antonio L. Lagarda; Hermann Ney; Jesús Tomás; Enrique Vidal; Juan Miguel Vilar

Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. A way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator's activity is included in the loop: in each iteration, a prefix of the translation is validated (accepted or amended) by the human and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, the adaptation of MT systems to the interactive scenario affects mainly the search process, allowing a great reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) in two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three pairs of languages were involved (in both translation directions): English-Spanish, English-German, and English-French.
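As a rough illustration of the interactive-predictive loop just described (a sketch only, not the authors' implementation), the following Python fragment shows how validated prefixes and system-proposed suffixes alternate. The helpers `search_best_suffix` and `read_user_feedback` are hypothetical stand-ins for the statistical search engine (alignment templates, phrase-based models or stochastic finite-state transducers) and for the translator's interface, neither of which is specified here.

```python
from typing import Tuple

def search_best_suffix(source: str, prefix: str) -> str:
    """Most probable completion of `prefix` as a translation of `source`.
    Placeholder: a real system searches its models constrained to
    hypotheses that start with the validated prefix."""
    raise NotImplementedError

def read_user_feedback(hypothesis: str) -> Tuple[str, bool]:
    """Hypothetical UI call: the translator accepts a leading part of the
    hypothesis (possibly amending the first wrong word) and signals whether
    the sentence is now complete and correct."""
    raise NotImplementedError

def interactive_translation(source: str) -> str:
    prefix = ""                                      # validated target prefix
    while True:
        suffix = search_best_suffix(source, prefix)  # system proposal
        prefix, accepted = read_user_feedback(prefix + suffix)
        if accepted:
            return prefix                            # full, validated translation
```

The key point is that only the search is constrained by the prefix, so the underlying translation models can be reused unchanged, which is what the abstract means by "a great reuse of successful techniques and models".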


Workshop on Statistical Machine Translation | 2007

Domain Adaptation in Statistical Machine Translation with Mixture Modelling

Jorge Civera; Alfons Juan

Mixture modelling is a standard technique for density estimation, but its use in statistical machine translation (SMT) has only recently started to be explored. One of the main advantages of this technique is its capability to learn specific probability distributions that better fit subsets of the training dataset. This feature is even more important in SMT given the difficulty of translating polysemous terms whose meaning depends on the context in which they appear. In this paper, we describe a mixture extension of the HMM alignment model and the derivation of Viterbi alignments to feed a state-of-the-art phrase-based system. Experiments carried out on the Europarl and News Commentary corpora show the potential interest and limitations of mixture modelling.
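In general terms (this is the standard mixture decomposition, not necessarily the exact parameterisation used in the paper), a mixture of T HMM alignment models expresses the probability of a source sentence f_1^J given a target sentence e_1^I as a weighted sum over components, each with its own transition and lexical parameters:

\[
p(f_1^J \mid e_1^I) \;=\; \sum_{t=1}^{T} p(t) \sum_{a_1^J} \prod_{j=1}^{J} p_t(a_j \mid a_{j-1}, I)\, p_t(f_j \mid e_{a_j})
\]

Because each component t can concentrate its probability mass on a different subset of the training data (for example, a domain or topic), the same source word can receive different preferred translations depending on which component best explains the sentence pair.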


Journal of Algorithms | 2009

A statistical approach to crosslingual natural language tasks

David Pinto; Jorge Civera; Alberto Barrón-Cedeño; Alfons Juan; Paolo Rosso

The existence of huge volumes of documents written in multiple languages on the Internet calls for novel algorithmic approaches to deal with information of this kind. However, most crosslingual natural language processing (NLP) tasks follow a decoupled approach in which monolingual NLP techniques are applied along with an independent translation process. This two-step approach is too sensitive to translation errors and, in general, to the cumulative effect of errors. To solve this problem, we propose to use a direct probabilistic crosslingual NLP system which integrates both steps, translation and the specific NLP task, into a single one. In order to perform this integrated approach to crosslingual tasks, we propose to use the statistical IBM 1 word alignment model (M1). The M1 model can exhibit non-monotonic behaviour when aligning words from a sentence in a source language to words from a sentence in a different, target language. This is the case for language pairs with different word order: in English, for instance, adjectives appear before nouns, whereas in Spanish it is exactly the opposite. The successful experimental results reported in three different tasks - text classification, information retrieval and plagiarism analysis - highlight the benefits of the statistical integrated approach proposed in this work.
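For reference, this is the standard form of IBM Model 1 (a general fact about the model, not a formula quoted from the paper): the probability of a source sentence f_1^J given a target sentence e_0^I (with e_0 the empty word) depends only on lexical translation probabilities, and the inner sum over all target positions i is what allows non-monotonic alignments such as the adjective/noun order difference between English and Spanish.

\[
p(f_1^J \mid e_0^I) \;=\; \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} p(f_j \mid e_i)
\]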


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Computer-assisted translation using speech recognition

Enrique Vidal; Francisco Casacuberta; Luis Javier Rodríguez; Jorge Civera; Carlos D. Martinez Hinarejos

Current machine translation systems are far from being perfect. However, such systems can be used in computer-assisted translation to increase the productivity of the (human) translation process. The idea is to use a text-to-text translation system to produce portions of target language text that can be accepted or amended by a human translator using text or speech. These user-validated portions are then used by the text-to-text translation system to produce further, hopefully improved suggestions. There are different ways of using speech in a computer-assisted translation system: from pure dictated translation to the simple determination of acceptable partial translations by reading parts of the suggestions made by the system. In all cases, information from the text to be translated can be used to constrain the speech decoding search space. While pure dictation seems to be among the most attractive settings, perfect speech decoding unfortunately does not seem possible with current speech processing technology, and human error correction would still be required. Therefore, approaches that allow for higher speech recognition accuracy by using increasingly constrained models in the speech recognition process are explored here. All these approaches are presented under the statistical framework. Empirical results support the potential usefulness of using speech within the computer-assisted translation paradigm.
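One simple way to picture this constraint (a toy sketch under assumed inputs, not the decoders used in the paper, which integrate the constraint inside the search itself): target-text suggestions derived from the source sentence can be used to re-rank the speech recogniser's hypotheses, favouring those compatible with some suggestion.

```python
from typing import List, Tuple

def rescore_with_translation_constraints(
    asr_nbest: List[Tuple[float, str]],   # (acoustic + LM score, hypothesis), hypothetical input
    mt_suggestions: List[str],            # target-text suggestions for the source sentence
    bonus: float = 5.0,                   # illustrative reward for compatibility
) -> str:
    """Pick the ASR hypothesis most compatible with the translation suggestions."""
    def compatibility(hyp: str) -> float:
        # Reward hypotheses that are a prefix of (or equal to) some suggestion.
        return max((bonus for s in mt_suggestions if s.startswith(hyp)), default=0.0)
    return max(asr_nbest, key=lambda pair: pair[0] + compatibility(pair[1]))[1]
```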


Parallel Computing | 2009

Human interaction for high-quality machine translation

Francisco Casacuberta; Jorge Civera; Elsa Cubel; Antonio L. Lagarda; Guy Lapalme; Elliott Macklovitch; Enrique Vidal

Introduction

Translation from a source language into a target language has become a very important activity in recent years, both in official institutions (such as the United Nations and the EU, or the parliaments of multilingual countries like Canada and Spain) and in the private sector (for example, to translate user manuals or newspaper articles). Prestigious clients such as these cannot make do with approximate translations; for all kinds of reasons, ranging from legal obligations to good marketing practice, they require target-language texts of the highest quality. The task of producing such high-quality translations is a demanding and time-consuming one that is generally entrusted to expert human translators. The problem is that, with growing globalization, the demand for high-quality translation has been steadily increasing, to the point where there are just not enough qualified translators available today to satisfy it. This has dramatically raised the need for improved machine translation (MT) technologies.

The field of MT has undergone something of a revolution over the last 15 years, with the adoption of empirical, data-driven techniques originally inspired by the success of automatic speech recognition. Given the requisite corpora, it is now possible to develop new MT systems in a fraction of the time and with much less effort than was previously required under the formerly dominant rule-based paradigm. As for the quality of the translations produced by this new generation of MT systems, there has also been considerable progress; generally speaking, however, it remains well below that of human translation. No one would seriously consider directly using the output of even the best of these systems to translate a CV or a corporate Web site, for example, without submitting the machine translation to a careful human revision. As a result, those who require publication-quality translation are forced to make a difficult choice between systems that are fully automatic but whose output must be attentively post-edited, and computer-assisted translation systems (or CAT tools for short) that allow for high quality but to the detriment of full automation.

Currently, the best known CAT tools are translation memory (TM) systems. These systems recycle sentences that have previously been translated, either within the current document or earlier in other documents. This is very useful for highly repetitive texts, but not of much help for the vast majority of texts composed of original materials. Since TM systems were first introduced, very few other types of CAT tools have been forthcoming. Notable exceptions are the TransType system and its successor TransType2 (TT2). These systems represent a novel reworking of the old idea of interactive machine translation (IMT). Initial efforts on TransType are described in detail in Foster; suffice it to say here that the system's principal novelty lies in the fact that the human-machine interaction focuses on the drafting of the target text, rather than on the disambiguation of the source text, as in all former IMT systems. In the TT2 project, this idea was further developed. A full-fledged MT engine was embedded in an interactive editing environment and used to generate suggested completions of each target sentence being translated. These completions may be accepted or amended by the translator; once validated, they are exploited by the MT engine to produce further, hopefully improved suggestions.

This is in marked contrast with traditional MT, where typically the system is first used to produce a complete draft translation of a source text, which is then post-edited (corrected) offline by a human translator. TT2's interactive approach offers a significant advantage over traditional post-editing. In the latter paradigm, there is no way for the system, which is offline, to benefit from the user's corrections; in TransType, just the opposite is true. As soon as the user begins to revise an incorrect segment, the system immediately responds to that new information by proposing an alternative completion of the target segment, which is compatible with the prefix that the user has input. Another notable feature of the work described in this article is the importance accorded to a formal treatment of human-machine interaction, something that is seldom considered in the now-prevalent framework of statistical pattern recognition.


Lecture Notes in Computer Science | 2004

A Syntactic Pattern Recognition Approach to Computer Assisted Translation

Jorge Civera; Juan Miguel Vilar; Elsa Cubel; Antonio L. Lagarda; Sergio Barrachina; Francisco Casacuberta; Enrique Vidal; David Picó; Jorge González

It is a fact that current methodologies for automatic translation cannot be expected to produce high quality translations. An alternative approach is to use them as an aid to manual translation. We focus on a possible way to help human translators: to interactively provide completions for the parts of the sentences already translated. We explain how finite state transducers can be used for this task and show experiments in which the keystrokes needed to translate printer manuals were reduced to nearly 25% of the original.
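As a toy illustration of prefix completion with a finite-state model (a sketch over an invented weighted word trie, not the stochastic finite-state transducers actually inferred in the paper), the following fragment returns the most probable continuation of a prefix validated by the translator.

```python
from typing import Dict, List, Optional, Tuple

class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.logprob_end: Optional[float] = None   # log-probability of a sentence ending here

def insert(root: TrieNode, sentence: List[str], logprob: float) -> None:
    node = root
    for word in sentence:
        node = node.children.setdefault(word, TrieNode())
    node.logprob_end = logprob

def best_completion(root: TrieNode, prefix: List[str]) -> Optional[List[str]]:
    """Most probable continuation of `prefix` among the stored sentences."""
    node = root
    for word in prefix:
        if word not in node.children:
            return None                            # prefix not covered by the model
        node = node.children[word]
    best_score, best_suffix = float("-inf"), None
    stack: List[Tuple[TrieNode, List[str]]] = [(node, [])]
    while stack:                                   # exhaustive search over continuations
        n, suffix = stack.pop()
        if n.logprob_end is not None and n.logprob_end > best_score:
            best_score, best_suffix = n.logprob_end, suffix
        for w, child in n.children.items():
            stack.append((child, suffix + [w]))
    return best_suffix
```

In a real transducer the completion would be obtained with a Viterbi-like search over states rather than by enumerating stored sentences.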


Empirical Methods in Natural Language Processing | 2008

Improving Interactive Machine Translation via Mouse Actions

Germán Sanchis-Trilles; Daniel Ortiz-Martínez; Jorge Civera; Francisco Casacuberta; Enrique Vidal; Hieu Hoang

Although Machine Translation (MT) is a very active research field which is receiving an increasing amount of attention from the research community, the results that current MT systems are capable of producing are still quite far from perfect. Because of this, and in order to build systems that yield correct translations, human knowledge must be integrated into the translation process, which in our case is carried out within an Interactive-Predictive (IP) framework. In this paper, we show that considering Mouse Actions as a significant information source for the underlying system improves the productivity of the human translator involved. In addition, we also show that the initial translations that the MT system provides can be quickly improved by an expert by performing only additional Mouse Actions. In this work, we use word graphs as an efficient interface between a phrase-based MT system and the IP engine.
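The following is a minimal sketch of how a Mouse Action could be interpreted against a word graph (the helper `completions_from_word_graph` and the rejection semantics are illustrative assumptions, not the paper's actual interface): placing the cursor before a word validates the prefix up to that point and implicitly rejects the proposed next word, so the system answers with its best completion that starts differently.

```python
from typing import List, Optional, Tuple

def completions_from_word_graph(prefix: List[str]) -> List[Tuple[float, List[str]]]:
    """Hypothetical helper: best-first list of (score, suffix) continuations of
    `prefix` extracted from the word graph of the current source sentence."""
    raise NotImplementedError

def completion_after_mouse_action(prefix: List[str],
                                  rejected_word: str) -> Optional[List[str]]:
    # Return the best suffix whose first word differs from the rejected one.
    for _, suffix in completions_from_word_graph(prefix):
        if suffix and suffix[0] != rejected_word:
            return suffix
    return None                        # no alternative found in the word graph
```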


International Journal on Document Analysis and Recognition | 2014

Interactive handwriting recognition with limited user effort

Nicolás Serrano; Adrià Giménez; Jorge Civera; Alberto Sanchis; Alfons Juan

Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. Although post-editing the automatic recognition of handwritten text is feasible, it is not clearly better than simply ignoring it and transcribing the document from scratch. A more effective alternative is an interactive approach in which the system is guided by the user, and the user is assisted by the system, so as to complete the transcription task as efficiently as possible. Nevertheless, in some applications the user effort available to transcribe documents is limited, and full supervision of the system output is not realistic. To circumvent these problems, we propose a novel interactive approach which employs user effort efficiently by improving three different aspects. Firstly, the limited amount of user effort is spent solely on supervising recognised words that are likely to be incorrect; user effort is thus focused on the supervision of words for which the system is not confident enough. Secondly, the initial transcription provided to the user is refined by recomputing it constrained by the user's supervisions, so that incorrect words in unsupervised parts can be automatically amended without user intervention. Finally, the underlying models are improved by retraining the system from partially supervised transcriptions. To support these claims, empirical results are presented on two real databases, showing that the proposed approach can notably reduce user effort in the transcription of handwritten text in (old) documents.
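A minimal sketch of the first aspect (confidence-based selective supervision) follows; the confidence scores, threshold and effort budget are illustrative assumptions, since the abstract does not specify how the confidence measure is computed.

```python
from typing import List, Tuple

def words_to_supervise(
    transcription: List[Tuple[str, float]],   # (recognised word, confidence in [0, 1])
    threshold: float = 0.5,                    # illustrative confidence threshold
    effort_budget: int = 10,                   # max words the user is asked to check
) -> List[int]:
    """Return the positions of the least confident words, up to the effort budget."""
    doubtful = [(conf, i) for i, (_, conf) in enumerate(transcription) if conf < threshold]
    doubtful.sort()                            # least confident first
    return [i for _, i in doubtful[:effort_budget]]
```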


Open Learning: The Journal of Open and Distance Learning | 2014

Evaluating Intelligent Interfaces for Post-Editing Automatic Transcriptions of Online Video Lectures

J.D. Valor Miró; R.N. Spencer; A. Pérez González de Martos; G. Garcés Díaz-Munío; Carlos Turro; Jorge Civera; Alfons Juan

Video lectures are fast becoming an everyday educational resource in higher education. They are being incorporated into existing university curricula around the world, while also emerging as a key component of the open education movement. In 2007, the Universitat Politècnica de València (UPV) implemented its poliMedia lecture capture system for the creation and publication of quality educational video content and now has a collection of over 10,000 video objects. In 2011, it embarked on the EU-subsidised transLectures project to add automatic subtitles to these videos in both Spanish and other languages. By doing so, it allows access to their educational content by non-native speakers and the deaf and hard-of-hearing, as well as enabling advanced repository management functions. In this paper, following a short introduction to poliMedia, transLectures and Docència en Xarxa (Teaching Online), the UPV’s action plan to boost the use of digital resources at the university, we will discuss the three-stage evaluation process carried out with the collaboration of UPV lecturers to find the best interaction protocol for the task of post-editing automatic subtitles.


Communications in Computer and Information Science | 2012

Integrating a State-of-the-Art ASR System into the Opencast Matterhorn Platform

Juan Daniel Valor Miró; Alejandro Manuel Pérez González de Martos; Jorge Civera; Alfons Juan

In this paper we present the integration of a state-of-the-art ASR system into the Opencast Matterhorn platform, a free, open-source platform to support the management of educational audio and video content. The ASR system was trained on a novel large speech corpus, known as poliMedia, that was manually transcribed for the European project transLectures. This novel corpus contains more than 115 hours of transcribed speech that will be made available to the research community. Initial results on the poliMedia corpus are also reported, comparing the performance of different ASR systems based on the linear interpolation of language models. For this purpose, the in-domain poliMedia corpus was linearly interpolated with an external large-vocabulary dataset, the well-known Google N-Gram corpus. The reported WER figures show a notable improvement over the baseline performance as a result of incorporating the vast amount of data represented by the Google N-Gram corpus.
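The linear interpolation referred to above combines an in-domain and an out-of-domain language model as p(w | h) = lambda * p_in(w | h) + (1 - lambda) * p_out(w | h). The sketch below uses a hypothetical LM interface and tunes lambda by minimising perplexity on held-out in-domain text; the weight could equally be estimated with EM, and the abstract does not say which method was used.

```python
import math
from typing import Callable, List, Tuple

LM = Callable[[str, Tuple[str, ...]], float]   # p(word | history)

def interpolated_lm(p_in: LM, p_out: LM, lam: float) -> LM:
    # Linear interpolation of the two language models.
    return lambda w, h: lam * p_in(w, h) + (1.0 - lam) * p_out(w, h)

def heldout_perplexity(lm: LM, heldout: List[List[str]], order: int = 3) -> float:
    log_sum, count = 0.0, 0
    for sentence in heldout:
        for i, w in enumerate(sentence):
            h = tuple(sentence[max(0, i - order + 1):i])   # truncated history
            log_sum += math.log(max(lm(w, h), 1e-12))      # floor to avoid log(0)
            count += 1
    return math.exp(-log_sum / count)

def tune_lambda(p_in: LM, p_out: LM, heldout: List[List[str]]) -> float:
    """Grid search for the interpolation weight on held-out in-domain data."""
    grid = [i / 20 for i in range(1, 20)]
    return min(grid, key=lambda lam: heldout_perplexity(interpolated_lm(p_in, p_out, lam), heldout))
```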

Collaboration


Dive into Jorge Civera's collaborations.

Top Co-Authors

Alfons Juan, Polytechnic University of Valencia
Enrique Vidal, Polytechnic University of Valencia
Francisco Casacuberta, Polytechnic University of Valencia
Elsa Cubel, Polytechnic University of Valencia
Antonio L. Lagarda, Polytechnic University of Valencia
Adrià Giménez, Polytechnic University of Valencia
Carlos Turro, Polytechnic University of Valencia
Juan Daniel Valor Miró, Polytechnic University of Valencia
Alberto Sanchis, Polytechnic University of Valencia
Joan Albert Silvestre-Cerdà, Polytechnic University of Valencia