Alberto Sanchis
Polytechnic University of Valencia
Publications
Featured research published by Alberto Sanchis.
Computer Speech & Language | 2004
Francisco Casacuberta; Hermann Ney; Franz Josef Och; Enrique Vidal; Juan Miguel Vilar; Sergio Barrachina; I. García-Varea; D. Llorens; César Martínez; Sirko Molau; Francisco Nevado; Moisés Pastor; David Picó; Alberto Sanchis; C. Tillmann
Speech-input translation can be properly approached as a pattern recognition problem by means of statistical alignment models and stochastic finite-state transducers. Under this general framework, some specific models are presented. One of the features of such models is their capability of automatically learning from training examples. Moreover, the stochastic finite-state transducers permit an integrated architecture similar to the one used in speech recognition. In this case, the acoustic models (hidden Markov models) are embedded into the finite-state transducers, and the translation of a source utterance is the result of a (Viterbi) search on the integrated network. These approaches have been followed in the framework of the European project EuTrans. Translation experiments have been performed from Spanish to English and from Italian to English in an application involving the interaction of a customer with a receptionist at the front desk of a hotel.
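The core idea of translation by a stochastic finite-state transducer can be pictured with a minimal sketch: transitions consume one source word, emit zero or more target words, and carry a probability, and the best translation is the most probable (Viterbi) path. The toy vocabulary, states, and probabilities below are invented for illustration; the real EuTrans transducers are learnt from data and embed HMM acoustic models.

```python
import math

# Toy stochastic finite-state transducer.
# Transition format: state -> [(source_word, target_words, next_state, prob)]
TRANSITIONS = {
    0: [("una", ("a",), 1, 0.9), ("la", ("the",), 1, 0.8)],
    1: [("habitacion", ("room",), 2, 0.7),
        ("habitacion", ("single", "room"), 2, 0.2)],
}
FINAL = {2: 1.0}  # final-state probabilities

def viterbi_translate(source):
    """Best-path (Viterbi) search: most probable target string for `source`."""
    # beam[state] = (log-prob of best path, target words emitted so far)
    beam = {0: (0.0, ())}
    for word in source:
        nxt = {}
        for state, (lp, out) in beam.items():
            for (w, tgt, s2, p) in TRANSITIONS.get(state, []):
                if w != word:
                    continue
                cand = (lp + math.log(p), out + tgt)
                if s2 not in nxt or cand[0] > nxt[s2][0]:
                    nxt[s2] = cand  # keep only the best path per state
        beam = nxt
    best = max(
        ((lp + math.log(FINAL[s]), out)
         for s, (lp, out) in beam.items() if s in FINAL),
        default=None,
    )
    return " ".join(best[1]) if best else None

print(viterbi_translate(["una", "habitacion"]))  # -> a room
```

In the integrated architecture of the paper, each source-word transition is further expanded into its HMM acoustic model, so the same search runs directly on speech rather than on a word sequence.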
international conference on acoustics, speech, and signal processing | 2001
Francisco Casacuberta; David Llorens; Carlos Martinez; Sirko Molau; Francisco Nevado; Hermann Ney; Moisés Pastor; David Picó; Alberto Sanchis; Enrique Vidal; Juan Miguel Vilar
Nowadays, the most successful speech recognition systems are based on stochastic finite-state networks (hidden Markov models and n-grams). Speech translation can be accomplished in a similar way to speech recognition. Stochastic finite-state transducers, which are specific stochastic finite-state networks, have proved well suited to translation modeling. In this work a speech-to-speech translation system, the EuTRANS system, is presented. The acoustic, language and translation models are finite-state networks that are automatically learnt from training samples. This system was assessed in a series of translation experiments from Spanish to English and from Italian to English in an application involving the interaction (by telephone) of a customer with a receptionist at the front-desk of a hotel.
intelligent user interfaces | 2010
Nicolás Serrano; Alberto Sanchis; Alfons Juan
An effective approach to transcribe handwritten text documents is to follow an interactive-predictive paradigm in which the system is guided by the user and the user is assisted by the system to complete the transcription task as efficiently as possible. This approach has been recently implemented in a system prototype called GIDOC, in which standard speech technology is adapted to handwritten text (line) images: HMM-based text image modeling, n-gram language modeling, and also confidence measures on recognized words. Confidence measures are used to assist the user in locating possible transcription errors, and thus to validate system output after supervising only those (few) words for which the system is not highly confident. However, a certain degree of supervision is required for proper model adaptation from partially supervised transcriptions. Here, we propose a simple yet effective method to find an optimal balance between recognition error and supervision effort.
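The balance between recognition error and supervision effort can be pictured with a simple sketch: words below a confidence threshold are supervised (and assumed corrected), errors among the remaining words go uncorrected, and sweeping the threshold traces the trade-off curve. The toy data and the `tradeoff` helper below are illustrative assumptions, not GIDOC's actual method.

```python
def tradeoff(words, thresholds):
    """words: list of (confidence, is_correct) pairs for recognized words.
    Returns (threshold, supervision effort, residual error) triples,
    both effort and error as fractions of all words."""
    curves = []
    for t in thresholds:
        supervised = [w for w in words if w[0] < t]      # user checks these
        unsupervised = [w for w in words if w[0] >= t]   # accepted as-is
        effort = len(supervised) / len(words)
        residual = sum(1 for _, ok in unsupervised if not ok) / len(words)
        curves.append((t, effort, residual))
    return curves

# Toy recognizer output: (confidence, correct?)
words = [(0.95, True), (0.90, True), (0.40, False),
         (0.30, False), (0.85, True), (0.20, False)]
for t, effort, err in tradeoff(words, [0.0, 0.5, 1.0]):
    print(f"threshold={t:.1f}  effort={effort:.2f}  residual error={err:.2f}")
```

With well-calibrated confidences (as in this toy data), a moderate threshold removes most errors at a fraction of the full supervision cost; the paper's contribution is choosing that operating point optimally.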
intelligent user interfaces | 2012
Isaias Sanchez-Cortina; Nicolás Serrano; Alberto Sanchis; Alfons Juan
A system to transcribe speech data is presented following an interactive paradigm in which the system automatically produces speech transcriptions and the user is assisted by the system to amend output errors as efficiently as possible. Partially supervised transcriptions, with an error tolerance fixed by the user, are used to incrementally adapt the underlying system models. The prototype uses a simple yet effective method to find an optimal balance between recognition error and supervision effort.
international conference on multimodal interfaces | 2009
Nicolás Serrano; Daniel Pérez; Alberto Sanchis; Alfons Juan
An effective approach to transcribe handwritten text documents is to follow an interactive-predictive paradigm in which the system is guided by the user and the user is assisted by the system to complete the transcription task as efficiently as possible. This approach has been recently implemented in a system prototype called GIDOC, in which standard speech technology is adapted to handwritten text (line) images: HMM-based text image modelling, n-gram language modelling, and also confidence measures on recognized words. Confidence measures are used to assist the user in locating possible transcription errors, and thus to validate system output after supervising only those (few) words for which the system is not highly confident. Here, we study the effect of using these partially supervised transcriptions on the adaptation of image and language models to the task.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Alberto Sanchis; Alfons Juan; Enrique Vidal
Confidence estimation has been widely used in speech recognition to detect words in the recognized sentence that have likely been misrecognized. Confidence estimation can be seen as a conventional pattern classification problem in which a set of features is obtained for each hypothesized word in order to classify it as either correct or incorrect. We propose a smoothed naïve Bayes classification model to profitably combine these features. The model itself is a combination of word-dependent (specific) and word-independent (generalized) naïve Bayes models. As in statistical language modeling, the purpose of the generalized model is to smooth the (class posterior) estimates given by the specific models. Our classification model is empirically compared with confidence estimation based on posterior probabilities computed on word graphs. Empirical results clearly show that the good performance of word graph-based posterior probabilities can be improved by using the naïve Bayes combination of features.
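The specific/generalized combination can be sketched as a naïve Bayes classifier whose per-feature likelihoods interpolate word-dependent counts with word-independent ones. The discrete (binned) features, add-alpha smoothing, and fixed interpolation weight `lam` below are simplifying assumptions for illustration; the paper's model estimates its smoothing more carefully.

```python
from collections import defaultdict

class SmoothedNB:
    """Naive Bayes confidence classifier: a word-dependent (specific)
    model interpolated with a word-independent (generalized) model.
    Classes are True (correct word) / False (incorrect word)."""

    def __init__(self, lam=0.5, alpha=1.0):
        self.lam, self.alpha = lam, alpha       # interpolation / add-alpha smoothing
        self.spec = defaultdict(int)            # (word, cls, feat_idx, value) counts
        self.spec_cls = defaultdict(int)        # (word, cls) counts
        self.gen = defaultdict(int)             # (cls, feat_idx, value) counts
        self.gen_cls = defaultdict(int)         # cls counts

    def fit(self, samples):
        """samples: iterable of (word, discrete_features, is_correct)."""
        for word, feats, cls in samples:
            self.spec_cls[(word, cls)] += 1
            self.gen_cls[cls] += 1
            for i, v in enumerate(feats):
                self.spec[(word, cls, i, v)] += 1
                self.gen[(cls, i, v)] += 1

    def posterior_correct(self, word, feats, n_values=2):
        """P(correct | word, feats) under the interpolated model."""
        scores = {}
        total = sum(self.gen_cls.values())
        for cls in (True, False):
            p = self.gen_cls[cls] / total  # class prior
            for i, v in enumerate(feats):
                ps = ((self.spec[(word, cls, i, v)] + self.alpha) /
                      (self.spec_cls[(word, cls)] + self.alpha * n_values))
                pg = ((self.gen[(cls, i, v)] + self.alpha) /
                      (self.gen_cls[cls] + self.alpha * n_values))
                p *= self.lam * ps + (1 - self.lam) * pg
            scores[cls] = p
        return scores[True] / (scores[True] + scores[False])

# Toy training data: one binary feature (e.g. a binned stability score).
train = [("hotel", (1,), True)] * 3 + [("hotel", (0,), False)] * 3
nb = SmoothedNB()
nb.fit(train)
print(round(nb.posterior_correct("hotel", (1,)), 2))  # -> 0.8
```

For frequent words the specific counts dominate; for rare or unseen words the generalized model keeps the posterior estimates from degenerating, which is exactly the role smoothing plays in n-gram language modeling.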
international conference on acoustics, speech, and signal processing | 2003
Alberto Sanchis; Alfons Juan; Enrique Vidal
Utterance verification can be seen as a conventional pattern classification problem in which a feature vector is obtained for each hypothesized word in order to classify it as either correct or incorrect. It is unclear, however, which predictor (pattern) features and classification model should be used. Regarding the features, we have proposed a new feature, called word trellis stability (WTS), that can be profitably used in conjunction with more or less standard features such as acoustic stability. This is confirmed in this paper, where a smoothed naive Bayes classification model is proposed to adequately combine predictor features. In a series of experiments with this classification model and several features, we have found that the results provided by each feature alone are outperformed by certain combinations. In particular, the combination of the two above-mentioned features has been consistently found to give the most accurate results in two verification tasks.
Pattern Recognition | 2014
Vicent Alabau; Alberto Sanchis; Francisco Casacuberta
On-line handwritten text recognition (HTR) could be used as a more natural way of interaction in many interactive applications. However, current HTR technology is far from producing error-free systems and, consequently, its use in many applications is limited. Despite this, there are many scenarios, such as the correction of the errors of fully-automatic systems using HTR in a post-editing step, in which information from the specific task makes it possible to constrain the search and therefore to improve HTR accuracy. For example, in machine translation (MT), the on-line HTR system can also be used to correct translation errors. The HTR system can take advantage of information from the translation problem, such as the source sentence being translated, the portion of the translated sentence that has already been supervised by the human, or the translation error to be amended. Empirical experimentation suggests that this information is valuable for improving the robustness of the on-line HTR system, achieving remarkable results.
Graphical abstract: This work presents an e-pen enabled system where handwriting is used to amend the errors of a machine translation system. Handwriting recognition is performed in such a way that the contextual information (source, prefix, translation, and error) is integrated to improve the final recognition accuracy.
Highlights:
- We present a specific on-line HTR system for editing machine translation (MT) output.
- We leverage information from different sources in MT to constrain the HTR search.
- All the proposed systems outperform the baseline.
- The use of information from the translation models achieves remarkable results.
- Finally, we propose a system to amend HTR errors with a 75% typing effort reduction.
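The effect of integrating translation context into HTR decoding can be pictured as rescoring: the HTR score for each hypothesized word is interpolated with a contextual score derived from the translation task. Everything below (the n-best list, the context distribution, and the weight `beta`) is a toy assumption, not the paper's actual model combination.

```python
import math

def rescore(nbest, context_score, beta=0.6):
    """Pick the hypothesis maximizing a log-linear mix of the HTR score
    and a task-context score.
    nbest: list of (word, htr_logprob); context_score: word -> logprob."""
    return max(nbest,
               key=lambda wl: beta * wl[1] + (1 - beta) * context_score(wl[0]))

# Toy example: HTR slightly prefers "rum", but the translation context
# (say, source word "habitacion" after the validated prefix "a single")
# strongly supports "room".
nbest = [("rum", math.log(0.5)), ("room", math.log(0.4)),
         ("roam", math.log(0.1))]
ctx = {"room": math.log(0.9), "rum": math.log(0.05), "roam": math.log(0.05)}
best = rescore(nbest, lambda w: ctx[w])
print(best[0])  # -> room
```

The abstract's point is precisely this: the source sentence, validated prefix, and error being amended sharpen the contextual distribution enough to flip otherwise-wrong HTR decisions.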
international conference on image analysis and processing | 2009
Lionel Tarazón; Daniel Pérez; Nicolás Serrano; Vicente Alabau; Oriol Ramos Terrades; Alberto Sanchis; Alfons Juan
An effective approach to transcribe old text documents is to follow an interactive-predictive paradigm in which the system is guided by the human supervisor and the supervisor is assisted by the system to complete the transcription task as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results are reported on two datasets showing that a word error rate not larger than 10% can be achieved by checking only the 32% of words that are recognised with the least confidence.
Machine Learning | 2001
Juan-Carlos Amengual; Alberto Sanchis; Enrique Vidal; José-Miguel Benedí
In many language processing tasks, most of the sentences generally convey rather simple meanings. Moreover, these tasks have a limited semantic domain that can be properly covered with a simple lexicon and a restricted syntax. Nevertheless, casual users are by no means expected to comply with any kind of formal syntactic restrictions due to the inherent “spontaneous” nature of human language. In this work, the use of error-correcting-based learning techniques is proposed to cope with the complex syntactic variability which is generally exhibited by natural language. In our approach, a complex task is modeled in terms of a basic finite state model, F, and a stochastic error model, E. F should account for the basic (syntactic) structures underlying this task, which would convey the meaning. E should account for general vocabulary variations, word disappearance, superfluous words, and so on. Each “natural” user sentence is thus considered as a corrupted version (according to E) of some “simple” sentence of L(F). Adequate bootstrapping procedures are presented that incrementally improve the “structure” of F while estimating the probabilities for the operations of E. These techniques have been applied to a practical task of moderately high syntactic variability, and results that show the potential of the proposed approach are presented.
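The F/E decomposition can be illustrated with a minimal sketch: take a tiny "simple" language L(F) and, for a noisy user sentence, recover the nearest sentence in L(F) under a word-level edit distance whose insert/delete/substitute costs stand in for the stochastic error model E. The toy sentences and unit costs below are illustrative assumptions; the paper learns F incrementally and estimates the probabilities of E's operations.

```python
def edit_distance(a, b, sub=1, ins=1, dele=1):
    """Word-level Levenshtein distance between sequences a and b
    (a deterministic stand-in for the stochastic error model E)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i * dele
    for j in range(n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,     # delete a superfluous word
                          d[i][j - 1] + ins,      # insert a missing word
                          d[i - 1][j - 1] + cost) # keep or substitute
    return d[m][n]

# Toy "simple" language L(F): the basic sentences the model F accepts.
LF = [["i", "want", "a", "room"], ["i", "want", "the", "key"]]

def correct(sentence):
    """Map a 'natural' user sentence to its closest sentence in L(F)."""
    return min(LF, key=lambda s: edit_distance(sentence, s))

print(correct("well i want uh a room please".split()))
# -> ['i', 'want', 'a', 'room']
```

In the probabilistic version, minimizing edit cost becomes maximizing P(user sentence | simple sentence) under E times the probability of the simple sentence under F, and the same alignments drive the re-estimation of E's operation probabilities.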