Alfons Juan
Polytechnic University of Valencia
Publications
Featured research published by Alfons Juan.
International Conference on Pattern Recognition | 2000
Alfons Juan; Enrique Vidal
Classification based on nearest neighbours (NN) is a uniformly good approach to many pattern recognition tasks. However, two important aspects need to be taken into account to actually achieve good performance in practice: 1) the metric or dissimilarity measure adopted to compare the patterns; and 2) the computational cost incurred by the NN search. As shown in this paper, when adequate techniques are used to cope with these two issues, NN-based classification leads to better results than other approaches that have been applied to the classification of human banded chromosomes.
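The two ingredients named above (a pluggable dissimilarity and the NN search itself) can be illustrated with a minimal sketch. It assumes patterns are strings compared with the Levenshtein edit distance, which is only an example metric; the function names are hypothetical, and the exhaustive search shown here is the naive baseline, not the fast search technique studied in the paper.

```python
from typing import Callable, Hashable, Sequence, Tuple, TypeVar

P = TypeVar("P")


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[-1]


def nn_classify(x: P,
                prototypes: Sequence[Tuple[P, Hashable]],
                dissimilarity: Callable[[P, P], float]) -> Hashable:
    """Return the label of the prototype closest to x (exhaustive 1-NN search)."""
    if not prototypes:
        raise ValueError("need at least one labelled prototype")
    best_label, best_d = None, float("inf")
    for proto, label in prototypes:
        d = dissimilarity(x, proto)
        if d < best_d:
            best_d, best_label = d, label
    return best_label
```

For example, `nn_classify("aac", [("aab", "A"), ("ccd", "B")], levenshtein)` returns `"A"`. Every query costs one dissimilarity evaluation per prototype, which is why the search cost becomes the second issue once an expensive metric is chosen.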
Pattern Recognition Letters | 2014
Adrià Giménez; Ihab Khoury; Jesús Andrés-Ferrer; Alfons Juan
Hidden Markov Models (HMMs) are now widely used for off-line handwriting recognition in many languages. As in speech recognition, they are usually built from shared, embedded HMMs at the symbol level, where state-conditional probability density functions in each HMM are modeled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kind of features should be used and, indeed, very different feature sets are in use today. Among them, we have recently proposed to simply use columns of raw, binary image pixels, which are directly fed into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea is to bypass feature extraction and to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. In this work, column bit vectors are extended by means of a sliding window of adequate width to better capture image context at each horizontal position of the word image. Using these windowed Bernoulli mixture HMMs, good results are reported on the well-known IAM and RIMES databases of Latin script and, in particular, state-of-the-art results are provided on the IfN/ENIT database of Arabic handwritten words.
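A minimal sketch of the two pieces described above: turning a binary word image into one windowed bit vector per horizontal position, and scoring such a vector with a Bernoulli mixture emission density. The helper names, the zero padding at the image borders, the odd window width and the column-by-column stacking order are all assumptions for illustration; the HMM topology and training are not shown.

```python
import math
from typing import List, Sequence


def windowed_columns(image: Sequence[Sequence[int]], width: int) -> List[List[int]]:
    """One bit vector per column position j: the `width` columns centred on j,
    stacked column by column (zero padding outside the image is an assumption)."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the window is centred")
    height, n_cols = len(image), len(image[0])
    half = width // 2
    vectors = []
    for j in range(n_cols):
        vec = []
        for k in range(j - half, j + half + 1):
            for i in range(height):
                vec.append(image[i][k] if 0 <= k < n_cols else 0)
        vectors.append(vec)
    return vectors


def bernoulli_mixture_loglik(x: Sequence[int],
                             weights: Sequence[float],
                             protos: Sequence[Sequence[float]]) -> float:
    """log sum_k w_k prod_d p_kd^x_d (1 - p_kd)^(1 - x_d), computed with
    log-sum-exp; every p_kd must lie strictly between 0 and 1."""
    logs = []
    for w, p in zip(weights, protos):
        s = math.log(w)
        for xd, pd in zip(x, p):
            s += math.log(pd if xd else 1.0 - pd)
        logs.append(s)
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))
```

With a window of width 1, `windowed_columns` reduces to the plain column bit vectors of the earlier approach; wider windows simply make each emitted vector longer.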
Intelligent User Interfaces | 2012
Isaias Sanchez-Cortina; Nicolás Serrano; Alberto Sanchis; Alfons Juan
A system for transcribing speech data is presented. It follows an interactive paradigm in which the system automatically produces speech transcriptions and the user is assisted by the system in amending output errors as efficiently as possible. Partially supervised transcriptions, with an error tolerance set by the user, are used to incrementally adapt the underlying system models. The prototype uses a simple yet effective method to find an optimal balance between recognition error and supervision effort.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Alberto Sanchis; Alfons Juan; Enrique Vidal
Confidence estimation has been widely used in speech recognition to detect words in the recognized sentence that have likely been misrecognized. Confidence estimation can be seen as a conventional pattern classification problem in which a set of features is obtained for each hypothesized word in order to classify it as either correct or incorrect. We propose a smoothed naïve Bayes classification model to profitably combine these features. The model itself is a combination of word-dependent (specific) and word-independent (generalized) naïve Bayes models. As in statistical language modeling, the purpose of the generalized model is to smooth the (class posterior) estimates given by the specific models. Our classification model is empirically compared with confidence estimation based on posterior probabilities computed on word graphs. Empirical results clearly show that the good performance of word graph-based posterior probabilities can be improved by using the naïve Bayes combination of features.
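To make the specific/generalized combination concrete, here is a minimal sketch under one plausible reading: each probability of the word-specific naïve Bayes model is linearly interpolated with the corresponding word-independent estimate (weight `lam`), in the spirit of interpolated language-model smoothing. The interpolation scheme, the discrete feature values and all names are assumptions; the paper's actual smoothing may differ.

```python
import math
from typing import Dict, List, Sequence

CLASSES = ("correct", "incorrect")


def smoothed(p_spec: float, p_gen: float, lam: float) -> float:
    """Assumed smoothing: interpolate specific and generalized estimates."""
    return lam * p_spec + (1.0 - lam) * p_gen


def nb_posterior_correct(features: Sequence[int],
                         prior_spec: Dict[str, float],
                         prior_gen: Dict[str, float],
                         lik_spec: Dict[str, List[Dict[int, float]]],
                         lik_gen: Dict[str, List[Dict[int, float]]],
                         lam: float) -> float:
    """P(correct | features) for one hypothesized word.

    prior_*[c] is P(c); lik_*[c][i][v] is P(feature i takes value v | c).
    The *_spec tables belong to the hypothesized word, the *_gen tables
    are shared by all words.
    """
    scores = {}
    for c in CLASSES:
        s = math.log(smoothed(prior_spec[c], prior_gen[c], lam))
        for i, v in enumerate(features):  # naive Bayes: features independent given c
            s += math.log(smoothed(lik_spec[c][i][v], lik_gen[c][i][v], lam))
        scores[c] = s
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores["correct"] - m) / z
```

Setting `lam = 1` uses only the word-specific model and `lam = 0` only the generalized one; intermediate values let rarely seen words fall back on the shared estimates.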
International Conference on Acoustics, Speech, and Signal Processing | 2013
Adria A. Martinez-Villaronga; Miguel A. del Agua; Jesús Andrés-Ferrer; Alfons Juan
Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides, which offer a great opportunity for improving the performance of ASR systems. We propose a simple yet powerful extension to the linear interpolation of language models for adapting language models with slide information. Two types of slides are considered: correct slides, and slides automatically extracted from the videos with OCR. Furthermore, we compare both time-aligned and unaligned slides. Results show an improvement of up to 3.8 absolute WER points when using correct slides. Surprisingly, when using automatic slides obtained with poor OCR quality, the ASR system still improves by up to 2.2 absolute WER points.
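For reference, a minimal sketch of the plain linear interpolation that the proposed method extends: P(w | h) = λ P_slides(w | h) + (1 − λ) P_base(w | h), with λ estimated on held-out text by the standard EM update for a two-component mixture weight. Unigram dictionaries stand in for real n-gram models, and the names are hypothetical; the paper's extension itself is not shown.

```python
from typing import Dict, Sequence


def interpolate(p_slides: float, p_base: float, lam: float) -> float:
    """Linearly interpolated probability of one word."""
    return lam * p_slides + (1.0 - lam) * p_base


def em_weight(heldout: Sequence[str],
              slide_lm: Dict[str, float],
              base_lm: Dict[str, float],
              lam: float = 0.5,
              iters: int = 20) -> float:
    """Re-estimate lam as the average posterior that each held-out word
    was generated by the slide model (EM for a mixture weight)."""
    for _ in range(iters):
        total = 0.0
        for w in heldout:
            a = lam * slide_lm.get(w, 0.0)
            b = (1.0 - lam) * base_lm.get(w, 0.0)
            if a + b == 0.0:
                raise ValueError(f"word {w!r} has zero probability under both models")
            total += a / (a + b)
        lam = total / len(heldout)
    return lam
```

When held-out words are better predicted by the slide model, EM pushes λ upwards; if both models assign identical probabilities, λ stays where it started.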
International Conference on Image Analysis and Processing | 2009
Lionel Tarazón; Daniel Pérez; Nicolás Serrano; Vicente Alabau; Oriol Ramos Terrades; Alberto Sanchis; Alfons Juan
An effective approach to transcribing old text documents is to follow an interactive-predictive paradigm in which the system is guided by the human supervisor and the supervisor is assisted by the system to complete the transcription task as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results are reported on two datasets showing that a word error rate no larger than 10% can be achieved by checking only the 32% of words recognised with the lowest confidence.
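The "check only the least confident words" strategy can be sketched as follows. The sketch assumes each recognised word has a confidence score and a known error flag (as in an offline evaluation), and that the supervisor corrects every error among the words shown; both function names are hypothetical.

```python
import math
from typing import List, Sequence


def words_to_check(confidences: Sequence[float], fraction: float) -> List[int]:
    """Indices of the ceil(fraction * n) words with the lowest confidence."""
    n = len(confidences)
    k = math.ceil(fraction * n)
    order = sorted(range(n), key=lambda i: confidences[i])
    return sorted(order[:k])


def residual_wer(errors: Sequence[bool], checked: Sequence[int]) -> float:
    """Word error rate left after the supervisor fixes every checked word
    (substitutions only; assumes all checked errors get corrected)."""
    checked_set = set(checked)
    remaining = sum(1 for i, e in enumerate(errors) if e and i not in checked_set)
    return remaining / len(errors)
```

Sweeping `fraction` from 0 to 1 traces the trade-off between supervision effort and residual error; the better the confidence measure ranks the actual errors first, the faster the residual WER drops.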
International Journal on Document Analysis and Recognition | 2014
Nicolás Serrano; Adrià Giménez; Jorge Civera; Alberto Sanchis; Alfons Juan
Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. Although post-editing automatic recognition of handwritten text is feasible, it is not clearly better than simply ignoring it and transcribing the document from scratch. A more effective approach is to follow an interactive approach in which the system is guided by the user and the user is assisted by the system to complete the transcription task as efficiently as possible. Nevertheless, in some applications, the user effort available to transcribe documents is limited and full supervision of the system output is not realistic. To circumvent these problems, we propose a novel interactive approach which makes efficient use of user effort by improving three different aspects. Firstly, the system spends a limited amount of effort supervising only those recognised words that are likely to be incorrect. Thus, user effort is focused on the words for which the system is not confident enough. Secondly, it refines the initial transcription provided to the user by recomputing it constrained to user supervisions. In this way, incorrect words in unsupervised parts can be automatically amended without user supervision. Finally, it improves the underlying system models by retraining the system from partially supervised transcriptions. To support these claims, empirical results are presented on two real databases showing that the proposed approach can notably reduce user effort in the transcription of handwritten text in (old) documents.
International Conference on Frontiers in Handwriting Recognition | 2012
Patrick Doetsch; Mahdi Hamdani; Hermann Ney; Adrià Giménez; Jesús Andrés-Ferrer; Alfons Juan
In this paper, a vertical repositioning method based on the center of gravity is investigated for handwriting recognition systems and evaluated on databases containing Arabic and French handwriting. Experiments show that vertical distortion in images has a large impact on the performance of HMM-based handwriting recognition systems. Recently, good results were obtained with Bernoulli HMMs (BHMMs) using a preprocessing step with vertical repositioning of binarized images. In order to isolate the effect of the preprocessing from the BHMM model, experiments were conducted with Gaussian HMMs and the LSTM-RNN tandem HMM approach, with relative WER improvements of 33% on the Arabic database and up to 62% on the French database.
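A minimal sketch of center-of-gravity vertical repositioning for a binarized image (1 = ink): compute the ink-weighted mean row index and shift all rows so that this point lands on the vertical center of a fixed-height output. The fixed output height, the rounding of the shift and the cropping of rows that fall outside are assumptions; the paper's exact procedure may window or scale the image differently.

```python
from typing import List, Optional, Sequence


def vertical_cog(image: Sequence[Sequence[int]]) -> Optional[float]:
    """Ink-weighted mean row index, or None for an empty (all-white) image."""
    total = weighted = 0
    for i, row in enumerate(image):
        n = sum(row)
        total += n
        weighted += i * n
    return None if total == 0 else weighted / total


def reposition(image: Sequence[Sequence[int]], out_height: int) -> List[List[int]]:
    """Shift rows so the center of gravity sits on the middle row of an
    out_height-row output; rows shifted outside the output are dropped."""
    width = len(image[0])
    out = [[0] * width for _ in range(out_height)]
    cog = vertical_cog(image)
    if cog is None:
        return out
    shift = round((out_height - 1) / 2 - cog)
    for i, row in enumerate(image):
        target = i + shift
        if 0 <= target < out_height:
            out[target] = list(row)
    return out
```

After repositioning, words written higher or lower on the line yield vertically aligned feature columns, which is the kind of vertical distortion the experiments show HMM-based recognizers are sensitive to.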
Archive | 2012
Ihab Alkhoury; Adrià Giménez; Alfons Juan
Hidden Markov models (HMMs) are now widely used for off-line handwriting recognition in many languages and, in particular, in Arabic. As in speech recognition, they are usually built from shared, embedded HMMs at the symbol level, in which state-conditional probability density functions are modeled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kinds of features should be used and, indeed, very different feature sets are in use today. Among them, we have recently proposed to simply use columns of raw, binary image pixels, which are directly fed into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea is to bypass feature extraction and ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. In this chapter, we review this idea along with some extensions that are currently providing state-of-the-art results on Arabic handwritten word recognition.
International Conference on Document Analysis and Recognition | 2011
Adrià Giménez; Jesús Andrés-Ferrer; Alfons Juan; Nicolás Serrano
Bernoulli-based models such as Bernoulli mixtures or Bernoulli HMMs (BHMMs) have been successfully applied to several handwritten text recognition (HTR) tasks, ranging from character recognition to continuous and isolated handwritten word recognition. All these models belong to the generative model family and, hence, are usually trained by (joint) maximum likelihood estimation (MLE). Despite the good properties of the MLE criterion, there are better training criteria such as maximum mutual information (MMI). MMI is a widespread criterion that is mainly employed to train discriminative models such as log-linear (or maximum entropy) models. Inspired by the Bernoulli mixture classifier, in this work a log-linear model for binary data is proposed, the so-called mixture of multi-class logistic regression. The proposed model is proved to be equivalent to the Bernoulli mixture classifier. In this way, we give a discriminative training framework for Bernoulli mixture models. The proposed discriminative training framework is applied to a well-known Indian digit recognition task.
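The kind of equivalence stated above can be checked numerically. Since p^x (1 − p)^(1 − x) = (1 − p)·(p / (1 − p))^x, each (class c, component k) term of a Bernoulli mixture classifier equals exp(b_ck + θ_ck · x) with θ_ckd = log(p_ckd / (1 − p_ckd)) and b_ck = log π_c + log w_ck + Σ_d log(1 − p_ckd); the class posterior is then a softmax over all (c, k) pairs, summed over k. The sketch below is our own illustration with hypothetical names, not the paper's derivation or its MMI training procedure.

```python
import math
from typing import List, Sequence, Tuple

Params = List[List[Tuple[float, List[float]]]]  # per class, per component: (bias, theta)


def generative_posterior(x: Sequence[int], priors: Sequence[float],
                         weights: Sequence[Sequence[float]],
                         protos: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """P(c | x) by direct evaluation of the Bernoulli mixture classifier."""
    joint = []
    for pc, wc, c_protos in zip(priors, weights, protos):
        s = 0.0
        for w, p in zip(wc, c_protos):
            lik = 1.0
            for xd, pd in zip(x, p):
                lik *= pd if xd else 1.0 - pd
            s += pc * w * lik
        joint.append(s)
    z = sum(joint)
    return [j / z for j in joint]


def to_log_linear(priors: Sequence[float], weights: Sequence[Sequence[float]],
                  protos: Sequence[Sequence[Sequence[float]]]) -> Params:
    """Map mixture parameters to a bias and a weight vector per (class, component)."""
    params = []
    for pc, wc, c_protos in zip(priors, weights, protos):
        comps = []
        for w, p in zip(wc, c_protos):
            bias = math.log(pc) + math.log(w) + sum(math.log(1.0 - pd) for pd in p)
            theta = [math.log(pd / (1.0 - pd)) for pd in p]
            comps.append((bias, theta))
        params.append(comps)
    return params


def log_linear_posterior(x: Sequence[int], params: Params) -> List[float]:
    """P(c | x) = sum over k of a softmax over all (c, k) scores b_ck + theta_ck . x."""
    scores = [[b + sum(t * xd for t, xd in zip(theta, x)) for b, theta in comps]
              for comps in params]
    m = max(s for row in scores for s in row)
    class_mass = [sum(math.exp(s - m) for s in row) for row in scores]
    z = sum(class_mass)
    return [c / z for c in class_mass]
```

Both functions return the same posteriors for any binary x, but the log-linear form treats (b, θ) as free parameters, which is what makes a discriminative criterion such as MMI directly applicable.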