Alfons Juan

Polytechnic University of Valencia

Publication

Featured research published by Alfons Juan.

International Conference on Pattern Recognition | 2000

On the use of normalized edit distances and an efficient k-NN search technique (k-AESA) for fast and accurate string classification

Alfons Juan; Enrique Vidal

Classification based on nearest neighbours (NN) is a uniformly good approach to many pattern recognition tasks. However, two important aspects need to be taken into account to actually achieve good performance in practice: 1) the metric or dissimilarity measure adopted to compare the considered patterns; and 2) the computational cost incurred by the NN search operation. As shown in this paper, by using adequate techniques to cope with these two issues, NN-based classification leads to better results than those obtained by other approaches that have been applied to a human banded chromosome classification task.
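The two ingredients named in this abstract, a length-normalized string dissimilarity and a k-NN vote, can be illustrated with a minimal Python sketch. It rests on assumptions: the normalization shown (dividing the unit-cost edit distance by the longer string length) is a simple stand-in and not necessarily the normalized edit distance used in the paper, and the search is exhaustive rather than k-AESA. All function names are hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def length_normalized_distance(a, b):
    """Assumption: normalize by the longer length (not the paper's definition)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def knn_classify(query, prototypes, k=3):
    """Exhaustive k-NN vote over (string, label) prototypes."""
    nearest = sorted(
        prototypes,
        key=lambda p: length_normalized_distance(query, p[0]),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A fast search technique such as the k-AESA of the title would replace the exhaustive `sorted` call; the voting step would stay the same.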

Pattern Recognition Letters | 2014

Handwriting word recognition using windowed Bernoulli HMMs

Adrià Giménez; Ihab Khoury; Jesús Andrés-Ferrer; Alfons Juan

Hidden Markov Models (HMMs) are now widely used for off-line handwriting recognition in many languages. As in speech recognition, they are usually built from shared, embedded HMMs at symbol level, where state-conditional probability density functions in each HMM are modeled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kinds of features should be used and, indeed, very different feature sets are in use today. Among them, we have recently proposed to directly use columns of raw, binary image pixels, which are fed into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea is to bypass feature extraction and to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. In this work, column bit vectors are extended by means of a sliding window of adequate width to better capture image context at each horizontal position of the word image. Using these windowed Bernoulli mixture HMMs, good results are reported on the well-known IAM and RIMES databases of Latin script, and in particular, state-of-the-art results are provided on the IfN/ENIT database of Arabic handwritten words.
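The sliding-window extension of column bit vectors can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of rows of 0/1 pixels, the window width is odd, and positions outside the image are padded with background (0). The function name is hypothetical.

```python
def windowed_columns(image, width):
    """Return one bit vector per column of a binary image.

    Each vector concatenates the `width` columns centered at that
    horizontal position. Columns outside the image are treated as
    background (all zeros), which is an assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    half = width // 2
    vectors = []
    for t in range(cols):
        vec = []
        for c in range(t - half, t - half + width):
            for r in range(rows):
                vec.append(image[r][c] if 0 <= c < cols else 0)
        vectors.append(vec)
    return vectors
```

With `width=1` the function returns the raw columns, which corresponds to the non-windowed case described earlier in the abstract.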

Intelligent User Interfaces | 2012

A prototype for interactive speech transcription balancing error and supervision effort

Isaias Sanchez-Cortina; Nicolás Serrano; Alberto Sanchis; Alfons Juan

A system to transcribe speech data is presented following an interactive paradigm in which the system automatically produces speech transcriptions and the user is assisted by the system in amending output errors as efficiently as possible. Partially supervised transcriptions, with an error tolerance fixed by the user, are used to incrementally adapt the underlying system models. The prototype uses a simple yet effective method to find an optimal balance between recognition error and supervision effort.

IEEE Transactions on Audio, Speech, and Language Processing | 2012

A Word-Based Naïve Bayes Classifier for Confidence Estimation in Speech Recognition

Alberto Sanchis; Alfons Juan; Enrique Vidal

Confidence estimation has been widely used in speech recognition to detect words in the recognized sentence that are likely to have been misrecognized. Confidence estimation can be seen as a conventional pattern classification problem in which a set of features is obtained for each hypothesized word in order to classify it as either correct or incorrect. We propose a smoothed naïve Bayes classification model to profitably combine these features. The model itself is a combination of word-dependent (specific) and word-independent (generalized) naïve Bayes models. As in statistical language modeling, the purpose of the generalized model is to smooth the (class posterior) estimates given by the specific models. Our classification model is empirically compared with confidence estimation based on posterior probabilities computed on word graphs. Empirical results clearly show that the good performance of word graph-based posterior probabilities can be improved by using the naïve Bayes combination of features.
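One way to picture the combination of word-dependent and word-independent models is the Python sketch below. It is a simplified illustration and not the paper's formulation: features are assumed to be binary, the generalized estimates use add-one smoothing, and the two models are mixed with a fixed interpolation weight at the level of feature probabilities. All names and the `weight` parameter are hypothetical.

```python
from collections import Counter


def train(samples):
    """samples: iterable of (word, features, label) with 0/1 features and labels."""
    general, specific = Counter(), Counter()
    class_totals, word_class_totals = Counter(), Counter()
    for word, features, label in samples:
        class_totals[label] += 1
        word_class_totals[(word, label)] += 1
        for d, value in enumerate(features):
            general[(d, value, label)] += 1
            specific[(word, d, value, label)] += 1
    return general, specific, class_totals, word_class_totals


def posterior_correct(word, features, model, weight=0.5):
    """Estimate P(correct | word, features) with interpolated naive Bayes."""
    general, specific, class_totals, word_class_totals = model
    total = sum(class_totals.values())
    scores = {}
    for label in (0, 1):
        score = (class_totals[label] + 1) / (total + 2)  # smoothed class prior
        n_word = word_class_totals[(word, label)]
        for d, value in enumerate(features):
            # Word-independent estimate with add-one smoothing (binary feature).
            p = (general[(d, value, label)] + 1) / (class_totals[label] + 2)
            if n_word:
                # Word-dependent relative frequency, smoothed by interpolation.
                p_spec = specific[(word, d, value, label)] / n_word
                p = weight * p_spec + (1 - weight) * p
            score *= p
        scores[label] = score
    return scores[1] / (scores[0] + scores[1])
```

For a word never seen in training, the sketch falls back entirely on the generalized model, which is the role the abstract assigns to it.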

International Conference on Acoustics, Speech, and Signal Processing | 2013

Language model adaptation for video lectures transcription

Adria A. Martinez-Villaronga; Miguel A. del Agua; Jesús Andrés-Ferrer; Alfons Juan

Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides. The slides offer a great opportunity for improving ASR system performance. We propose a simple yet powerful extension to the linear interpolation of language models for adapting language models with slide information. Two types of slides are considered: correct slides, and slides automatically extracted from the videos with OCR. Furthermore, we compare both time-aligned and unaligned slides. Results show an improvement of up to 3.8 absolute WER points when using correct slides. Surprisingly, when using automatic slides obtained with poor OCR quality, the ASR system still improves by up to 2.2 absolute WER points.
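As background for the adaptation scheme, plain linear interpolation of language models can be sketched in Python as below. This shows only the baseline technique, not the extension proposed in the paper, and it uses unigram distributions stored as dictionaries for brevity; the function name and the example weights are assumptions.

```python
def interpolate(models, weights):
    """Linearly interpolate unigram models given as {word: probability} dicts.

    The weights must sum to one so that the result is again a
    probability distribution over the union vocabulary.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = set().union(*models)
    return {
        word: sum(lam * m.get(word, 0.0) for lam, m in zip(weights, models))
        for word in vocab
    }
```

For instance, mixing a general model with a slide-derived model at weights 0.7 and 0.3 raises the probability of words that appear on the slides while keeping the general vocabulary.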

International Conference on Image Analysis and Processing | 2009

Confidence Measures for Error Correction in Interactive Transcription Handwritten Text

Lionel Tarazón; Daniel Pérez; Nicolás Serrano; Vicente Alabau; Oriol Ramos Terrades; Alberto Sanchis; Alfons Juan

An effective approach to transcribing old text documents is to follow an interactive-predictive paradigm in which both the system is guided by the human supervisor, and the supervisor is assisted by the system to complete the transcription task as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results are reported on two datasets showing that a word error rate no larger than 10% can be achieved by checking only the 32% of words that are recognised with the least confidence.
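The idea of directing the supervisor to the least confident words can be sketched in a few lines of Python. This is a hypothetical illustration: it assumes each recognised word already carries a confidence score in [0, 1] and simply selects a fixed fraction of the lowest-scoring words, which may differ from the selection strategy used in GIDOC.

```python
import math


def words_to_supervise(confidences, fraction):
    """Return the positions of the least confident words, in reading order.

    `fraction` is the share of words the supervisor is willing to check
    (for example 0.32); the count is rounded up.
    """
    count = math.ceil(fraction * len(confidences))
    by_confidence = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return sorted(by_confidence[:count])
```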

International Journal on Document Analysis and Recognition | 2014

Interactive handwriting recognition with limited user effort

Nicolás Serrano; Adrià Giménez; Jorge Civera; Alberto Sanchis; Alfons Juan

Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. Although post-editing automatic recognition of handwritten text is feasible, it is not clearly better than simply ignoring it and transcribing the document from scratch. A more effective alternative is to follow an interactive approach in which both the system is guided by the user, and the user is assisted by the system to complete the transcription task as efficiently as possible. Nevertheless, in some applications, the user effort available to transcribe documents is limited and full supervision of the system output is not realistic. To circumvent these problems, we propose a novel interactive approach which efficiently employs user effort to transcribe a document by improving three different aspects. Firstly, the system employs a limited amount of effort to solely supervise recognised words that are likely to be incorrect. Thus, user effort is efficiently focused on the supervision of words for which the system is not confident enough. Secondly, it refines the initial transcription provided to the user by recomputing it constrained to user supervisions. In this way, incorrect words in unsupervised parts can be automatically amended without user supervision. Finally, it improves the underlying system models by retraining the system from partially supervised transcriptions. In order to support these claims, empirical results are presented on two real databases showing that the proposed approach can notably reduce user effort in the transcription of handwritten text in (old) documents.

International Conference on Frontiers in Handwriting Recognition | 2012

Comparison of Bernoulli and Gaussian HMMs Using a Vertical Repositioning Technique for Off-Line Handwriting Recognition

Patrick Doetsch; Mahdi Hamdani; Hermann Ney; Adrià Giménez; Jesús Andrés-Ferrer; Alfons Juan

In this paper, a vertical repositioning method based on the center of gravity is investigated for handwriting recognition systems and evaluated on databases containing Arabic and French handwriting. Experiments show that vertical distortion in images has a large impact on the performance of HMM-based handwriting recognition systems. Recently, good results were obtained with Bernoulli HMMs (BHMMs) using a preprocessing step with vertical repositioning of binarized images. In order to isolate the effect of the preprocessing from the BHMM model, experiments were conducted with Gaussian HMMs and the LSTM-RNN tandem HMM approach, with relative improvements of 33% WER on the Arabic database and up to 62% on the French database.
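Vertical repositioning based on the center of gravity can be illustrated with the Python sketch below. It is a minimal interpretation under stated assumptions: the input is a binary window given as a list of rows (top to bottom), the whole window is shifted by an integer number of rows so that its vertical center of mass lands on the middle row, and rows pushed outside the window are discarded. The function name is hypothetical and details may differ from the method evaluated in the paper.

```python
def reposition_vertically(window):
    """Shift a binary window so its vertical center of gravity is centered."""
    rows, cols = len(window), len(window[0])
    mass = sum(sum(row) for row in window)
    if mass == 0:
        # Blank window: nothing to align, return an unchanged copy.
        return [row[:] for row in window]
    center = sum(r * sum(row) for r, row in enumerate(window)) / mass
    shift = round((rows - 1) / 2 - center)
    out = [[0] * cols for _ in range(rows)]
    for r, row in enumerate(window):
        if 0 <= r + shift < rows:  # rows shifted outside are dropped
            out[r + shift] = row[:]
    return out
```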

Archive | 2012

Arabic Handwriting Recognition Using Bernoulli HMMs

Ihab Alkhoury; Adrià Giménez; Alfons Juan

Hidden Markov models (HMMs) are now widely used for off-line handwriting recognition in many languages and, in particular, in Arabic. As in speech recognition, they are usually built from shared, embedded HMMs at the symbol level, in which state-conditional probability density functions are modeled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kinds of features should be used and, indeed, very different feature sets are in use today. Among them, we have recently proposed to simply use columns of raw, binary image pixels, which are directly fed into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea is to bypass feature extraction and ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. In this chapter, we review this idea along with some extensions that are currently providing state-of-the-art results on Arabic handwritten word recognition.
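The emission model mentioned here, a Bernoulli mixture over binary pixel vectors, can be evaluated as in the following Python sketch. It assumes every prototype probability lies strictly between 0 and 1 and computes the log-likelihood with the usual log-sum-exp trick for numerical stability. The function and parameter names are hypothetical.

```python
import math


def bernoulli_mixture_log_likelihood(x, priors, prototypes):
    """Return log sum_k priors[k] * prod_d p_kd^x_d * (1 - p_kd)^(1 - x_d).

    x: list of 0/1 pixels; priors: mixture weights summing to one;
    prototypes: one list of per-pixel probabilities (0 < p < 1) per component.
    """
    component_logs = []
    for prior, proto in zip(priors, prototypes):
        log_p = math.log(prior)
        for bit, p in zip(x, proto):
            log_p += math.log(p) if bit else math.log(1.0 - p)
        component_logs.append(log_p)
    top = max(component_logs)
    return top + math.log(sum(math.exp(v - top) for v in component_logs))
```

In an embedded Bernoulli HMM, a function of this kind would play the role of the per-state emission score for each column (or window) of the word image.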

International Conference on Document Analysis and Recognition | 2011

Discriminative Bernoulli Mixture Models for Handwritten Digit Recognition

Adrià Giménez; Jesús Andrés-Ferrer; Alfons Juan; Nicolás Serrano

Bernoulli-based models, such as Bernoulli mixtures or Bernoulli HMMs (BHMMs), have been successfully applied to several handwritten text recognition (HTR) tasks, which range from character recognition to continuous and isolated handwritten words. All these models belong to the generative model family and, hence, are usually trained by (joint) maximum likelihood estimation (MLE). Despite the good properties of the MLE criterion, there are better training criteria such as maximum mutual information (MMI). MMI is a widespread criterion that is mainly employed to train discriminative models such as log-linear (or maximum entropy) models. Inspired by the Bernoulli mixture classifier, in this work a log-linear model for binary data is proposed, the so-called mixture of multi-class logistic regression. The proposed model is proved to be equivalent to the Bernoulli mixture classifier. In this way, we give a discriminative training framework for Bernoulli mixture models. The proposed discriminative training framework is applied to a well-known Indian digit recognition task.
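The connection between Bernoulli mixtures and log-linear models can be made plausible with a short derivation. The notation below is assumed for illustration and is not taken from the paper: $x \in \{0,1\}^D$ is a binary vector, $c$ a class with prior $P(c)$, and component $k$ of class $c$ has mixture weight $\pi_{ck}$ and prototype $p_{ck} \in (0,1)^D$.

```latex
\begin{align*}
\log\Bigl( P(c)\,\pi_{ck} \prod_{d=1}^{D} p_{ckd}^{x_d} (1 - p_{ckd})^{1 - x_d} \Bigr)
  &= \sum_{d=1}^{D} x_d \log\frac{p_{ckd}}{1 - p_{ckd}}
     + \sum_{d=1}^{D} \log(1 - p_{ckd}) + \log \pi_{ck} + \log P(c) \\
  &= w_{ck}^{\top} x + b_{ck}, \\[4pt]
P(c \mid x)
  &= \frac{\sum_{k} \exp\bigl( w_{ck}^{\top} x + b_{ck} \bigr)}
          {\sum_{c'} \sum_{k'} \exp\bigl( w_{c'k'}^{\top} x + b_{c'k'} \bigr)} .
\end{align*}
```

Here $w_{ckd} = \log\frac{p_{ckd}}{1 - p_{ckd}}$ and $b_{ck}$ collects the terms that do not depend on $x$. Each component score is linear in $x$, so the class posterior takes the form of a mixture of log-linear (softmax) terms, which is the kind of model that discriminative criteria such as MMI are typically used to train.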