Adrià Giménez
Polytechnic University of Valencia
Publication
Featured research published by Adrià Giménez.
International Conference on Frontiers in Handwriting Recognition | 2010
Adrià Giménez; Ihab Khoury; Alfons Juan
Hidden Markov Models (HMMs) are now widely used in off-line handwriting recognition and, in particular, in Arabic handwritten word recognition. In contrast to the conventional approach, based on Gaussian mixture HMMs, we have recently proposed to directly feed columns of raw, binary pixels into Bernoulli mixture HMMs. In this work, column bit vectors are extended by means of a sliding window of adequate width to better capture image context at each horizontal position of the word image. Using these windowed Bernoulli mixture HMMs, very good results are reported on the well-known IfN/ENIT database of Arabic handwritten Tunisian town names.
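The sliding-window extension of column bit vectors described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `windowed_columns`, the odd window width, the column-by-column concatenation order and the zero padding at the image borders are all assumptions made for illustration.

```python
def windowed_columns(image, width):
    """Return one bit vector per column, built from a window of `width` columns.

    `image` is a list of rows, each a list of 0/1 pixels. Columns that fall
    outside the image are treated as blank (0); this padding is an assumption.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    rows = len(image)
    cols = len(image[0]) if rows else 0
    half = width // 2
    vectors = []
    for t in range(cols):
        vec = []
        # Concatenate the columns t-half .. t+half, top pixel first.
        for c in range(t - half, t + half + 1):
            for r in range(rows):
                vec.append(image[r][c] if 0 <= c < cols else 0)
        vectors.append(vec)
    return vectors


# Hypothetical 2x3 binary image used only as a toy input.
image = [
    [0, 1, 0],
    [1, 1, 0],
]
vectors = windowed_columns(image, 3)
```

With a window of width 1 the function returns the plain column bit vectors, so the windowed representation can be read as a generalisation of the original one.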
International Conference on Document Analysis and Recognition | 2009
Adrià Giménez; Alfons Juan
Hidden Markov Models (HMMs) are now widely used in off-line handwritten word recognition. As in speech recognition, they are usually built from shared, embedded HMMs at symbol level, in which state-conditional probability density functions are modelled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kind of real-valued features should be used and, indeed, very different feature sets are in use today. In this paper, we propose to by-pass feature extraction and directly feed columns of raw, binary image pixels into embedded Bernoulli mixture HMMs, that is, embedded HMMs in which the emission probabilities are modelled with Bernoulli mixtures. The idea is to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. Empirical results show that Bernoulli and Gaussian mixtures achieve similar performance, though Bernoulli mixtures are much simpler.
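As a rough illustration of what a Bernoulli mixture emission probability looks like, the sketch below evaluates log p(x) for a binary vector x under a mixture of independent Bernoulli prototypes. The function name and the toy parameters are hypothetical, and the prototype probabilities are assumed to lie strictly between 0 and 1 (for example after smoothing) so that the logarithms are defined.

```python
import math


def bernoulli_mixture_log_prob(x, priors, prototypes):
    """log p(x) = log sum_k priors[k] * prod_d p_kd^x_d * (1 - p_kd)^(1 - x_d)."""
    log_terms = []
    for prior, proto in zip(priors, prototypes):
        log_p = math.log(prior)
        for bit, p in zip(x, proto):
            # A set pixel contributes p, an unset pixel contributes 1 - p.
            log_p += math.log(p if bit else 1.0 - p)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


# Toy two-component mixture over two-pixel columns (illustrative values only).
log_p = bernoulli_mixture_log_prob([1, 0], [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]])
```

In an HMM, a function of this kind would play the role of the state-conditional emission score for each column of the image; no feature extraction step is needed because x is the raw bit vector itself.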
Pattern Recognition Letters | 2014
Adrià Giménez; Ihab Khoury; Jesús Andrés-Ferrer; Alfons Juan
Hidden Markov Models (HMMs) are now widely used for off-line handwriting recognition in many languages. As in speech recognition, they are usually built from shared, embedded HMMs at symbol level, where state-conditional probability density functions in each HMM are modeled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kind of features should be used and, indeed, very different feature sets are in use today. Among them, we have recently proposed to use columns of raw, binary image pixels, which are directly fed into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea is to by-pass feature extraction and to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. In this work, column bit vectors are extended by means of a sliding window of adequate width to better capture image context at each horizontal position of the word image. Using these windowed Bernoulli mixture HMMs, good results are reported on the well-known IAM and RIMES databases of Latin script, and in particular, state-of-the-art results are provided on the IfN/ENIT database of Arabic handwritten words.
International Journal on Document Analysis and Recognition | 2014
Nicolás Serrano; Adrià Giménez; Jorge Civera; Alberto Sanchis; Alfons Juan
Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. Although post-editing the output of automatic handwritten text recognition is feasible, it is not clearly better than simply ignoring that output and transcribing the document from scratch. A more effective alternative is an interactive approach in which the system is guided by the user and the user is assisted by the system, so that the transcription task is completed as efficiently as possible. Nevertheless, in some applications, the user effort available to transcribe documents is limited and full supervision of the system output is not realistic. To circumvent these problems, we propose a novel interactive approach which employs user effort efficiently by improving three different aspects. Firstly, the system spends a limited amount of effort solely on supervising recognised words that are likely to be incorrect. Thus, user effort is focused on the words for which the system is not confident enough. Secondly, it refines the initial transcription provided to the user by recomputing it under the constraints imposed by user supervisions. In this way, incorrect words in unsupervised parts can be amended automatically without user supervision. Finally, it improves the underlying system models by retraining the system from partially supervised transcriptions. To support these claims, empirical results are presented on two real databases, showing that the proposed approach can notably reduce user effort in the transcription of handwritten text in (old) documents.
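The first aspect, spending a limited supervision budget on the words the system is least sure about, can be sketched as follows. This is only one plausible selection rule, assumed here for illustration: the function name, the per-word confidence scores and the fixed word budget are hypothetical, and the abstract does not say how confidence is estimated.

```python
def select_words_to_supervise(confidences, budget):
    """Return the positions of the `budget` least confident words, in reading order."""
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return sorted(ranked[:budget])


# Hypothetical recogniser output for one text line, with made-up confidences.
words = ["the", "qvick", "brown", "f0x"]
confidences = [0.98, 0.41, 0.87, 0.35]
to_check = select_words_to_supervise(confidences, budget=2)  # -> [1, 3]
```

The words at the returned positions would be shown to the user; the remaining words are left unsupervised and could later be revised when the hypothesis is recomputed under the user's corrections.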
International Conference on Frontiers in Handwriting Recognition | 2012
Patrick Doetsch; Mahdi Hamdani; Hermann Ney; Adrià Giménez; Jesús Andrés-Ferrer; Alfons Juan
In this paper, a vertical repositioning method based on the center of gravity is investigated for handwriting recognition systems and evaluated on databases containing Arabic and French handwriting. Experiments show that vertical distortion in images has a large impact on the performance of HMM-based handwriting recognition systems. Recently, good results were obtained with Bernoulli HMMs (BHMMs) using a preprocessing step with vertical repositioning of binarized images. In order to isolate the effect of the preprocessing from the BHMM model, experiments were conducted with Gaussian HMMs and the LSTM-RNN tandem HMM approach, with relative improvements of 33% WER on the Arabic and up to 62% on the French database.
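A minimal sketch of center-of-gravity vertical repositioning is given below. It shifts a whole binary image so that the vertical center of mass of its ink pixels lands on the middle row. Applying the shift to the whole image (rather than, say, to each sliding window), rounding the shift to whole rows and discarding rows pushed outside the frame are assumptions made for illustration; the abstract does not specify these details.

```python
def reposition_vertically(image):
    """Shift rows so the ink's vertical center of gravity sits on the middle row."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    mass = sum(sum(row) for row in image)
    if mass == 0:
        # Blank image: nothing to reposition.
        return [row[:] for row in image]
    # Mass-weighted mean row index of the ink pixels.
    cog = sum(r * sum(row) for r, row in enumerate(image)) / mass
    shift = round((rows - 1) / 2 - cog)
    out = [[0] * cols for _ in range(rows)]
    for r, row in enumerate(image):
        target = r + shift
        if 0 <= target < rows:  # rows shifted outside the frame are dropped
            out[target] = row[:]
    return out


# Toy 5x2 image with all ink on the top row.
image = [[1, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
centered = reposition_vertically(image)
```

In this toy case the ink moves from row 0 to row 2, the middle of the five-row frame.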
Iberian Conference on Pattern Recognition and Image Analysis | 2007
Verónica Romero; Adrià Giménez; Alfons Juan
Bernoulli mixture models have been recently proposed as simple yet powerful probabilistic models for binary images in which each image pattern is modelled by a different Bernoulli prototype (component). A possible limitation of these models, however, is that usual geometric transformations of image patterns are not explicitly modelled and, therefore, each natural transformation of an image pattern has to be independently modelled using a different, rigid prototype. In this work, we propose a simple technique to make these rigid prototypes more flexible by explicit modelling of invariances to translation, scaling and rotation. Results are reported on a task of handwritten Indian digits recognition.
Archive | 2012
Ihab Alkhoury; Adrià Giménez; Alfons Juan
Hidden Markov models (HMMs) are now widely used for off-line handwriting recognition in many languages and, in particular, in Arabic. As in speech recognition, they are usually built from shared, embedded HMMs at the symbol level, in which state-conditional probability density functions are modeled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kinds of features should be used and, indeed, very different feature sets are in use today. Among them, we have recently proposed to simply use columns of raw, binary image pixels, which are directly fed into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea is to bypass feature extraction and ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. In this chapter, we review this idea along with some extensions that are currently providing state-of-the-art results on Arabic handwritten word recognition.
International Conference on Document Analysis and Recognition | 2011
Adrià Giménez; Jesús Andrés-Ferrer; Alfons Juan; Nicolás Serrano
Bernoulli-based models, such as Bernoulli mixtures or Bernoulli HMMs (BHMMs), have been successfully applied to several handwritten text recognition (HTR) tasks, ranging from character recognition to continuous and isolated handwritten word recognition. All these models belong to the generative model family and, hence, are usually trained by (joint) maximum likelihood estimation (MLE). Despite the good properties of the MLE criterion, there are better training criteria such as maximum mutual information (MMI). MMI is a widespread criterion that is mainly employed to train discriminative models such as log-linear (or maximum entropy) models. Inspired by the Bernoulli mixture classifier, in this work a log-linear model for binary data is proposed, the so-called mixture of multi-class logistic regression. The proposed model is proved to be equivalent to the Bernoulli mixture classifier. In this way, we give a discriminative training framework for Bernoulli mixture models. The proposed discriminative training framework is applied to a well-known Indian digit recognition task.
International Conference on Multimodal Interfaces | 2010
Nicolás Serrano; Adrià Giménez; Alberto Sanchis; Alfons Juan
Active learning strategies are being increasingly used in a variety of real-world tasks, though their application to handwritten text transcription in old manuscripts remains nearly unexplored. The basic idea is to follow a sequential, line-by-line transcription of the whole manuscript in which a continuously retrained system interacts with the user to efficiently transcribe each new line. This approach has been recently explored using a conventional strategy by which the user is only asked to supervise words that are not recognized with high confidence. In this paper, the conventional strategy is improved by also letting the system recompute the most probable hypotheses under the constraints imposed by user supervisions. In particular, two strategies are studied which differ in the frequency of hypothesis recomputation on the current line: after each user correction (iterative) or after all user corrections (delayed). Empirical results are reported on two real tasks showing that these strategies outperform the conventional approach.
IberSPEECH 2014 Proceedings of the Second International Conference on Advances in Speech and Language Technologies for Iberian Languages - Volume 8854 | 2014
M. A. del-Agua; Adrià Giménez; Nicolás Serrano; Jesús Andrés-Ferrer; Jorge Civera; Alberto Sanchis; Alfons Juan
Over the past few years, online multimedia educational repositories have increased in number and popularity. The main aim of the transLectures project is to develop cost-effective solutions for producing accurate transcriptions and translations for large video lecture repositories, such as VideoLectures.NET or the Universitat Politècnica de València's repository, poliMedia. In this paper, we present the transLectures-UPV toolkit (TLK), which has been specifically designed to meet the requirements of the transLectures project, but can also be used as a conventional ASR toolkit. The main features of the current release include HMM training and decoding with speaker adaptation techniques (fCMLLR). TLK has been tested on the VideoLectures.NET and poliMedia repositories, yielding very competitive results. TLK has been released under the permissive open source Apache License v2.0 and can be directly downloaded from the transLectures website.