Joan Puigcerver
Polytechnic University of Valencia
Publications
Featured research published by Joan Puigcerver.
International Conference on Document Analysis and Recognition | 2015
Joan Puigcerver; Alejandro Héctor Toselli; Enrique Vidal
The principal goal of the Competition on Keyword Spotting for Handwritten Documents was to promote different approaches used in the field of Keyword Spotting and to fairly compare them using uniform data and metrics. To accommodate different perspectives adopted by researchers in this field, the competition was divided into two distinct tracks, namely, a training-free and a training-based track, and each track entailed two optional assignments. Six participants submitted solutions to one or both assignments, depending on the capabilities and/or restrictions of their systems. The data used in the competition consisted of historical documents in English with different levels of complexity. This paper presents the details of the competition, including the data, evaluation metrics and results of the best participant methods.
International Conference on Frontiers in Handwriting Recognition | 2014
Joan Puigcerver; Alejandro Héctor Toselli; Enrique Vidal
We present a handwritten text Keyword Spotting (KWS) approach based on the combination of KWS methods using word-graphs (WGs) and character-lattices (CLs). It aims to solve the problem that WG-based models present for out-of-vocabulary (OOV) keywords: since there is no available information about them in the lexicon or the language model, null scores are assigned. OOV keywords may have a significant impact on the global performance of KWS systems, as we show. By using a CL approach, which does not suffer from this problem, to estimate the OOV scores, we take advantage of both models, using the speed and accuracy that WGs provide for in-vocabulary keywords and the flexibility of the CL approach. This combination significantly improves both average precision and mean average precision over the two individual methods.
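The combination described in this abstract can be illustrated with a minimal Python sketch. It assumes that both systems expose a score in [0, 1] for a keyword in a given line image. The names `combined_score`, `wg_scores` and `cl_score_fn` are hypothetical and are not taken from the paper, which may combine the two models differently.

```python
from typing import Callable, Dict, Set


def combined_score(
    keyword: str,
    lexicon: Set[str],
    wg_scores: Dict[str, float],
    cl_score_fn: Callable[[str], float],
) -> float:
    """Score one keyword for one line image (illustrative only)."""
    if keyword in lexicon:
        # In-vocabulary keyword: trust the word-graph score.
        # A missing entry means the word was not found in this line's graph.
        return wg_scores.get(keyword, 0.0)
    # OOV keyword: the word graph can only give a null score,
    # so fall back to the character-lattice estimate.
    return cl_score_fn(keyword)
```

For example, with `lexicon = {"the", "house"}` and `wg_scores = {"the": 0.9}`, the keyword "the" receives 0.9, "house" receives 0.0, and an OOV keyword such as "castle" receives whatever `cl_score_fn("castle")` returns.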
International Conference on Frontiers in Handwriting Recognition | 2016
Ioannis Pratikakis; Konstantinos Zagoris; Basilis Gatos; Joan Puigcerver; Alejandro Héctor Toselli; Enrique Vidal
The H-KWS 2016 competition, organized in the context of the ICFHR 2016 conference, aims to set up an evaluation framework for benchmarking handwritten keyword spotting (KWS), examining both the Query by Example (QbE) and the Query by String (QbS) approaches. The two KWS approaches were hosted in two different tracks, each of which was split into two distinct challenges, namely a segmentation-based and a segmentation-free one, to accommodate different perspectives adopted by researchers in the KWS field. In addition, the competition aims to evaluate the submitted training-based methods under different amounts of training data. Four participants submitted at least one solution to one of the challenges, according to the capabilities and/or restrictions of their systems. The data used in the competition consisted of historical German and English documents with their own characteristics and complexities. This paper presents the details of the competition, including the data, evaluation metrics and results of the best run of each participating method.
International Conference on Document Analysis and Recognition | 2015
Alejandro Héctor Toselli; Joan Puigcerver; Enrique Vidal
The so-called filler or garbage Hidden Markov Models (HMM-Filler) are among the most widely used models for lexicon-free, query-by-string keyword spotting (KWS) in the fields of speech recognition and (lately) handwritten text recognition. However, this approach has important drawbacks. First, the keyword-specific HMM Viterbi decoding process needed to obtain the confidence scores of each spotted word involves a large computational cost. Second, in its traditional conception, the model does not take into account any context information; more recent works where simple character bi-gram context is used show that not only does the computational cost become even larger, but the required keyword-specific language model also becomes quite intricate to build. In a previous work we introduced KWS methods based on character lattices which proved much simpler and faster than the traditional HMM-Filler, while providing practically identical results. Here we extend our previous work by using context-aware character lattices obtained by means of Viterbi decoding with high-order character N-gram models. Experimental results show that, compared with a direct 2-gram HMM-Filler implementation, the proposed approach requires between one and two orders of magnitude less query computing time. Moreover, for the first time in the field of handwritten text KWS, Filler-based results for N-grams up to N = 6 are reported, clearly showing a great impact of context on precision-recall performance.
International Conference on Document Analysis and Recognition | 2015
Enrique Vidal; Alejandro Héctor Toselli; Joan Puigcerver
Keyword Spotting (KWS) has traditionally been considered under two distinct frameworks: Query-by-Example (QbE) and Query-by-String (QbS). In both cases the user of the system wishes to find occurrences of a particular keyword in a collection of document images. The difference is that, in QbE, the keyword is given as an exemplar image while, in QbS, the keyword is given as a text string. In several works, the QbS scenario has been approached using QbE techniques, but the converse has not yet been studied in depth, despite the fact that QbS systems typically achieve higher accuracy. In the present work, we present a very effective probabilistic approach to QbE KWS, based on highly accurate QbS KWS techniques that rely on models trained from annotated data. To assess the effectiveness of this approach, we tackle the segmentation-free QbE task of the ICFHR-2014 Competition on Handwritten KWS. Our approach achieves a mean average precision (mAP) as high as 0.715, which improves on the best mAP achieved in this competition (0.419 under the same experimental conditions) by more than 70%.
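The "more than 70%" figure follows from the two mAP values quoted in the abstract, read as a relative improvement over the competition's best result. A short Python check, where the helper name `relative_improvement` is only illustrative:

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# mAP values quoted in the abstract: proposed approach vs. best competition entry.
gain = relative_improvement(0.715, 0.419)
print(f"{gain:.1%}")  # prints 70.6%
```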
International Conference on Pattern Recognition | 2014
Joan Puigcerver; Alejandro Héctor Toselli; Enrique Vidal
Thanks to the use of lexical and syntactic information, Word Graphs (WG) have been shown to provide competitive Precision-Recall performance, along with fast lookup times, in comparison to other techniques used for Key-Word Spotting (KWS) in handwritten text images. However, a problem of WG approaches is that they assign a null score to any keyword that was not part of the training data, i.e. Out-of-Vocabulary (OOV) keywords, whereas other techniques are able to estimate a reasonable score even for this kind of keyword. We present a smoothing technique which estimates the score of an OOV keyword based on the scores of similar keywords. This makes WG-based KWS as flexible as other techniques, with the benefit of much faster lookup times.
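The idea of estimating an OOV score from the scores of similar keywords can be sketched as follows. The abstract does not state which similarity measure or combination rule is used, so this sketch assumes a normalized Levenshtein similarity and takes the best similarity-weighted in-vocabulary score. The function names and the weighting rule are assumptions for illustration, not the authors' method.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity: 1 minus the length-normalized edit distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def smoothed_oov_score(keyword: str, wg_scores: Dict[str, float]) -> float:
    """Replace the null OOV score with the best similarity-weighted score
    among the in-vocabulary words found in the word graph (assumed rule)."""
    return max(
        (similarity(keyword, word) * score for word, score in wg_scores.items()),
        default=0.0,
    )
```

With `wg_scores = {"house": 0.8, "the": 0.9}`, the OOV query "houses" is one edit away from "house", so it inherits most of that word's score rather than a null score.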
International Conference on Document Analysis and Recognition | 2015
Joan Puigcerver; Alejandro Héctor Toselli; Enrique Vidal
Traditionally, the HMM-Filler approach has been widely used in the fields of speech recognition and handwritten text recognition to tackle lexicon-free, query-by-string keyword spotting (KWS). It computes a score to determine whether a given keyword is written in a certain image region. It is conjectured that this score is related to the confidence of the system with respect to that question. However, it is still not clear what this relationship is. In this paper, the HMM-Filler score is derived from a probabilistic formulation of KWS, which gives a better understanding of its behavior and limits. Additionally, the same probabilistic framework is used to present a new algorithm to compute the KWS scores, which results in better average precision (AP) for a keyword spotting task on the widely used IAM database. We show that the new algorithm can improve the HMM-Filler results by up to 10.4% relative (5.3% absolute) in AP on the considered task.
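Since the results in this abstract are reported as average precision, a minimal sketch of that metric may help. It assumes the plain, non-interpolated definition of AP over a single ranked list of spotted events; the exact variant used in the paper is not stated here, and the function name is illustrative.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """Non-interpolated AP: mean of the precision values measured at the
    rank of each relevant item, after sorting by decreasing score."""
    ranked = sorted(zip(scores, relevant), key=lambda pair: pair[0], reverse=True)
    total_relevant = sum(1 for _, is_relevant in ranked if is_relevant)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant
```

For scores `[0.9, 0.8, 0.7, 0.6]` with relevance `[True, False, True, False]`, the relevant items sit at ranks 1 and 3, giving precisions 1/1 and 2/3 and an AP of about 0.833.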
International Conference on Frontiers in Handwriting Recognition | 2016
Alejandro Héctor Toselli; Joan Puigcerver; Enrique Vidal
Two methods are presented to improve word confidence scores for Line-Level Query-by-String Lexicon-Free Keyword Spotting (KWS) in handwritten text images. The first one approximates true relevance probabilities by means of computations carried out directly on character lattices obtained from the line images considered. The second method uses the same character lattices, but it obtains relevance scores by first computing frame-level character sequence scores which resemble the word posteriorgrams used in previous approaches for lexicon-based KWS. The first method results from a formal probabilistic derivation, which allows us to better understand and further develop the underlying ideas. The second one is less formal but, according to the experiments presented in the paper, it obtains almost identical results at a much lower computational cost. Moreover, in contrast with the first method, the second one allows accurate bounding boxes for the spotted words to be obtained directly.
Neural Computing and Applications | 2017
Joan Puigcerver; Alejandro Héctor Toselli; Enrique Vidal
Lexicon-based handwritten text keyword spotting (KWS) has proven to be a faster and more accurate alternative to lexicon-free methods. Nevertheless, since lexicon-based KWS relies on a predefined vocabulary, fixed in the training phase, it does not support queries involving out-of-vocabulary (OOV) keywords. In this paper, we outline previous work aimed at solving this problem and present a new approach based on smoothing the (null) scores of OOV keywords by means of the information provided by “similar” in-vocabulary words. Good results achieved using this approach are compared with previously published alternatives on different data sets.
Iberian Conference on Pattern Recognition and Image Analysis | 2015
Joan Puigcerver; Alejandro Héctor Toselli; Enrique Vidal
Lexicon-based handwritten text keyword spotting (KWS) has proven to be a very fast and accurate alternative to lexicon-free methods. Nevertheless, since lexicon-based KWS methods rely on a predefined vocabulary, fixed in the training phase, they perform poorly for any query keyword that was not included in it (i.e. out-of-vocabulary (OOV) keywords). This renders the KWS system useless for that particular type of query. In this paper, we present a new way of smoothing the scores of OOV keywords, and we compare it with previously published alternatives on different data sets.