Joan-Andreu Sánchez
Polytechnic University of Valencia
Publications
Featured research published by Joan-Andreu Sánchez.
Pattern Recognition Letters | 2014
Francisco Álvaro; Joan-Andreu Sánchez; José-Miguel Benedí
This paper describes a formal model for the recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Hidden Markov models are used to recognize mathematical symbols, and a stochastic context-free grammar is used to model the relations between these symbols. This formal model makes it possible to use classical algorithms for parsing and stochastic estimation. In this way, first, the model is able to capture during training many of the variability phenomena that appear in on-line handwritten mathematical expressions. Second, the parsing process can make decisions based solely on stochastic information, avoiding heuristic decisions. The proposed model took part in a mathematical expression recognition competition and obtained the best results at different levels.
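The abstract relies on classical SCFG parsing to pick the most probable interpretation using only stochastic information. As a much simpler, one-dimensional illustration of that idea, the sketch below assumes a grammar in Chomsky Normal Form and uses the Viterbi variant of the CYK algorithm to return the most probable parse of a token sequence. The toy grammar (`Exp`, `Term`, `OpTerm`, `Op`) and the function names are hypothetical; the paper's 2D grammar, spatial relations, and HMM symbol scores are not modeled here.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky Normal Form (not the grammar from the paper).
# Binary rules: (A, B, C) -> Pr(A -> B C); lexical rules: (A, a) -> Pr(A -> a).
BINARY = {
    ("Exp", "Term", "OpTerm"): 0.4,
    ("OpTerm", "Op", "Term"): 1.0,
}
LEXICAL = {
    ("Exp", "x"): 0.3,
    ("Exp", "y"): 0.3,
    ("Term", "x"): 0.5,
    ("Term", "y"): 0.5,
    ("Op", "+"): 1.0,
}


def viterbi_cyk(tokens, start="Exp"):
    """Return the probability of the most probable parse and its backpointers."""
    n = len(tokens)
    best = defaultdict(float)  # (i, j, A) -> max Pr of A deriving tokens[i:j]
    back = {}                  # (i, j, A) -> terminal, or (split, B, C)
    for i, tok in enumerate(tokens):
        for (a, term), p in LEXICAL.items():
            if term == tok and p > best[(i, i + 1, a)]:
                best[(i, i + 1, a)] = p
                back[(i, i + 1, a)] = tok
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    # Keep only the highest-scoring way to build A over [i, j).
                    score = p * best[(i, k, b)] * best[(k, j, c)]
                    if score > best[(i, j, a)]:
                        best[(i, j, a)] = score
                        back[(i, j, a)] = (k, b, c)
    return best[(0, n, start)], back


def build_tree(back, i, j, a):
    """Rebuild the best parse tree as nested tuples from the backpointers."""
    entry = back[(i, j, a)]
    if isinstance(entry, str):
        return (a, entry)
    k, b, c = entry
    return (a, build_tree(back, i, k, b), build_tree(back, k, j, c))
```

For example, `viterbi_cyk(["x", "+", "y"])` yields probability 0.1, and `build_tree` recovers the parse `Exp → Term OpTerm`, with `OpTerm → Op Term` covering `+ y`.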
Pattern Recognition | 2013
Verónica Romero; Alicia Fornés; Nicolás Serrano; Joan-Andreu Sánchez; Alejandro Héctor Toselli; Volkmar Frinken; Enrique Vidal; Josep Lladós
Historical records of daily activities provide intriguing insights into the lives of our ancestors and are useful for demographic studies and genealogical research. Automatic processing of historical documents, however, has mostly focused on single works of literature rather than on social records, which tend to have a distinct layout, structure, and vocabulary. Such information is usually collected by expert demographers, who devote a great deal of time to transcribing it manually. This paper presents a new database, compiled from a collection of marriage license books, to support research in automatic handwriting recognition for historical documents containing social records. Marriage license books are documents that were used for centuries by ecclesiastical institutions to register marriage licenses. The books in this collection are handwritten and span nearly half a millennium, up to the beginning of the 20th century. In addition, the paper studies the performance of state-of-the-art handwritten text recognition systems on this database. Baseline results are reported for reference in future studies.
Computer Speech & Language | 2005
José-Miguel Benedí; Joan-Andreu Sánchez
This paper is devoted to the estimation of stochastic context-free grammars (SCFGs) and their use as language models. Classical estimation algorithms, together with new ones that consider a certain subset of derivations in the estimation process, are presented in a unified framework. This set of derivations is chosen according to both structural and statistical criteria. The estimated SCFGs have been used in a new hybrid language model to combine both a word-based n-gram, which is used to capture the local relations between words, and a category-based SCFG together with a word distribution into categories, which is defined to represent the long-term relations between these categories. We describe methods for learning these stochastic models for complex tasks, and we present an algorithm for computing the word transition probability using this hybrid language model. Finally, experiments on the UPenn Treebank corpus show significant improvements in the test set perplexity with regard to the classical word trigram models.
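The abstract does not spell out how the word n-gram and the category-based SCFG probabilities are combined, nor how perplexity is computed. A minimal sketch, assuming linear interpolation with a weight `alpha` (a hypothetical parameter name) and assuming the per-word probabilities from each model are already available; computing the SCFG word probability given the preceding words requires a prefix-probability algorithm that is not shown here.

```python
import math


def hybrid_prob(p_ngram, p_scfg, alpha):
    """Linearly interpolate an n-gram word probability with an SCFG-based one.

    Assumption: the hybrid model is a weighted sum controlled by alpha in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_ngram + (1.0 - alpha) * p_scfg


def perplexity(word_probs):
    """Test-set perplexity: 2 ** (-(1/N) * sum of log2 Pr(w_k | history))."""
    n = len(word_probs)
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2.0 ** (-log_sum / n)
```

With made-up per-word probabilities `[0.4, 0.01, 0.2]` (n-gram) and `[0.1, 0.2, 0.1]` (SCFG) and `alpha = 0.5`, the interpolated sequence has a lower perplexity than either model alone, because the SCFG compensates for the word the n-gram predicts poorly. These numbers are illustrative only, not results from the paper.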
Document Engineering | 2013
Joan-Andreu Sánchez; Günter Mühlberger; Basilis Gatos; Philip Schofield; Katrien Depuydt; Richard M. Davis; Enrique Vidal; Jesse de Does
The tranScriptorium project aims to develop innovative, efficient and cost-effective solutions for annotating handwritten historical documents using modern, holistic Handwritten Text Recognition (HTR) technology. Three actions are planned in tranScriptorium: i) improve basic image preprocessing and holistic HTR techniques; ii) develop novel indexing and keyword searching approaches; and iii) capitalize on new, user-friendly interactive-predictive HTR approaches for computer-assisted operation.
International Conference on Pattern Recognition | 2010
Francisco Álvaro; Joan-Andreu Sánchez
Automatic recognition of printed mathematical symbols is a fundamental problem in the recognition of mathematical expressions. Several classification techniques have been used previously, but very few works compare different classification techniques on the same database under the same experimental conditions. In this work, we test both classical and novel classification techniques for mathematical symbol recognition on two databases.
Workshop on Statistical Machine Translation | 2006
Joan-Andreu Sánchez; José-Miguel Benedí
An important problem in phrase-based statistical translation models is obtaining word phrases from an aligned bilingual training corpus. In this work, we propose obtaining word phrases by means of a Stochastic Inversion Translation Grammar. Experiments were carried out on the shared task proposed in this workshop, using the Europarl corpus, and good results were obtained.
International Conference on Pattern Recognition | 2014
Francisco Álvaro; Joan-Andreu Sánchez; José-Miguel Benedí
In mathematical expression recognition, symbol classification is a crucial step. Numerous approaches for recognizing handwritten math symbols have been published, but most of them follow either an online or a hybrid approach, and no study has focused on offline features for handwritten math symbol recognition. Furthermore, many papers report results that are difficult to compare. In this paper, we assess the performance of several well-known offline features for this task. We also test a novel set of features based on polar histograms and the vertical repositioning method for feature extraction. Finally, we report and analyze the results of several experiments using recurrent neural networks on a large public database of online handwritten math expressions. Combining online and offline features significantly improved the recognition rate.
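The abstract does not define its polar-histogram features. Purely as an illustration, the sketch below assumes a shape-context-style descriptor: the points of a symbol are binned by angle and normalized radius around their centroid, and the counts are normalized to sum to one. The bin counts, the linear radius bins, and the function name are assumptions, and the vertical repositioning method is not implemented.

```python
import math


def polar_histogram(points, n_angle_bins=8, n_radius_bins=3):
    """Flattened (radius x angle) histogram of points around their centroid.

    Assumed descriptor for illustration; not the paper's exact feature set.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    r_max = max(radii) or 1.0  # avoid dividing by zero for a single-point symbol
    hist = [[0.0] * n_angle_bins for _ in range(n_radius_bins)]
    for (x, y), r in zip(points, radii):
        # Angle in [0, 2*pi) mapped to an angular bin.
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        a = min(int(theta / (2 * math.pi) * n_angle_bins), n_angle_bins - 1)
        # Radius normalized by the farthest point, mapped to a radial bin.
        b = min(int(r / r_max * n_radius_bins), n_radius_bins - 1)
        hist[b][a] += 1.0
    total = float(len(points))
    return [v / total for row in hist for v in row]
```

A fixed-length vector like this can be concatenated with online features before classification, which is one simple way to realize the online plus offline combination the abstract reports.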
ACM Transactions on Asian Language Information Processing | 2004
Diego Linares; José-Miguel Benedí; Joan-Andreu Sánchez
In this paper, a hybrid language model is defined as a combination of a word-based n-gram, which is used to capture the local relations between words, and a category-based stochastic context-free grammar (SCFG) with a word distribution into categories, which is defined to represent the long-term relations between these categories. The problem of unsupervised learning of an SCFG in General Format and in Chomsky Normal Form by means of estimation algorithms is studied. Moreover, a bracketed version of the classical estimation algorithm based on the Earley algorithm is proposed. The paper also explores the use of SCFGs obtained from a treebank corpus as initial models for the estimation algorithms. Experiments on the UPenn Treebank corpus are reported in terms of test set perplexity and word error rate in a speech recognition experiment.
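Estimation algorithms of this kind are built on the inside probability of each training string, that is, the total probability of all its derivations. The paper's bracketed algorithm is Earley-based; the sketch below swaps in the simpler CYK-style inside algorithm for a grammar in Chomsky Normal Form and, as an assumption in the spirit of bracketed estimation, discards every span that crosses a given bracket, so derivations incompatible with the bracketing contribute zero probability. The toy grammar and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky Normal Form: S -> S S (0.4) | a (0.6).
BINARY = {("S", "S", "S"): 0.4}  # (A, B, C) -> Pr(A -> B C)
LEXICAL = {("S", "a"): 0.6}      # (A, a) -> Pr(A -> a)


def crosses(i, j, brackets):
    """True if span [i, j) partially overlaps any bracket [p, q)."""
    return any(i < p < j < q or p < i < q < j for p, q in brackets)


def inside_probability(tokens, start="S", brackets=()):
    """Sum of the probabilities of all derivations of tokens from start.

    Spans that cross a bracket keep inside probability 0, so only
    derivations compatible with the bracketing are counted.
    """
    n = len(tokens)
    inside = defaultdict(float)  # (i, j, A) -> Pr(A derives tokens[i:j])
    for i, tok in enumerate(tokens):
        for (a, term), p in LEXICAL.items():
            if term == tok:
                inside[(i, i + 1, a)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            if crosses(i, j, brackets):
                continue
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    # Sum (not max) over split points and rules.
                    inside[(i, j, a)] += p * inside[(i, k, b)] * inside[(k, j, c)]
    return inside[(0, n, start)]
```

With this toy grammar the string `a a a` has two derivations. The bracket `(0, 2)` rules out the one that groups the last two tokens, halving the inside probability from 0.06912 to 0.03456.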
International Conference on Computational Linguistics | 2000
José-Miguel Benedí; Joan-Andreu Sánchez
This paper describes a hybrid proposal to combine n-grams and Stochastic Context-Free Grammars (SCFGs) for language modeling. A classical n-gram model is used to capture the local relations between words, while a stochastic grammatical model is used to represent the long-term relations between syntactic structures. To define this grammatical model, which will be used on large-vocabulary complex tasks, a category-based SCFG and a probabilistic model of word distribution in the categories have been proposed. Methods for learning these stochastic models for complex tasks are described, and algorithms for computing the word transition probabilities are also presented. Finally, in experiments on the Penn Treebank corpus, the hybrid model improved the test set perplexity by 30% compared with classical n-gram models.
International Colloquium on Grammatical Inference | 2000
Francisco Nevado; Joan-Andreu Sánchez; José-Miguel Benedí
Some of the most widely known methods for obtaining Stochastic Context-Free Grammars (SCFGs) are based on estimation algorithms. All of these algorithms maximize a certain criterion function over a training sample by using gradient descent techniques. In this optimization process, how the initial SCFGs are obtained is an important factor, since it affects the convergence process and the maximum that can be reached. Here, we show experimentally how the results can be improved when structural information about the task is inductively incorporated into the initial SCFGs. To learn these initial SCFGs, we present a stochastic version of the well-known Sakakibara algorithm. Finally, an experimental study was carried out on part of the Wall Street Journal corpus.