Joan-Andreu Sánchez | Researchain

Publication

Featured research published by Joan-Andreu Sánchez.

Pattern Recognition Letters | 2014

Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models

Francisco Álvaro; Joan-Andreu Sánchez; José-Miguel Benedí

This paper describes a formal model for the recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Hidden Markov models are used to recognize mathematical symbols, and a stochastic context-free grammar is used to model the relations between these symbols. This formal model makes it possible to use classic algorithms for parsing and stochastic estimation. As a result, first, the model is able to capture, during the training process, many of the variability phenomena that appear in on-line handwritten mathematical expressions. Second, the parsing process can make decisions based only on stochastic information, avoiding heuristic decisions. The proposed model participated in a mathematical expression recognition contest, where it obtained the best results at different levels.

Pattern Recognition | 2013

The ESPOSALLES database: An ancient marriage license corpus for off-line handwriting recognition

Verónica Romero; Alicia Fornés; Nicolás Serrano; Joan-Andreu Sánchez; Alejandro Héctor Toselli; Volkmar Frinken; Enrique Vidal; Josep Lladós

Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demography studies and genealogical research. Automatic processing of historical documents, however, has mostly been focused on single works of literature and less on social records, which tend to have a distinct layout, structure, and vocabulary. Such information is usually collected by expert demographers who devote a lot of time to transcribing it manually. This paper presents a new database, compiled from a collection of marriage license books, to support research in automatic handwriting recognition for historical documents containing social records. Marriage license books are documents that were used for centuries by ecclesiastical institutions to register marriage licenses. Books from this collection are handwritten and span nearly half a millennium until the beginning of the 20th century. In addition, a study is presented of the performance of state-of-the-art handwritten text recognition systems when applied to the presented database. Baseline results are reported for reference in future studies.

Computer Speech & Language | 2005

Estimation of stochastic context-free grammars and their use as language models

José-Miguel Benedí; Joan-Andreu Sánchez

This paper is devoted to the estimation of stochastic context-free grammars (SCFGs) and their use as language models. Classical estimation algorithms, together with new ones that consider a certain subset of derivations in the estimation process, are presented in a unified framework. This set of derivations is chosen according to both structural and statistical criteria. The estimated SCFGs have been used in a new hybrid language model to combine both a word-based n-gram, which is used to capture the local relations between words, and a category-based SCFG together with a word distribution into categories, which is defined to represent the long-term relations between these categories. We describe methods for learning these stochastic models for complex tasks, and we present an algorithm for computing the word transition probability using this hybrid language model. Finally, experiments on the UPenn Treebank corpus show significant improvements in the test set perplexity with regard to the classical word trigram models.
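The estimation algorithms mentioned in this abstract work with the probability that an SCFG assigns to a sentence, either summed over all derivations or restricted to a subset of them. As a minimal sketch of the all-derivations case, the block below computes the inside probability for a grammar in Chomsky Normal Form. The function name `inside_probability`, the dictionary layout of the rules, and the toy grammar are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Probability that a CNF stochastic grammar generates `words`.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    (Hypothetical data layout chosen for this sketch.)
    """
    n = len(words)
    # chart[(i, j)][A] = probability that A derives words[i:j],
    # summed over every derivation of that span.
    chart = defaultdict(lambda: defaultdict(float))

    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][a] += p

    # Longer spans combine two adjacent sub-spans through a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = chart[(i, k)].get(b, 0.0)
                    right = chart[(k, j)].get(c, 0.0)
                    if left and right:
                        chart[(i, j)][a] += p * left * right

    return chart[(0, n)].get(start, 0.0)


# Toy ambiguous grammar (illustrative only): S -> S S (0.3) | a (0.7)
toy_lexical = {("S", "a"): 0.7}
toy_binary = {("S", "S", "S"): 0.3}
prob_aaa = inside_probability(["a", "a", "a"], toy_lexical, toy_binary)
```

With this toy grammar, the string `a a a` has two derivation trees, and the chart adds their probabilities. Replacing the sums with a maximum would give the score of the single best derivation instead.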

Document Engineering | 2013

tranScriptorium: a European project on handwritten text recognition

Joan-Andreu Sánchez; Günter Mühlberger; Basilis Gatos; Philip Schofield; Katrien Depuydt; Richard M. Davis; Enrique Vidal; Jesse de Does

The tranScriptorium project aims to develop innovative, efficient and cost-effective solutions for annotating handwritten historical documents using modern, holistic Handwritten Text Recognition (HTR) technology. Three actions are planned in tranScriptorium: i) improve basic image preprocessing and holistic HTR techniques; ii) develop novel indexing and keyword searching approaches; and iii) capitalize on new, user-friendly interactive-predictive HTR approaches for computer-assisted operation.

International Conference on Pattern Recognition | 2010

Comparing Several Techniques for Offline Recognition of Printed Mathematical Symbols

Francisco Álvaro; Joan-Andreu Sánchez

Automatic recognition of printed mathematical symbols is a fundamental problem in the recognition of mathematical expressions. Several classification techniques have been used previously, but very few works compare different classification techniques on the same database and under the same experimental conditions. In this work we have tested classical and novel classification techniques for mathematical symbol recognition on two databases.

Workshop on Statistical Machine Translation | 2006

Stochastic Inversion Transduction Grammars for Obtaining Word Phrases for Phrase-based Statistical Machine Translation

Joan-Andreu Sánchez; José-Miguel Benedí

An important problem related to phrase-based statistical translation models is how to obtain word phrases from an aligned bilingual training corpus. In this work, we propose obtaining word phrases by means of a Stochastic Inversion Transduction Grammar. Experiments on the shared task proposed in this workshop were carried out with the Europarl corpus, and good results were obtained.

International Conference on Pattern Recognition | 2014

Offline Features for Classifying Handwritten Math Symbols with Recurrent Neural Networks

Francisco Álvaro; Joan-Andreu Sánchez; José-Miguel Benedí

In mathematical expression recognition, symbol classification is a crucial step. Numerous approaches for recognizing handwritten math symbols have been published, but most of them are either an online approach or a hybrid approach. There is an absence of a study focused on offline features for handwritten math symbol recognition. Furthermore, many papers provide results difficult to compare. In this paper we assess the performance of several well-known offline features for this task. We also test a novel set of features based on polar histograms and the vertical repositioning method for feature extraction. Finally, we report and analyze the results of several experiments using recurrent neural networks on a large public database of online handwritten math expressions. The combination of online and offline features significantly improved the recognition rate.

ACM Transactions on Asian Language Information Processing | 2004

A hybrid language model based on a combination of N-grams and stochastic context-free grammars

Diego Linares; José-Miguel Benedí; Joan-Andreu Sánchez

In this paper, a hybrid language model is defined as a combination of a word-based n-gram, which is used to capture the local relations between words, and a category-based stochastic context-free grammar (SCFG) with a word distribution into categories, which is defined to represent the long-term relations between these categories. The problem of unsupervised learning of an SCFG in General Format and in Chomsky Normal Form by means of estimation algorithms is studied. Moreover, a bracketed version of the classical estimation algorithm based on the Earley algorithm is proposed. This paper also explores the use of SCFGs obtained from a treebank corpus as initial models for the estimation algorithms. Experiments on the UPenn Treebank corpus are reported. These experiments have been carried out in terms of the test set perplexity and the word error rate in a speech recognition experiment.

International Conference on Computational Linguistics | 2000

Combination of n-grams and Stochastic Context-Free Grammars for language modeling

José-Miguel Benedí; Joan-Andreu Sánchez

This paper describes a hybrid proposal to combine n-grams and Stochastic Context-Free Grammars (SCFGs) for language modeling. A classical n-gram model is used to capture the local relations between words, while a stochastic grammatical model is considered to represent the long-term relations between syntactical structures. In order to define this grammatical model, which will be used on large-vocabulary complex tasks, a category-based SCFG and a probabilistic model of word distribution in the categories have been proposed. Methods for learning these stochastic models for complex tasks are described, and algorithms for computing the word transition probabilities are also presented. Finally, experiments using the Penn Treebank corpus showed a 30% improvement in test set perplexity over classical n-gram models.
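The abstract above evaluates a combined model by its test set perplexity. As a rough sketch of both ideas, the block below mixes two per-word probabilities and computes perplexity over a word sequence. It assumes the two models are combined by linear interpolation with a weight `alpha`, which the abstract does not state; the function names and the example numbers are hypothetical.

```python
import math


def interpolate(p_ngram, p_grammar, alpha):
    """Linear mix of two word probabilities (assumed combination scheme).

    alpha weights the n-gram model; (1 - alpha) weights the grammar model.
    """
    return alpha * p_ngram + (1.0 - alpha) * p_grammar


def perplexity(word_probs):
    """Perplexity of a sequence given the probability assigned to each word."""
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2.0 ** (-log_sum / len(word_probs))


# Hypothetical per-word probabilities from the two component models.
ngram_probs = [0.10, 0.20, 0.05]
grammar_probs = [0.30, 0.10, 0.20]
hybrid_probs = [
    interpolate(pn, pg, alpha=0.5)
    for pn, pg in zip(ngram_probs, grammar_probs)
]
hybrid_ppl = perplexity(hybrid_probs)
ngram_ppl = perplexity(ngram_probs)
```

Lower perplexity means the model assigns higher probability to the test words on average. In practice the weight would be tuned on held-out data rather than fixed by hand.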

International Colloquium on Grammatical Inference | 2000

Combination of Estimation Algorithms and Grammatical Inference Techniques to Learn Stochastic Context-Free Grammars

Francisco Nevado; Joan-Andreu Sánchez; José-Miguel Benedí

Some of the most widely known methods to obtain Stochastic Context-Free Grammars (SCFGs) are based on estimation algorithms. All of these algorithms maximize a certain criterion function from a training sample by using gradient descent techniques. In this optimization process, obtaining the initial SCFGs is an important factor, given that it affects the convergence process and the maximum that can be achieved. Here, we show experimentally how the results can be improved when structural information about the task is inductively incorporated into the initial SCFGs. In this work, we present a stochastic version of the well-known Sakakibara algorithm in order to learn these initial SCFGs. Finally, an experimental study on part of the Wall Street Journal corpus was carried out.
