Sebastián Peña Saldarriaga
École Normale Supérieure
Publications
Featured research published by Sebastián Peña Saldarriaga.
International Conference on Document Analysis and Recognition | 2011
Solen Quiniou; Harold Mouchère; Sebastián Peña Saldarriaga; Christian Viard-Gaudin; Emmanuel Morin; Simon Petitrenaud; Sofiane Medjkoune
In this paper, we present HAMEX, a new public dataset that contains mathematical expressions in both their on-line handwritten form and their spoken audio form. We designed this dataset so that, for a given mathematical expression, its handwritten signal and its audio signal can be used jointly to build multimodal recognition systems. We describe the steps that allowed us to acquire this dataset, from the creation of the mathematical expression corpora (including expressions from Wikipedia pages), through the data collection process itself, to the segmentation and transcription of the collected data. Currently, the dataset contains 4,350 on-line handwritten mathematical expressions written by 58 writers, and the corresponding audio expressions (in French) spoken by 58 speakers. Ground truth is provided both for the handwritten expressions (as InkML files with the digital ink, the symbol segmentation, and the MathML structure) and for the audio expressions (as XML files with the transcriptions of the spoken expressions).
Document Analysis Systems | 2008
Sebastián Peña Saldarriaga; Emmanuel Morin; Christian Viard-Gaudin
With the growth of on-line handwriting technologies, facilities for managing handwritten documents, such as retrieving documents by topic, are required. These documents can contain graphics, equations, or text, for instance. This work reports experiments on the categorization of on-line handwritten documents based on their textual content. We assume that handwritten text blocks have already been extracted from the documents; as a first step of the proposed system, we process them with an existing handwriting recognition engine. We analyse the effect of the word recognition rate on categorization performance and compare it with the performance obtained on the same texts available as ground truth. Two categorization algorithms (kNN and SVM) are compared in this work. The handwritten texts are a subset of the Reuters-21578 corpus, collected from more than 1,500 writers. Results show no significant loss in categorization performance when the word error rate stays below 22%.
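The kNN categorizer named above can be illustrated with a minimal sketch. It assumes plain bag-of-words count vectors compared by cosine similarity and a majority vote among the k most similar training texts; the function names, the tiny training set, and its Reuters-style labels are hypothetical, and this is not the authors' system. The misspelled query token imitates a recognition error.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokenization into term counts (a simplifying assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_categorize(text, training, k=3):
    """Return the majority label among the k training texts most similar to `text`."""
    query = bag_of_words(text)
    scored = sorted(
        ((cosine(query, bag_of_words(doc)), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training texts with Reuters-style category labels.
TRAINING = [
    ("crude oil prices rise", "crude"),
    ("oil output cut by opec", "crude"),
    ("wheat harvest exports grow", "grain"),
    ("grain and wheat prices fall", "grain"),
]

if __name__ == "__main__":
    # "prlces" imitates a recognition error; the remaining terms still point to "crude".
    print(knn_categorize("oil prlces rise", TRAINING))
```

Ties in similarity are broken by label order here, an arbitrary choice made only to keep the sketch deterministic.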
Document Recognition and Retrieval | 2012
Sebastián Peña Saldarriaga; Emmanuel Morin; Christian Viard-Gaudin
Online handwritten data, produced with Tablet PCs or digital pens, consists of a sequence of points (x, y). As the amount of data available in this form increases, algorithms for retrieving online data are needed. Word spotting is a common approach to handwriting retrieval. However, from an information retrieval (IR) perspective, word spotting is a primitive keyword-based matching and retrieval strategy. We propose a framework for handwriting retrieval in which an arbitrary word spotting method is applied first, and a manifold ranking algorithm is then applied to the initial retrieval scores. Experimental results on a database of more than 2,000 handwritten newswires show that our method can improve the performance of a state-of-the-art word spotting system by more than 10%.
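The re-ranking step described above can be sketched as follows, assuming the common manifold ranking iteration f ← αSf + (1 − α)y, where y holds the initial word spotting scores and S is the symmetrically normalized affinity matrix D^(-1/2) W D^(-1/2). The Gaussian affinity over hypothetical two-dimensional document features, σ = 1, α = 0.9, and the fixed iteration count are illustrative assumptions, not the features or parameters used in the paper.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Symmetric affinity matrix W with a zero diagonal (hypothetical Gaussian kernel)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def manifold_rank(w, initial_scores, alpha=0.9, iterations=200):
    """Propagate initial retrieval scores over the graph: f <- alpha*S*f + (1-alpha)*y."""
    n = len(w)
    degree = [sum(row) for row in w]
    # Symmetric normalization S = D^-1/2 W D^-1/2 (isolated nodes get zero rows).
    s = [
        [
            w[i][j] / math.sqrt(degree[i] * degree[j]) if degree[i] and degree[j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    y = list(initial_scores)
    f = list(y)
    for _ in range(iterations):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i] for i in range(n)]
    return f


if __name__ == "__main__":
    # Hypothetical 2-D features: documents 0 and 1 are close, document 2 is far away.
    features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
    spotting_scores = [0.9, 0.1, 0.3]
    refined = manifold_rank(gaussian_affinity(features), spotting_scores)
    print(sorted(range(len(refined)), key=lambda i: -refined[i]))  # [0, 1, 2]
```

In this toy example, document 1 starts with a low spotting score but lies next to the strong match (document 0), so propagation ranks it above the isolated document 2. Only the resulting order matters; the propagated scores are not on the scale of the original ones.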
International Conference on Document Analysis and Recognition | 2009
Sebastián Peña Saldarriaga; Emmanuel Morin; Christian Viard-Gaudin
The traditional weighting schemes used in text categorization for the vector space model (VSM) cannot exploit information intrinsic to texts obtained through on-line handwriting recognition or any OCR process. In particular, the top n (n ≫ 1) recognition candidates cannot be used without flooding the resulting text with false occurrences of spurious terms. In this paper, we propose an improved weighting scheme for text categorization that estimates term occurrences from the posterior probabilities of the top n candidates. Experimental results show that categorization performance improves for texts with high error rates.
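A minimal sketch of the idea, assuming each recognized word position comes with its top n candidates and their posterior probabilities: instead of counting every candidate once, each candidate contributes its posterior, renormalized over the n candidates, to the term's estimated occurrence count. This expected-count formulation illustrates the principle and is not necessarily the exact formula of the paper; the function names and the recognizer output are hypothetical.

```python
from collections import Counter, defaultdict


def top1_counts(positions):
    """Baseline: count only the best-scoring candidate at each word position."""
    return Counter(max(cands, key=lambda c: c[1])[0] for cands in positions)


def naive_topn_counts(positions):
    """Counting every top-n candidate once floods the text with spurious terms."""
    return Counter(word for cands in positions for word, _ in cands)


def expected_counts(positions):
    """Estimate term occurrences as the sum of renormalized candidate posteriors."""
    counts = defaultdict(float)
    for cands in positions:
        total = sum(p for _, p in cands)
        if total <= 0:
            continue  # skip positions without usable probabilities
        for word, p in cands:
            counts[word] += p / total
    return dict(counts)


# Hypothetical recognizer output: one list of (candidate, posterior) per word position.
POSITIONS = [
    [("oil", 0.6), ("soil", 0.3), ("coil", 0.1)],
    [("prices", 0.5), ("prizes", 0.5)],
    [("oil", 0.7), ("all", 0.3)],
]

if __name__ == "__main__":
    print(top1_counts(POSITIONS))
    print(naive_topn_counts(POSITIONS))
    print(expected_counts(POSITIONS))
```

Because each position's contributions sum to one, the estimated counts add up to the number of word positions (three here), whereas naive top-n counting reports seven term occurrences for the same three words.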
Meeting of the Association for Computational Linguistics | 2011
Amir Hazem; Emmanuel Morin; Sebastián Peña Saldarriaga
Conférence TALN | 2016
Joseph Lark; Emmanuel Morin; Sebastián Peña Saldarriaga
TALN 2015 | 2015
Joseph Lark; Emmanuel Morin; Sebastián Peña Saldarriaga
International Conference on Digital Intelligence 2014 | 2014
Joseph Lark; Sebastián Peña Saldarriaga; Emmanuel Morin; Fabien Poulard; Sylvain Ornetti
18e Conférence sur le Traitement Automatique des Langues Naturelles (TALN) | 2011
Amir Hazem; Emmanuel Morin; Sebastián Peña Saldarriaga
Colloque International Francophone sur l'Écrit et le Document (CIFED 2010) | 2010
Sebastián Peña Saldarriaga; Emmanuel Morin; Christian Viard-Gaudin
Collaboration
Institut de Recherche en Communications et Cybernétique de Nantes