Sebastián Peña Saldarriaga

Publication

Featured research published by Sebastián Peña Saldarriaga.

International Conference on Document Analysis and Recognition | 2011

HAMEX - A Handwritten and Audio Dataset of Mathematical Expressions

Solen Quiniou; Harold Mouchère; Sebastián Peña Saldarriaga; Christian Viard-Gaudin; Emmanuel Morin; Simon Petitrenaud; Sofiane Medjkoune

In this paper, we present HAMEX, a new public dataset of mathematical expressions available both in on-line handwritten form and in spoken audio form. We designed this dataset so that, for a given mathematical expression, its handwritten signal and its audio signal can be used jointly to build multimodal recognition systems. We describe the steps taken to acquire the dataset, from the creation of the mathematical expression corpora (including expressions taken from Wikipedia pages), through the data collection process itself, to the segmentation and transcription of the collected data. The dataset currently contains 4,350 on-line handwritten mathematical expressions written by 58 writers, and the corresponding audio expressions (in French) spoken by 58 speakers. Ground truth is provided both for the handwritten expressions (as InkML files containing the digital ink, the symbol segmentation, and the MathML structure) and for the audio expressions (as XML files containing the transcriptions of the spoken expressions).
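The handwritten ground truth is described as InkML files holding the digital ink. As a rough illustration of what that ink looks like, here is a minimal sketch that reads strokes from an InkML document with Python's standard library. It assumes the W3C InkML namespace and traces written as plain comma-separated points whose first two channels are X and Y; it ignores symbol segmentation, MathML annotations, and InkML's difference-encoded point syntax. The function name `read_traces` and the sample document are hypothetical and not taken from HAMEX.

```python
import xml.etree.ElementTree as ET

# W3C InkML namespace, used as a prefix when searching for elements.
INKML_NS = "{http://www.w3.org/2003/InkML}"

def read_traces(inkml_text):
    """Return a list of strokes; each stroke is a list of (x, y) tuples.

    Assumes absolute coordinates ("x y, x y, ...") and keeps only the
    first two channels of each point (extra channels such as time are dropped).
    """
    root = ET.fromstring(inkml_text)
    strokes = []
    for trace in root.iter(INKML_NS + "trace"):
        points = []
        for point in (trace.text or "").split(","):
            coords = point.split()
            if len(coords) >= 2:
                points.append((float(coords[0]), float(coords[1])))
        strokes.append(points)
    return strokes

# Hypothetical two-stroke InkML snippet.
sample = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace id="0">10 20, 11 22, 13 25</trace>
  <trace id="1">30 40, 31 41</trace>
</ink>"""

strokes = read_traces(sample)
print(len(strokes), strokes[1])
```

Real dataset files may declare a trace format with more channels or use relative coordinates, so the parsing step would need to follow each file's own declarations.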

Document Analysis Systems | 2008

Categorization of On-Line Handwritten Documents

Sebastián Peña Saldarriaga; Emmanuel Morin; Christian Viard-Gaudin

With the growth of on-line handwriting technologies, tools for managing handwritten documents, such as retrieving documents by topic, are needed. Such documents may contain graphics, equations, or text. This work reports experiments on categorizing on-line handwritten documents by their textual content. We assume that handwritten text blocks have already been extracted from the documents; as the first step of the proposed system, we process them with an existing handwriting recognition engine. We analyze the effect of the word recognition rate on categorization performance and compare the results with those obtained on the same texts available as ground truth. Two categorization algorithms, kNN and SVM, are compared. The handwritten texts are a subset of the Reuters-21578 corpus, collected from more than 1,500 writers. Results show no significant loss in categorization performance when the word error rate stays below 22%.
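To make the pipeline concrete (recognized text in, topic label out), here is a minimal sketch of one of the two compared classifiers, a similarity-weighted kNN over raw term-frequency vectors with cosine similarity. This is a generic textbook kNN, not the paper's configuration: the tokenization, the weighting, the value of `k`, and the tiny training set with its "crude"/"grain" labels are all hypothetical. The query string simulates a recognition error ("oi1" for "oil").

```python
import math
from collections import Counter

def tf_vector(text):
    """Raw term frequencies of a whitespace-tokenized, lower-cased text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_categorize(text, training, k=3):
    """Label a (possibly misrecognized) text by similarity-weighted votes of its k nearest neighbors."""
    query = tf_vector(text)
    scored = sorted(((cosine(query, tf_vector(doc)), label) for doc, label in training),
                    reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Hypothetical labeled training texts.
training = [
    ("oil prices rose as crude supply fell", "crude"),
    ("crude oil output cut by producers", "crude"),
    ("wheat harvest exports grain", "grain"),
    ("grain shipments of wheat and corn", "grain"),
]

print(knn_categorize("crude oi1 prices fell", training))
```

The misrecognized token simply fails to match any training term, which hints at why moderate word error rates need not destroy categorization: the remaining correctly recognized terms still carry the topic.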

Document Recognition and Retrieval | 2012

Retrieving handwriting by combining word spotting and manifold ranking

Sebastián Peña Saldarriaga; Emmanuel Morin; Christian Viard-Gaudin

Online handwritten data, produced with Tablet PCs or digital pens, consists of a sequence of points (x, y). As the amount of data available in this form grows, retrieval algorithms for online data are needed. Word spotting is a common approach to handwriting retrieval. However, from an information retrieval (IR) perspective, word spotting is a primitive keyword-based matching and retrieval strategy. We propose a handwriting retrieval framework in which an arbitrary word spotting method is applied first, and a manifold ranking algorithm is then applied to the initial retrieval scores. Experimental results on a database of more than 2,000 handwritten newswires show that our method can improve the performance of a state-of-the-art word spotting system by more than 10%.
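The re-ranking step can be illustrated with the standard normalized-graph formulation of manifold ranking: build an affinity matrix W between documents, normalize it as S = D^-1/2 W D^-1/2, and iterate f ← αSf + (1 − α)y, where y holds the initial word-spotting scores. The sketch below is a minimal pure-Python version under assumptions of our own: documents are represented by hypothetical 2-D feature vectors, affinities use a Gaussian kernel, and `alpha`, `sigma`, and `iters` are illustrative values. The paper's actual graph construction and parameters are not given here.

```python
import math

def manifold_rank(features, initial_scores, alpha=0.9, sigma=1.0, iters=200):
    """Propagate initial retrieval scores over a document affinity graph.

    Iterates f <- alpha * S f + (1 - alpha) * y with S = D^-1/2 W D^-1/2.
    """
    n = len(features)
    # Gaussian affinity between documents, with zero self-affinity.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    deg = [sum(row) for row in w]
    # Symmetric normalization S = D^-1/2 W D^-1/2.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    y = list(initial_scores)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# Hypothetical documents: 0 and 1 are close to each other, as are 2 and 3.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
# Hypothetical word-spotting scores: document 1 was missed by the spotter.
spotting = [1.0, 0.0, 0.0, 0.2]
scores = manifold_rank(features, spotting)
print(sorted(range(4), key=lambda i: -scores[i]))
```

Here document 1 starts below document 3 but ends up ranked above it, because it lies next to the strongly spotted document 0 on the graph. The closed form f* = (1 − α)(I − αS)^-1 y gives the same fixed point; the iteration is used only to avoid a matrix inverse in plain Python.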

International Conference on Document Analysis and Recognition | 2009

Using top n Recognition Candidates to Categorize On-line Handwritten Documents

Sebastián Peña Saldarriaga; Emmanuel Morin; Christian Viard-Gaudin

The traditional weighting schemes used in text categorization for the vector space model (VSM) cannot exploit information intrinsic to texts obtained through on-line handwriting recognition or any OCR process. In particular, the top n (n ≫ 1) recognition candidates cannot be used without flooding the resulting text with false occurrences of spurious terms. In this paper, we propose an improved weighting scheme for text categorization that estimates term occurrences from the posterior probabilities of the top n candidates. Experimental results show that categorization performance improves for texts with high error rates.
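The flooding problem and its fix can be sketched as follows. Counting every top-n candidate as a full occurrence adds n terms per recognized word, whereas weighting each candidate by its normalized posterior keeps each word position's total contribution at one occurrence. The function below is a minimal illustration of that idea (expected term counts); it is an assumption for illustration, not the paper's exact weighting formula, and the candidate lists and posteriors are hypothetical.

```python
from collections import defaultdict

def expected_term_counts(word_lattice):
    """Estimate term occurrences from top-n recognition candidates.

    word_lattice: one entry per recognized word position, each a list of
    (candidate, posterior) pairs. Posteriors are renormalized per position
    so that every position contributes exactly one occurrence in total.
    """
    counts = defaultdict(float)
    for candidates in word_lattice:
        total = sum(p for _, p in candidates)
        if total <= 0:
            continue
        for word, p in candidates:
            counts[word] += p / total
    return dict(counts)

# Hypothetical top-3 candidates for three handwritten word positions.
lattice = [
    [("oil", 0.6), ("oils", 0.3), ("soil", 0.1)],
    [("prices", 0.5), ("prizes", 0.5)],
    [("oil", 0.4), ("all", 0.4), ("ail", 0.2)],
]

counts = expected_term_counts(lattice)
print({word: round(c, 2) for word, c in sorted(counts.items())})
```

The estimated counts can then replace raw term frequencies in a VSM weighting such as tf-idf. With naive top-n counting, the same three positions would instead yield eight full occurrences, most of them spurious.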

Meeting of the Association for Computational Linguistics | 2011