Jose A. Rodriguez-Serrano

Publication

Featured research published by Jose A. Rodriguez-Serrano.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

A Model-Based Sequence Similarity with Application to Handwritten Word Spotting

Jose A. Rodriguez-Serrano; Florent Perronnin

This paper proposes a novel similarity measure between vector sequences. We work in the framework of model-based approaches, where each sequence is first mapped to a Hidden Markov Model (HMM) and then a measure of similarity is computed between the HMMs. We propose to model sequences with semicontinuous HMMs (SC-HMMs). This is a particular type of HMM whose emission probabilities in each state are mixtures of shared Gaussians. This crucial constraint provides two major benefits. First, the a priori information contained in the common set of Gaussians leads to a more accurate estimate of the HMM parameters. Second, the computation of a similarity between two SC-HMMs can be simplified to a Dynamic Time Warping (DTW) between their mixture weight vectors, which significantly reduces the computational cost. Experiments are carried out on a handwritten word retrieval task on three different datasets: an in-house dataset of real handwritten letters, the George Washington dataset, and the IFN/ENIT dataset of Arabic handwritten words. These experiments show that the proposed similarity outperforms both the traditional DTW between the original sequences and the model-based approach which uses ordinary continuous HMMs. We also show that this increase in accuracy can be traded against a significant reduction of the computational cost.
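A minimal sketch of the DTW step, assuming each SC-HMM is summarized by its sequence of per-state mixture weight vectors over the shared Gaussians. The local cost between two states (one minus the Bhattacharyya coefficient of their weight vectors) is an assumption chosen for illustration, not necessarily the measure used in the paper.

```python
import numpy as np

def dtw(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """DTW cost between two sequences of mixture-weight vectors.

    seq_a: (n, K), seq_b: (m, K); each row holds one state's weights over
    the K shared Gaussians (rows sum to 1). Lower cost = more similar.
    Assumption: local cost = 1 - Bhattacharyya coefficient of two rows.
    """
    n, m = len(seq_a), len(seq_b)
    # Pairwise Bhattacharyya coefficients between states of the two models.
    bc = np.sqrt(seq_a) @ np.sqrt(seq_b).T  # (n, m)
    cost = 1.0 - bc
    # Accumulated cost with the usual three-way recursion.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m])
```

Because the Gaussians are shared, comparing two states reduces to comparing two K-dimensional weight vectors, which is what makes this cheaper than aligning the original feature sequences.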

British Machine Vision Conference | 2013

Label-Embedding for Text Recognition

Jose A. Rodriguez-Serrano; Florent Perronnin

The standard approach to recognizing text in images consists of first classifying local image regions into candidate characters and then combining them with high-level word models such as conditional random fields (CRF). This paper explores a new paradigm that departs from this bottom-up view. We propose to embed word labels and word images into a common Euclidean space. Given a word image to be recognized, the text recognition problem is cast as one of retrieval: find the closest word label in this space. This common space is learned using the Structured SVM (SSVM) framework by enforcing matching label-image pairs to be closer than non-matching pairs. This method presents the following advantages: it does not require costly pre- or post-processing operations, it allows for the recognition of never-seen-before words, and the recognition process is efficient. Experiments are performed on two challenging datasets (one of license plates and one of scene text) and show that the proposed method is competitive with standard bottom-up approaches to text recognition.
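A minimal sketch of recognition as retrieval, assuming a bilinear compatibility x^T W phi(y) between an image descriptor x and a label embedding phi(y). The label embedding below (character counts over a two-level spatial pyramid) and the names `label_embedding` and `recognize` are hypothetical illustrations; in the paper W is learned with the SSVM, whereas here it is simply passed in.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def label_embedding(word: str, levels: int = 2) -> np.ndarray:
    """Hypothetical label embedding: character counts over a spatial pyramid.

    Level l splits the word into 2**l equal regions; each character goes to
    the region containing its centre. The result is L2-normalized.
    """
    word = word.lower()
    n = len(word)
    feats = []
    for level in range(levels):
        regions = 2 ** level
        hist = np.zeros((regions, len(ALPHABET)))
        for i, ch in enumerate(word):
            if ch in ALPHABET:
                centre = (i + 0.5) / n  # character centre in [0, 1)
                r = min(int(centre * regions), regions - 1)
                hist[r, ALPHABET.index(ch)] += 1
        feats.append(hist.ravel())
    v = np.concatenate(feats)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def recognize(image_feat: np.ndarray, W: np.ndarray, lexicon: list) -> str:
    """Return the lexicon word with the highest score x^T W phi(y)."""
    projected = image_feat @ W  # map the image into the label space
    scores = [projected @ label_embedding(w) for w in lexicon]
    return lexicon[int(np.argmax(scores))]
```

Since any string can be embedded without training images, the lexicon can contain words never seen during training, which is the zero-shot property the abstract mentions.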

International Conference on Document Analysis and Recognition | 2009

Fisher Kernels for Handwritten Word-spotting

Florent Perronnin; Jose A. Rodriguez-Serrano

The Fisher kernel is a generic framework which combines the benefits of generative and discriminative approaches to pattern classification. In this contribution, we propose to apply this framework to handwritten word-spotting. Given a word image and a keyword generative model, the idea is to generate a vector which describes how the parameters of the keyword model should be modified to best fit the word image. This vector can then be used as the input of a discriminative classifier. We compare the performance of the proposed approach with that of a generative baseline on a challenging real-world dataset of customer letters. When the kernel used by the classifier is linear, the performance improvement is marginal but the proposed system is approximately 15 times faster than the baseline. If we use a non-linear kernel devised for this task, we obtain a 15% relative reduction of the error but the detector is approximately 15 times slower.
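A minimal sketch of the "how should the parameters move" vector, assuming a diagonal-covariance Gaussian mixture as a simplified stand-in for the keyword generative model and taking the gradient of the average log-likelihood with respect to the means only. The function name is hypothetical, and the usual normalization by the Fisher information matrix is omitted.

```python
import numpy as np

def fisher_score_means(X, weights, means, variances):
    """Gradient of the average log-likelihood of X w.r.t. the GMM means.

    X: (T, D) local features of a word image.
    weights: (K,), means: (K, D), variances: (K, D) diagonal covariances.
    Returns a (K*D,) vector describing how each mean should move to fit X.
    """
    T = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]  # (T, K, D)
    # Log of weight * Gaussian density for every frame/component pair.
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2 / variances[None], axis=2))  # (T, K)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)  # component posteriors
    # d log p(x_t) / d mu_k = gamma_tk * (x_t - mu_k) / sigma_k^2, averaged over t.
    grad = np.einsum("tk,tkd->kd", gamma, diff / variances[None]) / T
    return grad.ravel()
```

With a linear classifier, the kernel between two word images is then just the dot product of their vectors, which is consistent with the speed advantage reported for the linear case.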

International Journal of Computer Vision | 2015

Label Embedding: A Frugal Baseline for Text Recognition

Jose A. Rodriguez-Serrano; Albert Gordo; Florent Perronnin

The standard approach to recognizing text in images consists of first classifying local image regions into candidate characters and then combining them with high-level word models such as conditional random fields. This paper explores a new paradigm that departs from this bottom-up view. We propose to embed word labels and word images into a common Euclidean space. Given a word image to be recognized, the text recognition problem is cast as one of retrieval: find the closest word label in this space. This common space is learned using the Structured SVM framework by enforcing matching label-image pairs to be closer than non-matching pairs. This method presents several advantages: it does not require ad-hoc or costly pre-/post-processing operations, it can build on top of any state-of-the-art image descriptor (Fisher vectors in our case), it allows for the recognition of never-seen-before words (zero-shot recognition), and the recognition process is simple and efficient, as it amounts to a nearest neighbor search. Experiments are performed on challenging datasets of license plates and scene text. The main conclusion of the paper is that with such a frugal approach it is possible to obtain results which are competitive with standard bottom-up approaches, thus establishing label embedding as an interesting and simple-to-compute baseline for text recognition.
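A minimal sketch of "enforcing matching label-image pairs to be closer than non-matching pairs", written as one stochastic gradient step on a ranking hinge loss with a fixed margin over the bilinear score x^T W phi(y). This is a simplified stand-in, not the paper's Structured SVM training: a structured SVM would typically scale the margin by a label loss, and the function name and step size are hypothetical.

```python
import numpy as np

def sgd_step(W, x, phi_pos, phi_neg, lr=0.1, margin=1.0):
    """One update pushing x^T W phi_pos above x^T W phi_neg by `margin`.

    x: image descriptor (D,); phi_pos / phi_neg: embeddings (E,) of the
    matching and a non-matching label; W: (D, E) compatibility matrix.
    """
    loss = margin - x @ W @ phi_pos + x @ W @ phi_neg
    if loss > 0:
        # The hinge-loss gradient w.r.t. W is -x (phi_pos - phi_neg)^T,
        # so a descent step adds lr * x (phi_pos - phi_neg)^T.
        W = W + lr * np.outer(x, phi_pos - phi_neg)
    return W
```

Once trained, recognition only needs the nearest label under this score, which is why the abstract describes it as a nearest neighbor search.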

Pattern Recognition Letters | 2013

Robust abandoned object detection integrating wide area visual surveillance and social context

James M. Ferryman; David C. Hogg; Jan Sochman; Ardhendu Behera; Jose A. Rodriguez-Serrano; Simon F. Worgan; Longzhen Li; Valerie Leung; Murray Evans; Philippe Cornic; Stéphane Herbin; Stefan Schlenger; Michael Dose

This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through development of a logic-based inference engine based on Prolog. Threat detection performance is evaluated against a range of datasets describing realistic situations, and the results demonstrate a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
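A minimal sketch of how ownership and social relations can combine in one abandonment rule. The paper implements its reasoning in Prolog; this Python version, the `Track` type, the radius and time thresholds, and the assumption that the unattended duration is tracked upstream are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    pid: int   # person identifier from the tracker
    x: float   # ground-plane position in metres
    y: float

def is_abandoned(obj_xy, owner_id, people, social_group, seconds_unattended,
                 radius=3.0, max_unattended=30.0):
    """Hypothetical rule: flag the object when neither its owner nor anyone
    socially related to the owner is within `radius` metres, and the object
    has been unattended for longer than `max_unattended` seconds.

    social_group maps a person id to the set of ids related to that person.
    """
    guardians = {owner_id} | social_group.get(owner_id, set())
    attended = any(
        p.pid in guardians and math.dist(obj_xy, (p.x, p.y)) <= radius
        for p in people
    )
    return (not attended) and seconds_unattended > max_unattended
```

Letting a companion of the owner count as a guardian is one way such a rule can suppress false alarms when, for example, one member of a group briefly walks away from shared luggage.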

Pattern Recognition | 2012

Synthesizing queries for handwritten word image retrieval

Jose A. Rodriguez-Serrano; Florent Perronnin

We propose a method to perform text searches on handwritten word image databases when no ground-truth data is available to learn models or select example queries. The approach proceeds by synthesizing multiple images of the query string using different computer fonts. While this idea has been successfully applied to printed documents in the past, its application to the handwritten domain is not straightforward. Indeed, the domain mismatch between queries (synthetic) and database images (handwritten) leads to poor accuracy. Our solution is to represent the queries with robust features and use a model that explicitly accounts for the domain mismatch. While the model is trained using synthetic images, its generative process produces samples according to the distribution of handwritten features. Furthermore, we propose an unsupervised method to perform font selection which has a significant impact on accuracy. Font selection is formulated as finding an optimal weighted mixture of fonts that best approximates the distribution of handwritten low-level features. Experiments demonstrate that the proposed method is an effective way to perform queries without using any human annotated example in any part of the process.
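A minimal sketch of font selection as a weighted mixture, assuming each candidate font already has a fixed density model over low-level features and that "best approximates" is read as maximizing the likelihood of unlabeled handwritten features under the mixture (equivalently, minimizing a KL divergence). Only the mixing weights are estimated, with EM; the function name is hypothetical and this is not necessarily the paper's exact objective.

```python
import numpy as np

def fit_font_weights(likelihoods: np.ndarray, n_iter: int = 100) -> np.ndarray:
    """EM for the mixing weights of fixed per-font densities.

    likelihoods: (N, F) array with entry [i, f] = p_f(x_i), the density of
    handwritten feature x_i under the model of font f (each row must have
    at least one positive entry).
    Returns weights (F,) approximately maximizing sum_i log sum_f w_f p_f(x_i).
    """
    _, F = likelihoods.shape
    w = np.full(F, 1.0 / F)  # start from a uniform mixture
    for _ in range(n_iter):
        resp = likelihoods * w                   # unnormalized posteriors
        resp /= resp.sum(axis=1, keepdims=True)  # E-step
        w = resp.mean(axis=0)                    # M-step
    return w
```

No labels are needed at any point: the handwritten features only enter through their likelihoods, which matches the unsupervised setting described in the abstract.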

Proceedings of SPIE | 2012

Image simulation for automatic license plate recognition

Raja Bala; Yonghui Zhao; Aaron Michael Burry; Vladimir Kozitsky; Claude S. Fillion; Craig Saunders; Jose A. Rodriguez-Serrano

Automatic license plate recognition (ALPR) is an important capability for traffic surveillance applications, including toll monitoring and detection of different types of traffic violations. ALPR is a multi-stage process comprising plate localization, character segmentation, optical character recognition (OCR), and identification of originating jurisdiction (i.e. state or province). Training of an ALPR system for a new jurisdiction typically involves gathering vast amounts of license plate images and associated ground truth data, followed by iterative tuning and optimization of the ALPR algorithms. The substantial time and effort required to train and optimize the ALPR system can result in excessive operational cost and overhead. In this paper we propose a framework to create an artificial set of license plate images for accelerated training and optimization of ALPR algorithms. The framework comprises two steps: the synthesis of license plate images according to the design and layout for a jurisdiction of interest; and the modeling of imaging transformations and distortions typically encountered in the image capture process. Distortion parameters are estimated by measurements of real plate images. The simulation methodology is successfully demonstrated for training of OCR.
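A minimal sketch of the second step of the framework (modeling capture distortions), assuming a clean rendered plate image with intensities in [0, 1]. The choice of distortions (horizontal motion blur plus additive Gaussian noise), the noise estimate from a flat background patch of a real plate, and the function names are hypothetical simplifications of "distortion parameters are estimated by measurements of real plate images".

```python
import numpy as np

def estimate_noise_sigma(real_patch: np.ndarray) -> float:
    """Estimate the noise level from a flat background patch of a real plate."""
    return float(np.std(real_patch))

def simulate_capture(clean: np.ndarray, blur_len: int = 3,
                     noise_sigma: float = 0.05, seed: int = 0) -> np.ndarray:
    """Apply hypothetical capture distortions to a clean plate image in [0, 1]:
    horizontal motion blur, then additive Gaussian noise, then clipping."""
    kernel = np.ones(blur_len) / blur_len  # box kernel approximates motion blur
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, clean)
    rng = np.random.default_rng(seed)
    noisy = blurred + rng.normal(0.0, noise_sigma, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0)
```

A typical use would be `simulate_capture(rendered, noise_sigma=estimate_noise_sigma(background_patch))`, where `rendered` and `background_patch` are placeholder names for a synthetic plate and a crop of a real one.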

International Conference on Intelligent Transportation Systems | 2013

Vehicle type classification from laser scanner profiles: A benchmark of feature descriptors

Harsimrat Sandhawalia; Jose A. Rodriguez-Serrano; Herve Poirier; Gabriela Csurka

This article targets the problem of vehicle classification using laser scanner profiles, which is usually found as a component of electronic tolling systems. Laser scanners obtain a 3D measurement of the vehicle surface. Previous approaches have extracted high-level features (such as width, height, length and other measurements) from the scanner profiles, or have taken the raw profiles for further pattern analysis. In this article, we focus on feature descriptors for supervised classification of laser scanner profiles. We evaluate a number of feature descriptors, including high-level features and raw profiles, and also introduce new descriptors. When interpreted as a 2D image with depth values as pixel intensities, a 3D profile can benefit from recent advances in computer vision. Experiments on a real-world vehicle classification task indicate that the image-based descriptors, especially the Fisher vector, obtain improved performance with respect to high-level features and raw profiles.
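A minimal sketch of interpreting a laser scanner profile as a 2D image, assuming the scanner delivers one 1D depth profile per scan line (possibly of varying length). The profiles are stacked and resampled to a fixed grid by linear interpolation, and depth is rescaled to [0, 1] intensities; the output size and function name are hypothetical, and the image descriptor itself (a Fisher vector in the paper) is not shown.

```python
import numpy as np

def profiles_to_image(profiles, out_shape=(32, 64)) -> np.ndarray:
    """Stack 1D depth profiles into a fixed-size 2D image in [0, 1].

    profiles: list of 1D arrays, one per scan, lengths may differ.
    Rows index position across the scan line; columns index scan order.
    """
    rows, cols = out_shape
    # Resample every profile to `rows` samples.
    resampled = [np.interp(np.linspace(0, 1, rows),
                           np.linspace(0, 1, len(p)), p) for p in profiles]
    img = np.stack(resampled, axis=1)  # (rows, n_scans)
    # Resample along the scan axis so vehicles of any length give `cols` columns.
    src = np.linspace(0, 1, img.shape[1])
    dst = np.linspace(0, 1, cols)
    img = np.stack([np.interp(dst, src, r) for r in img], axis=0)  # (rows, cols)
    # Depth values become pixel intensities.
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
```

The fixed size makes any off-the-shelf image descriptor applicable, at the cost of discarding absolute vehicle length, which a high-level feature could still supply separately.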

International Conference on Computer Vision | 2012

Data-driven vehicle identification by image matching

Jose A. Rodriguez-Serrano; Harsimrat Sandhawalia; Raja Bala; Florent Perronnin; Craig Saunders

Vehicle identification from images has been predominantly addressed through automatic license plate recognition (ALPR) techniques which detect and recognize the characters in the plate region of the image. We move away from traditional ALPR techniques and advocate for a data-driven approach for vehicle identification. Here, given a plate image region, the idea is to search for a near-duplicate image in an annotated database; if found, the identity of the near-duplicate is transferred to the input region. Although this approach could be perceived as impractical, we actually demonstrate that it is feasible with state-of-the-art image representations, and that it presents some advantages in terms of speed and time-to-deploy. To overcome the issue of identifying previously unseen identities, we propose an image simulation approach where photo-realistic images of license plates are generated for desired plate numbers. We demonstrate that there is no perceivable performance difference between using synthetic and real plates. We also improve the matching accuracy using similarity learning, which is in the spirit of domain adaptation.
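A minimal sketch of identification by near-duplicate search, assuming plate regions are already encoded as fixed-length descriptors. Plain cosine similarity and the acceptance threshold are hypothetical choices; the paper additionally learns the similarity.

```python
import numpy as np

def identify(query: np.ndarray, db_feats: np.ndarray, db_ids: list,
             threshold: float = 0.8):
    """Return the identity of the most similar annotated plate image, or
    None when no database entry is similar enough to count as a
    near-duplicate. db_feats: (N, D) descriptors, db_ids: N identities."""
    q = query / np.linalg.norm(query)
    D = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = D @ q  # cosine similarity to every database image
    best = int(np.argmax(sims))
    return db_ids[best] if sims[best] >= threshold else None
```

Returning None for low-similarity queries is where synthetic plates help: adding rendered images for plate numbers that have never been observed gives the search something to match.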

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

Data-Driven Detection of Prominent Objects

Jose A. Rodriguez-Serrano; Diane Larlus; Zhenwen Dai

This article deals with the detection of prominent objects in images. As opposed to the standard approaches based on sliding windows, we study a fundamentally different solution by formulating the supervised prediction of a bounding box as an image retrieval task. Indeed, given a global image descriptor, we find the most similar images in an annotated dataset, and transfer the object bounding boxes. We refer to this approach as data-driven detection (DDD). Our key novelty is to design or learn image similarities that explicitly optimize some aspect of the transfer, unlike previous work, which uses generic representations and unsupervised similarities. In a first variant, we explicitly learn to transfer, by adapting a metric learning approach to work with image and bounding box pairs. Second, we use a representation of images as object probability maps computed from low-level patch classifiers. Experiments show that these two contributions in some cases yield results comparable to or better than those of standard sliding window detectors, despite the approach's conceptual simplicity and run-time efficiency. Our third contribution is an application of prominent object detection, where we improve fine-grained categorization by pre-cropping images with the proposed approach. Finally, we also extend the proposed approach to detect multiple parts of rigid objects.
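A minimal sketch of the transfer step in data-driven detection, assuming global image descriptors and bounding boxes normalized by image size. The retrieval uses plain cosine similarity and the boxes of the k nearest neighbors are combined by a similarity-weighted average; both choices, and the function name, are hypothetical, whereas the paper designs or learns the similarity specifically for the transfer.

```python
import numpy as np

def transfer_box(query: np.ndarray, db_feats: np.ndarray,
                 db_boxes: np.ndarray, k: int = 3) -> np.ndarray:
    """Predict a box for `query` from the boxes of its k most similar images.

    db_feats: (N, D) global descriptors; db_boxes: (N, 4) boxes given as
    (x0, y0, x1, y1) normalized by image width and height.
    """
    q = query / np.linalg.norm(query)
    D = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = D @ q
    nn = np.argsort(-sims)[:k]          # indices of the k nearest images
    w = np.maximum(sims[nn], 0.0)       # ignore negatively similar neighbors
    if w.sum() == 0:
        w = np.ones_like(w)             # fall back to a plain average
    return (w[:, None] * db_boxes[nn]).sum(axis=0) / w.sum()
```

The whole prediction is one retrieval plus an average, which illustrates the run-time efficiency the abstract contrasts with sliding-window detectors.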
