
Publications


Featured research published by Francisco M. Castro.


International Work-Conference on Artificial and Natural Neural Networks | 2017

Automatic learning of gait signatures for people identification

Francisco M. Castro; Manuel J. Marín-Jiménez; Nicolás Guil; Nicolás Pérez de la Blanca

This work targets people identification in video based on the way they walk (i.e. gait). While classical methods typically derive gait signatures from sequences of binary silhouettes, in this work we explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). We carry out a thorough experimental evaluation of the proposed CNN architecture on the challenging TUM-GAID dataset. The experimental results indicate that using spatio-temporal cuboids of optical flow as CNN input yields state-of-the-art results on the gait task at an image resolution eight times lower than in previously reported results (i.e. 80x60 pixels).
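The input representation described in the abstract can be sketched as follows; a minimal NumPy illustration of stacking per-frame optical-flow components into a spatio-temporal cuboid. The function name, the 25-frame window and the channel ordering are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def build_of_cuboid(flow_frames):
    """Stack per-frame optical-flow maps into a spatio-temporal cuboid
    usable as CNN input.

    flow_frames: list of (H, W, 2) arrays, one per video frame, holding
    the horizontal (u) and vertical (v) flow components.
    Returns an array of shape (2 * T, H, W): u and v channels interleaved.
    """
    channels = []
    for f in flow_frames:
        channels.append(f[..., 0])  # horizontal flow component
        channels.append(f[..., 1])  # vertical flow component
    return np.stack(channels, axis=0)

# Toy example: 25 frames of 60x80 flow (the low resolution quoted above).
frames = [np.zeros((60, 80, 2), dtype=np.float32) for _ in range(25)]
cuboid = build_of_cuboid(frames)
print(cuboid.shape)  # (50, 60, 80)
```

Each cuboid thus bundles the short-term motion of a walking sequence into a single multi-channel input volume.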


Machine Vision and Applications | 2016

Multimodal features fusion for gait, gender and shoes recognition

Francisco M. Castro; Manuel J. Marín-Jiménez; Nicolás Guil

The goal of this paper is to evaluate how the fusion of multimodal features (i.e., audio, RGB and depth) can help in the challenging task of people identification based on their gait (i.e., the way they walk), or gait recognition, and by extension in the tasks of gender and shoes recognition. Most previous research on gait recognition has focused on designing visual descriptors, mainly over binary silhouettes, or on building sophisticated machine learning frameworks. However, little attention has been paid to the audio or depth patterns associated with the action of walking. We therefore propose and evaluate a multimodal system for gait recognition. The proposed approach is evaluated on the challenging ‘TUM GAID’ dataset, which contains audio and depth recordings in addition to image sequences. The experimental results show that using either early or late fusion techniques to combine feature descriptors from the three modalities (i.e., RGB, depth and audio) improves the state-of-the-art results on the standard experiments defined on the dataset for the tasks of gait, gender and shoes recognition. Additional experiments on CASIA-B (where only the visual modality is available) support the benefits of feature fusion as well.
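The two fusion schemes compared in the abstract can be sketched as follows; a minimal pure-Python sketch, assuming per-modality feature vectors and per-modality classifier scores are already computed (function names, weights and the score values are illustrative, not from the paper):

```python
def early_fusion(rgb_feat, depth_feat, audio_feat):
    # Early fusion: concatenate the per-modality descriptors into one
    # feature vector, then train a single classifier on the result.
    return rgb_feat + depth_feat + audio_feat  # list concatenation

def late_fusion(scores_per_modality, weights=None):
    # Late fusion: train one classifier per modality and combine their
    # per-class scores with a (possibly weighted) sum.
    n = len(scores_per_modality)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weighting by default
    num_classes = len(scores_per_modality[0])
    fused = [0.0] * num_classes
    for w, scores in zip(weights, scores_per_modality):
        for c in range(num_classes):
            fused[c] += w * scores[c]
    return fused

rgb = [0.7, 0.2, 0.1]    # toy class scores from an RGB classifier
depth = [0.5, 0.4, 0.1]  # ... from a depth classifier
audio = [0.6, 0.1, 0.3]  # ... from an audio classifier
print(late_fusion([rgb, depth, audio]))  # averaged scores per class
```

Early fusion lets one classifier exploit cross-modal correlations, while late fusion keeps the modalities independent until the final decision; which works better is exactly what the paper evaluates empirically.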


Pattern Recognition Letters | 2015

On how to improve tracklet-based gait recognition systems

Manuel J. Marín-Jiménez; Francisco M. Castro; A. Carmona-Poyato; Nicolás Guil

Highlights: a new algorithm to discard irrelevant tracklets for gait representation; comparison of the new RootDCS descriptor for gait representation against the original DCS; metric learning and binary representations for compact gait descriptors; thorough experimental evaluation on the standard datasets CASIA-B, CASIA-C and TUM-GAID; state-of-the-art results on verification and identification tasks.

Recently, short-term dense trajectory features (DTF) have shown state-of-the-art results in video recognition and retrieval. However, their use has not been extensively studied for the problem of gait recognition. Therefore, the goal of this work is to propose and evaluate diverse strategies to improve recognition performance in the task of gait recognition based on DTF. In particular, this paper shows that (i) the proposed RootDCS descriptor improves on DCS in most tested cases; (ii) selecting relevant trajectories automatically improves recognition performance in several situations; (iii) applying a metric learning technique to reduce the dimensionality of the feature vectors improves on standard PCA; and (iv) binarizing the low-dimensional feature vectors not only reduces storage needs but also improves recognition performance in many cases. The experiments are carried out on the popular datasets CASIA, parts B and C, and TUM-GAID, showing improvement on state-of-the-art results in most scenarios.
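Two of the ingredients above can be sketched in a few lines; a pure-Python sketch under loud assumptions: the "Root" step assumes RootDCS follows the RootSIFT recipe (L1-normalize, then element-wise square root), which the name suggests but the abstract does not spell out, and the binarization shown is simple per-dimension thresholding, not necessarily the paper's exact scheme:

```python
import math

def root_normalize(desc):
    """'Root' variant of a descriptor: L1-normalize, then take the
    element-wise square root (the RootSIFT recipe; RootDCS presumably
    applies the same idea to DCS, but details may differ).
    The result has unit L2 norm."""
    s = sum(abs(x) for x in desc) or 1.0
    return [math.sqrt(abs(x) / s) for x in desc]

def binarize(desc, thresholds=None):
    """Binarize a low-dimensional descriptor by thresholding each
    component (here at 0), reducing storage to one bit per dimension."""
    if thresholds is None:
        thresholds = [0.0] * len(desc)
    return [1 if x > t else 0 for x, t in zip(desc, thresholds)]

print(root_normalize([4.0, 1.0, 0.0, 1.0]))
print(binarize([0.3, -0.2, 0.1]))  # [1, 0, 1]
```

The square-root step dampens large components so that comparing descriptors with a Euclidean metric behaves like a Hellinger-kernel comparison of the originals, which is why the "Root" trick often helps.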


Computer Analysis of Images and Patterns | 2015

Empirical Study of Audio-Visual Features Fusion for Gait Recognition

Francisco M. Castro; Manuel J. Marín-Jiménez; Nicolás Guil

The goal of this paper is to evaluate how the fusion of audio and visual features can help in the challenging task of people identification based on their gait (i.e. the way they walk), or gait recognition. Most previous research on gait recognition has focused on designing visual descriptors, mainly over binary silhouettes, or on building sophisticated machine learning frameworks. However, little attention has been paid to the audio patterns associated with the action of walking. We therefore propose and evaluate a multimodal system for gait recognition. The proposed approach is evaluated on the challenging ‘TUM GAID’ dataset, which contains audio recordings in addition to image sequences. The experimental results show that using late fusion to combine two kinds of tracklet-based visual features with audio features improves the state-of-the-art results on the standard experiments defined on the dataset.


International Conference on Biometrics | 2017

Evaluation of CNN Architectures for Gait Recognition Based on Optical Flow Maps

Francisco M. Castro; Manuel J. Marín-Jiménez; Nicolás Guil; Santiago Lopez-Tapia; Nicolás Pérez de la Blanca

This work targets people identification in video based on the way they walk (i.e. gait) by using deep learning architectures. We explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). The low number of training samples per subject and the use of a test set containing subjects different from the training ones make the search for a good CNN architecture a challenging task. We carry out a thorough experimental evaluation, deploying and analyzing four distinct CNN models with different depths but similar complexity. We show that even the simplest CNN models greatly improve the results when combined with shallow classifiers. All our experiments have been carried out on the challenging TUM-GAID dataset, which contains people in different covariate scenarios (i.e. clothing, shoes, bags).


European Conference on Computer Vision | 2018

End-to-End Incremental Learning

Francisco M. Castro; Manuel J. Marín-Jiménez; Nicolás Guil; Cordelia Schmid; Karteek Alahari

Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model—a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and ImageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
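The composite loss described above can be sketched numerically; a minimal pure-Python sketch with toy logits, assuming the usual distillation form (cross-entropy against the old model's temperature-softened outputs on the old classes). The temperature, the weighting factor and the logit values are illustrative, not the paper's exact choices:

```python
import math

def softmax(logits, T=1.0):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    # Classification term: standard cross-entropy against the
    # ground-truth label, driving learning of the new classes.
    return -math.log(softmax(logits)[label])

def distillation(new_logits, old_logits, T=2.0):
    # Distillation term: cross-entropy between the old model's softened
    # outputs and the new model's softened outputs on the old classes,
    # penalizing forgetting of previously acquired knowledge.
    targets = softmax(old_logits, T)
    probs = softmax(new_logits, T)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

def incremental_loss(new_logits, old_logits, label, alpha=1.0, T=2.0):
    # Total loss = classification on new data + distillation on old classes.
    return cross_entropy(new_logits, label) + alpha * distillation(
        new_logits[: len(old_logits)], old_logits, T)

new_logits = [2.0, 0.5, 0.1, 3.0]  # new model: 3 old classes + 1 new class
old_logits = [2.1, 0.4, 0.2]       # old model's logits on the old classes
print(incremental_loss(new_logits, old_logits, label=3))
```

Because both terms are differentiable functions of the same network outputs, the representation and the classifier can be trained jointly end to end, which is the point the abstract emphasizes.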


Concurrency and Computation: Practice and Experience | 2018

Energy-based tuning of convolutional neural networks on multi-GPUs

Francisco M. Castro; Nicolás Guil; M. J. Marín-Jiménez; J. Pérez-Serrano; Manuel Ujaldon

Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs demonstrated remarkable capabilities for accelerating their challenging computational requirements. Within this context, Convolutional Neural Network (CNN) models constitute a representative example of success on a wide set of complex applications, particularly on datasets where the target can be represented through a hierarchy of local features of increasing semantic complexity. In most real scenarios, the roadmap to improve results relies on CNN settings involving brute-force computation, and researchers have lately proven Nvidia GPUs to be one of the best hardware counterparts for acceleration. Our work complements those findings with an energy study on critical parameters for the deployment of CNNs on flagship image and video applications, i.e., object recognition and people identification by gait, respectively. We evaluate energy consumption on four different networks based on the two most popular ones (ResNet/AlexNet), i.e., a ResNet (167 layers), a 2D CNN (15 layers), a CaffeNet (25 layers) and a ResNetIm (94 layers), using batch sizes of 64, 128 and 256, and then correlate those measurements with speed-up and accuracy to determine optimal settings. Experimental results on a multi-GPU server equipped with twin Maxwell and twin Pascal Titan X GPUs demonstrate that energy correlates with performance and that Pascal may bring up to 40% gains over Maxwell. Larger batch sizes extend performance gains and energy savings, but accuracy must be monitored, as it sometimes favors small batches. We expect this work to provide preliminary guidance for a wide set of CNN and DL applications in modern HPC times, where the GFLOPS/W ratio constitutes the primary goal.
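The figure of merit mentioned at the end (GFLOPS/W) can be computed from wall-clock time, an estimated FLOP count and measured average power draw; a toy sketch with made-up numbers, not measurements from the paper:

```python
def gflops_per_watt(total_flops, seconds, avg_watts):
    """Energy efficiency: achieved throughput (GFLOPS) divided by average
    power; equivalently, billions of floating-point operations per joule."""
    gflops = total_flops / seconds / 1e9
    return gflops / avg_watts

# Toy numbers: a training step of 5e12 FLOPs in 0.8 s at 220 W average draw.
print(round(gflops_per_watt(5e12, 0.8, 220.0), 2))  # 28.41
```

Comparing this ratio across batch sizes and GPU generations is what makes the trade-off between speed, energy and accuracy explicit.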


International Conference on Pattern Recognition | 2014

Pyramidal Fisher Motion for Multiview Gait Recognition

Francisco M. Castro; Manuel J. Marín-Jiménez; R. Medina-Carnicer


International Journal of Pattern Recognition and Artificial Intelligence | 2017

Fisher Motion Descriptor for Multiview Gait Recognition

Francisco M. Castro; Manuel J. Marín-Jiménez; N. Guil Mata; Rafael Muñoz-Salinas


arXiv: Computer Vision and Pattern Recognition | 2018

Multimodal feature fusion for CNN-based gait recognition: an empirical comparison.

Francisco M. Castro; Manuel Jesús Marín-Jiménez; Nicolás Guil; Nicolás Pérez de la Blanca

Collaboration


An overview of Francisco M. Castro's collaborations.

Top Co-Authors

F. De la Torre

Carnegie Mellon University
