Ferran Diego
Autonomous University of Barcelona
Publication
Featured research published by Ferran Diego.
IEEE Transactions on Image Processing | 2011
Ferran Diego; Daniel Ponsa; Joan Serrat; Antonio M. López
In this work, we address the problem of aligning two video sequences. Such alignment refers to synchronization, i.e., the establishment of temporal correspondence between frames of the first and second video, followed by spatial registration of all the temporally corresponding frames. Video synchronization and alignment have been attempted before, but most often in the relatively simple cases of fixed or rigidly attached cameras and simultaneous acquisition. In addition, restrictive assumptions have been applied, including linear time correspondence or the knowledge of the complete trajectories of corresponding scene points; to some extent, these assumptions limit the practical applicability of any solutions developed. We intend to solve the more general problem of aligning video sequences recorded by independently moving cameras that follow similar trajectories, based only on the fusion of image intensity and GPS information. The novelty of our approach is to pose the synchronization as a MAP inference problem on a Bayesian network including the observations from these two sensor types, which have proven to be complementary. Alignment results are presented in the context of videos recorded from vehicles driving along the same track at different times, for different road types. In addition, we explore two applications of the proposed video alignment method, both based on change detection between aligned videos. One is the detection of vehicles, which could be of use in ADAS. The other is online difference spotting in videos of surveillance rounds.
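As a rough illustration of MAP-style synchronization, the sketch below finds a monotonic frame correspondence by Viterbi dynamic programming on a chain. It is not the authors' Bayesian network: the function name `synchronize`, the cost matrix `cost` (standing in for fused GPS and image-intensity dissimilarities) and the step limit `max_step` are hypothetical assumptions.

```python
import numpy as np


def synchronize(cost, max_step=2):
    """Hypothetical sketch: MAP frame correspondence via Viterbi on a chain.

    cost[i, j] is an assumed dissimilarity between frame i of the observed
    video and frame j of the reference video (e.g. a fusion of GPS distance
    and image difference). Returns a list x where x[i] is the reference frame
    assigned to observed frame i, subject to 0 <= x[i] - x[i-1] <= max_step,
    i.e. a monotonic but not necessarily linear temporal mapping.
    """
    n, m = cost.shape
    acc = np.full((n, m), np.inf)        # best accumulated cost ending at (i, j)
    back = np.zeros((n, m), dtype=int)   # best predecessor for backtracking
    acc[0] = cost[0]
    for i in range(1, n):
        for j in range(m):
            lo = max(0, j - max_step)    # allowed predecessors: j - max_step .. j
            prev = acc[i - 1, lo:j + 1]
            k = int(np.argmin(prev))
            acc[i, j] = prev[k] + cost[i, j]
            back[i, j] = lo + k
    # Backtrack from the cheapest final state.
    path = [int(np.argmin(acc[-1]))]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```

For example, with `cost[i, j] = |j - t[i]|` for a true mapping `t = [0, 2, 3, 5]`, the function recovers `[0, 2, 3, 5]`. Minimizing a summed cost corresponds to MAP inference when each cost term is a negative log-probability.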
IEEE Transactions on Multimedia | 2013
Ferran Diego; Joan Serrat; Antonio M. López
Video alignment is important in different areas of computer vision such as wide baseline matching, action recognition, change detection, video copy detection and frame dropping prevention. Current video alignment methods usually deal with the relatively simple cases of fixed or rigidly attached cameras or simultaneous acquisition. Therefore, in this paper we propose a joint video alignment method that brings two video sequences into spatio-temporal alignment. Specifically, the novelty of the paper is to fold spatial and temporal alignment into a single alignment framework. This framework simultaneously satisfies frame-correspondence and frame-alignment similarity, exploiting the knowledge shared among neighboring frames through a standard pairwise Markov random field (MRF). This new formulation is able to handle the alignment of sequences recorded at different times by independently moving cameras that follow a similar trajectory, and also generalizes the particular cases of a fixed geometric transformation and/or a linear temporal mapping. We conduct experiments on different scenarios, such as sequences recorded simultaneously or by moving cameras, to validate the robustness of the proposed approach. The proposed method provides the highest video alignment accuracy compared to the state-of-the-art methods on sequences recorded from vehicles driving along the same track at different times.
IEEE Transactions on Intelligent Transportation Systems | 2013
Jose M. Alvarez; Theo Gevers; Ferran Diego; Antonio M. López
Vision-based road detection is important for different applications in transportation, such as autonomous driving, vehicle collision warning, and pedestrian crossing detection. Common approaches to road detection are based on low-level road appearance (e.g., color or texture) and neglect the scene geometry and context. Hence, using only low-level features makes these algorithms highly dependent on structured roads, road homogeneity, and lighting conditions. Therefore, the aim of this paper is to classify road geometries for road detection through the analysis of scene composition and temporal coherence. Road geometry classification is proposed by building corresponding models from training images containing prototypical road geometries. We propose adaptive shape models where spatial pyramids are steered by the inherent spatial structure of road images. To reduce the influence of lighting variations, invariant features are used. Large-scale experiments show that the proposed road geometry classifier yields a high recognition rate of 73.57% ± 13.1, clearly outperforming other state-of-the-art methods. Including road shape information improves road detection results over existing appearance-based methods. Finally, it is shown that invariant features and temporal information provide robustness against disturbing imaging conditions.
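The classifier above builds on spatial pyramids. A minimal sketch of a standard, fixed-grid spatial pyramid descriptor follows; it does not implement the adaptive, structure-steered pyramids proposed in the paper, and the function name `spatial_pyramid` and its input `labels` (an image of quantized feature indices) are hypothetical.

```python
import numpy as np


def spatial_pyramid(labels, n_bins, levels=2):
    """Hypothetical sketch: concatenate normalized histograms of quantized
    features over a fixed pyramid of 2^l x 2^l grids, l = 0..levels."""
    h, w = labels.shape
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level
        rows = np.array_split(np.arange(h), cells)   # row indices per cell
        cols = np.array_split(np.arange(w), cells)   # column indices per cell
        for r in rows:
            for c in cols:
                block = labels[np.ix_(r, c)]
                hist = np.bincount(block.ravel(), minlength=n_bins).astype(float)
                feats.append(hist / max(block.size, 1))  # normalize per cell
    return np.concatenate(feats)
```

With `levels=2` the descriptor has (1 + 4 + 16) × `n_bins` entries; coarse levels capture global composition and fine levels capture where each feature occurs, which is what lets such descriptors distinguish road layouts.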
Computer Vision and Pattern Recognition | 2014
Luca Fiaschi; Ferran Diego; Konstantin Gregor; Martin Schiegg; Ullrich Koethe; Marta Zlatic; Fred A. Hamprecht
We use weakly supervised structured learning to track and disambiguate the identity of multiple indistinguishable, translucent and deformable objects that can overlap for many frames. For this challenging problem, we propose a novel model which handles occlusions, complex motions and non-rigid deformations by jointly optimizing the flows of multiple latent intensities across frames. These flows are latent variables for which the user cannot directly provide labels. Instead, we leverage a structured learning formulation that uses weak user annotations to find the best hyperparameters of this model. The approach is evaluated on a challenging dataset for the tracking of multiple Drosophila larvae which we make publicly available. Our method tracks multiple larvae in spite of their poor distinguishability and minimizes the number of identity switches during prolonged mutual occlusion.
International Conference on Intelligent Transportation Systems | 2010
Ferran Diego; Jose M. Alvarez; Joan Serrat; Antonio M. López
Road segmentation is an essential functionality for supporting advanced driver assistance systems (ADAS) such as road following and vehicle and pedestrian detection. Significant efforts have been made to solve this task using vision-based techniques. The major challenge is to deal with lighting variations and the presence of objects on the road surface. In this paper, we propose a new road detection method to infer the areas of the image depicting road surfaces without performing any image segmentation. The idea is to first segment the road region, manually or semi-automatically, in a traffic-free reference video recorded on a first drive, and then to transfer these regions online to the frames of a second video sequence acquired later on a second drive along the same road. This is possible because we are able to automatically align the two videos in time and space, that is, to synchronize them and warp each frame of the first video to its corresponding frame in the second one. The geometric transform can thus transfer the road region to the present frame online. To reduce the effect of the different lighting conditions present in outdoor scenarios, our approach incorporates a shadowless feature space which represents an image in an illuminant-invariant feature space. Furthermore, we propose a dynamic background subtraction algorithm which removes, from the transferred road region, the areas of the observed frames that contain vehicles.
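The transfer step can be sketched as mapping the reference road polygon through a 3x3 homography, followed by a plain frame-differencing check inside the transferred region. The planar-homography assumption, the function names `transfer_region` and `vehicle_mask`, and the threshold `thresh` are all hypothetical; the paper's dynamic background subtraction and illuminant-invariant features are not reproduced here.

```python
import numpy as np


def transfer_region(polygon, H):
    """Hypothetical sketch: map reference road-polygon vertices (N x 2,
    pixel coordinates) into the current frame with an assumed homography H."""
    polygon = np.asarray(polygon, dtype=float)
    pts = np.hstack([polygon, np.ones((len(polygon), 1))])  # homogeneous coords
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                    # back to Cartesian


def vehicle_mask(current, reference_warped, road_mask, thresh=30):
    """Flag pixels inside the transferred road region that differ strongly
    from the aligned reference frame (simple differencing, hypothetical threshold)."""
    diff = np.abs(current.astype(float) - reference_warped.astype(float))
    return road_mask & (diff > thresh)
```

Because the reference drive is traffic-free, large differences inside the transferred road region are candidate vehicles; a real system would also need to cope with lighting changes, which is why the paper works in an illuminant-invariant space.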
International Symposium on Biomedical Imaging | 2013
Ferran Diego; Susanne Reichinnek; Martin Both; Fred A. Hamprecht
We present ADINA, an automated pipeline for analyzing and identifying neuronal activity from calcium imaging data to investigate neuronal activity patterns. This entails the detection and classification of cell centroids and of calcium transients (events) that reappear during different activity periods, such as memory consolidation. Specifically, the pipeline implements sparse dictionary learning to infer the most relevant Ca²⁺ patterns, an image segmentation procedure using a wavelet transform and watershed to identify single cells, and an estimation of the transient signals by means of sparse coding exploiting spatial and temporal sparsity. We validate our automated approach on artificial data and on two different calcium imaging sequences from mouse hippocampal slice cultures acquired with fluorescence and confocal microscopes. Our approach achieves ca. 94% sensitivity on average for correctly detecting events, thus significantly improving the estimation of cell signals relative to published procedures.
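Sparse coding of the kind mentioned above is often posed as an L1-regularized least-squares problem, min_x ½‖y − Dx‖² + λ‖x‖₁. The sketch below solves it with ISTA (iterative shrinkage-thresholding), a generic solver chosen here for illustration; the abstract does not say whether ADINA uses ISTA, and the dictionary `D`, the weight `lam` and the function names are assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(D, y, lam=0.1, n_iter=500):
    """Hypothetical sketch: minimize 0.5 * ||y - D x||^2 + lam * ||x||_1."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)         # gradient of the least-squares term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

In a calcium-imaging setting, the columns of `D` could hold learned activity patterns and `x` the sparse activations explaining an observed signal `y`. With `D` equal to the identity, ISTA reduces to a single soft-thresholding of `y`.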
International Conference on Image Analysis and Processing | 2007
Joan Serrat; Ferran Diego; Felipe Lumbreras; Jose M. Alvarez
We address the synchronization of a pair of video sequences captured from moving vehicles and the spatial registration of all the temporally corresponding frames. This is necessary in order to perform the pixelwise comparison of a pair of videos. The novelty of our method is that it is free from three common restrictions of most previous works. First, it does not require that the two cameras be rigidly fixed to each other, since they can move independently. Second, the temporal correspondence does not assume a linear mapping. Third, it does not rely on the complete trajectories of image features. We present our results in the context of two applications: outdoor surveillance at night and the comparison of vehicle headlight systems.
British Machine Vision Conference | 2016
Melih Kandemir; Manuel Haussmann; Ferran Diego; Kumar Rajamani; Jeroen van der Laak; Fred A. Hamprecht
We introduce the first model to perform weakly supervised learning with Gaussian processes on up to millions of instances. The key ingredient to achieve this scalability is to replace the standard assumption of multiple instance learning (MIL), that the bag-level prediction is the maximum of instance-level estimates, with the accumulated evidence of instances within a bag. This enables us to devise a novel variational inference scheme that operates solely by closed-form updates. Keeping all its parameters but one fixed, our model updates the remaining parameter to the global optimum. This virtue leads to charmingly fast convergence, which makes the model well suited to large-scale learning setups. Our model performs significantly better in two medical applications than an adaptation of GPMIL to scalable inference and various scalable MIL algorithms. It also proves to be very competitive in object classification against state-of-the-art adaptations of deep learning to weakly supervised learning.
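To make the modeling change concrete, the sketch below contrasts a max-based bag prediction with one that accumulates evidence by summing instance scores. The logistic link and the function names are assumptions for illustration; the Gaussian-process prior and the variational updates are not shown.

```python
import numpy as np


def bag_prob_max(f):
    """Standard MIL-style rule: the bag is as positive as its most positive instance."""
    return 1.0 / (1.0 + np.exp(-np.max(f)))


def bag_prob_accumulated(f):
    """Accumulated evidence (assumed sum + logistic link): every instance's
    latent score contributes to the bag logit."""
    return 1.0 / (1.0 + np.exp(-np.sum(f)))
```

With instance scores `f = [2, -1, -1]`, the max-based probability is about 0.88 while the accumulated one is 0.5, because the negative instances now count against the bag. The sum is also linear in the instance scores, unlike the max, which is one reason such a formulation can be friendlier to closed-form inference.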
International Conference on Computer Vision | 2013
Georgios D. Evangelidis; Ferran Diego; Radu Horaud
This paper addresses the background estimation problem for videos captured by moving cameras, referred to as video grounding. It essentially aims at reconstructing a video as it would appear without foreground objects, e.g. cars or people. What differentiates video grounding from known background estimation methods is that the camera follows unconstrained motion, so the background undergoes ongoing changes. We build on video matching, since multiple videos contribute to the reconstruction. Without loss of generality, we investigate a challenging case where videos are recorded by in-vehicle cameras that follow the same road. Beyond video synchronization and spatiotemporal alignment, we focus on the background reconstruction by exploiting inter- and intra-sequence similarities. In this context, we propose a Markov random field formulation that integrates the temporal coherence of videos while exploiting the decisions of a support vector machine classifier about the backgroundness of regions in video frames. Experiments with real sequences recorded by moving vehicles verify the potential of the video grounding algorithm against state-of-the-art baselines.
European Conference on Computer Vision | 2016
Martin Schiegg; Ferran Diego; Fred A. Hamprecht
In structured prediction, it is standard procedure to discriminatively train a single model that is then used to make a single prediction for each input. This practice is simple but risky in many ways. For instance, models are often designed with tractability rather than faithfulness in mind. To hedge against such model misspecification, it may be useful to train multiple models that all fit the training data reasonably well, in the hope that at least one of them makes more valid predictions than the single model of the standard procedure. We propose the Coulomb Structured SVM (CSSVM) as a means to obtain a full ensemble of different models at training time. At test time, these models can run in parallel and independently to make diverse predictions. We demonstrate on challenging tasks from computer vision that some of these diverse predictions have significantly lower task loss than that of a single model, and improve over state-of-the-art diversity-encouraging approaches.
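One way to picture the Coulomb idea is as an inverse-distance (electrostatic-like) potential between the weight vectors of ensemble members, whose gradient pushes the models apart. The sketch below implements only that repulsive term and one gradient step; the exact potential used by the CSSVM, its combination with the structured hinge loss, the step size `lr` and the function names are assumptions.

```python
import numpy as np


def coulomb_energy(W, eps=1e-12):
    """Sum of 1 / ||w_i - w_j|| over all pairs of ensemble weight vectors (rows of W)."""
    W = np.asarray(W, dtype=float)
    e = 0.0
    for i in range(len(W)):
        for j in range(i + 1, len(W)):
            e += 1.0 / (np.linalg.norm(W[i] - W[j]) + eps)
    return e


def repulsion_step(W, lr=0.1, eps=1e-12):
    """One gradient-descent step on coulomb_energy; moves the models apart."""
    W = np.asarray(W, dtype=float)
    G = np.zeros_like(W)
    for i in range(len(W)):
        for j in range(len(W)):
            if i != j:
                d = W[i] - W[j]
                # d/dw_i of 1/||w_i - w_j|| is -(w_i - w_j) / ||w_i - w_j||^3
                G[i] -= d / (np.linalg.norm(d) ** 3 + eps)
    return W - lr * G
```

In a full training loop such a term would be balanced against each model's fit to the data, so that models spread out only as far as the training loss allows.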
Collaboration
Commonwealth Scientific and Industrial Research Organisation