Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Cristian Canton-Ferrer is active.

Publication


Featured research published by Cristian Canton-Ferrer.


Computer Vision and Pattern Recognition | 2016

Learning by Tracking: Siamese CNN for Robust Target Association

Laura Leal-Taixé; Cristian Canton-Ferrer; Konrad Schindler

This paper addresses data association in pedestrian tracking by introducing a two-stage learning scheme to match pairs of detections. First, a Siamese convolutional neural network (CNN) is trained to learn descriptors encoding local spatio-temporal structures between the two input image patches, aggregating pixel values and optical flow information. Second, a set of contextual features derived from the position and size of the compared input patches is combined with the CNN output by means of a gradient boosting classifier to generate the final matching probability. This learning approach is validated with a linear programming based multi-person tracker, showing that even a simple and efficient tracker may outperform much more complex models when fed with our learned matching probabilities. Results on publicly available sequences show that our method meets state-of-the-art standards in multiple people tracking.
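
A minimal sketch of the two-stage matching idea, assuming PyTorch and scikit-learn; the branch architecture, the 5-channel input (RGB plus two optical-flow channels), the contextual features, and the training data are illustrative placeholders, not the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

class SiameseBranch(nn.Module):
    """Shared branch producing a descriptor for one image patch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(5, 16, 3, padding=1), nn.ReLU(),  # RGB + 2 flow channels
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64),
        )
    def forward(self, x):
        return self.net(x)

branch = SiameseBranch()

def cnn_score(patch_a, patch_b):
    """Stage one: cosine similarity between the learned descriptors."""
    with torch.no_grad():
        return torch.cosine_similarity(branch(patch_a), branch(patch_b)).item()

# Stage two: a gradient boosting classifier fuses the CNN score with
# contextual cues. Feature rows: [cnn_score, dx, dy, size_ratio];
# the training data here is random, purely to make the sketch run.
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
matcher = GradientBoostingClassifier().fit(X, y)

pair_features = [cnn_score(torch.rand(1, 5, 64, 32), torch.rand(1, 5, 64, 32)),
                 0.1, 0.05, 0.98]               # hypothetical detection pair
p_match = matcher.predict_proba([pair_features])[0, 1]  # matching probability
```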


International Conference on Computational Science | 2005

Towards a Bayesian approach to robust finding correspondences in multiple view geometry environments

Cristian Canton-Ferrer; Josep R. Casas; Montse Pardàs

This paper presents a new Bayesian approach to the problem of finding correspondences between moving objects in a multiple calibrated camera environment. Moving objects are detected and segmented in multiple cameras using a background learning technique. A Point-Based Feature (PBF) is extracted from each foreground region; in our case, its topmost point. These features provide the support for establishing correspondences. A reliable, efficient, and fast to compute distance, the symmetric epipolar distance, is proposed to measure the closeness of sets of points belonging to different views. Finally, matching the features from different cameras that originate from the same object is achieved by selecting the most likely PBF in each view under a Bayesian framework. Results show the effectiveness of the proposed algorithm even in cases of severe occlusion or incorrectly segmented foreground regions.
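
The key geometric ingredient is easy to state in code. Below is a sketch of the symmetric epipolar distance between candidate points in two calibrated views, assuming NumPy; the fundamental matrix and the points are made-up placeholders, not real calibration data.

```python
import numpy as np

def point_line_dist(p, l):
    """Distance from homogeneous image point p to the 2D line l = (a, b, c)."""
    return abs(l @ p) / np.hypot(l[0], l[1])

def symmetric_epipolar_distance(x1, x2, F):
    """Sum of distances of each point to the epipolar line of the other."""
    return point_line_dist(x2, F @ x1) + point_line_dist(x1, F.T @ x2)

F = np.array([[0.0, -1e-4, 0.01],    # placeholder fundamental matrix
              [1e-4, 0.0, -0.03],
              [-0.01, 0.03, 1.0]])
x1 = np.array([320.0, 240.0, 1.0])   # top point of a region in view 1
x2 = np.array([310.0, 250.0, 1.0])   # candidate correspondence in view 2
print(symmetric_epipolar_distance(x1, x2, F))
```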


Multimodal Technologies for Perception of Humans | 2008

Head Orientation Estimation Using Particle Filtering in Multiview Scenarios

Cristian Canton-Ferrer; Josep R. Casas; Montse Pardàs

This paper presents a novel approach to estimating the head pose and 3D face orientation of several people in low-resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited, and each head in the scene is approximated by an ellipsoid. Skin patches from each detected head are located in each camera view. Data fusion is performed by back-projecting skin patches from single images onto the estimated 3D head model, thus providing a synthetic reconstruction of the head appearance. A particle filter is employed to estimate the head pan angle of the person under study, using a likelihood function based on the face appearance. Experimental results proving the effectiveness of the proposed algorithm are provided for the SmartRoom scenario of the CLEAR 2007 Head Orientation dataset.
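
A toy version of the pan-angle particle filter, assuming NumPy; the paper's likelihood compares the back-projected head texture with a face appearance model, whereas the stand-in below is a Gaussian around a fake measurement, used only to show the predict/weight/resample loop.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
particles = rng.uniform(-np.pi, np.pi, N)       # pan-angle hypotheses
weights = np.full(N, 1.0 / N)

def likelihood(pan, observed_pan, sigma=0.3):
    d = np.angle(np.exp(1j * (pan - observed_pan)))  # wrapped angle difference
    return np.exp(-0.5 * (d / sigma) ** 2)

for observed_pan in [0.10, 0.15, 0.20]:         # fake appearance-based cues
    particles += rng.normal(0, 0.1, N)          # propagate (random-walk model)
    weights *= likelihood(particles, observed_pan)
    weights /= weights.sum()
    # Systematic resampling.
    idx = np.searchsorted(np.cumsum(weights),
                          (rng.random() + np.arange(N)) / N)
    particles, weights = particles[idx], np.full(N, 1.0 / N)

estimate = np.angle(np.mean(np.exp(1j * particles)))  # circular mean
print(f"estimated pan angle: {estimate:.2f} rad")
```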


International Conference on Image Processing | 2005

Fusion of multiple viewpoint information towards 3D face robust orientation detection

Cristian Canton-Ferrer; Josep R. Casas; Montse Pardàs

This paper presents a novel approach to estimating the head pose and 3D face orientation of several people in low-resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited: the heads of people in the scene are detected and geometrically approximated by an ellipsoid using a voxel reconstruction and a moment analysis method. Skin patches from each detected head are located in each camera view. Data fusion is performed by back-projecting skin patches from single images onto the estimated 3D head model, thus providing a synthetic reconstruction of the head appearance. Finally, these data are processed in a pattern analysis framework, giving a reliable and robust estimation of face orientation. Tracking over time is performed by Kalman filtering. Results show the effectiveness of the proposed algorithm in a Smart-Room scenario.
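
The moment-analysis step can be sketched compactly: fit an ellipsoid to a head voxel cloud from its first- and second-order moments. This assumes NumPy, and the synthetic Gaussian voxel cloud stands in for the real visual-hull reconstruction.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic head voxels around (0, 0, 1.7) m; a real system would take
# these from the multi-camera voxel reconstruction.
voxels = rng.normal([0.0, 0.0, 1.7], [0.08, 0.10, 0.12], size=(500, 3))

centroid = voxels.mean(axis=0)                 # first-order moment
cov = np.cov((voxels - centroid).T)            # second-order central moments
eigvals, eigvecs = np.linalg.eigh(cov)         # principal axes of the cloud

semi_axes = np.sqrt(eigvals)                   # ellipsoid semi-axes (1-sigma)
print("centroid:", np.round(centroid, 3))
print("semi-axes:", np.round(semi_axes, 3))
print("orientation (columns = principal axes):\n", np.round(eigvecs, 3))
```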


Articulated Motion and Deformable Objects | 2008

Exploiting Structural Hierarchy in Articulated Objects Towards Robust Motion Capture

Cristian Canton-Ferrer; Josep R. Casas; Montse Pardàs

This paper presents a general analysis framework that exploits the underlying hierarchical and scalable structure of an articulated object for pose estimation and tracking. The Scalable Human Body Model (SHBM) is presented as a set of human body models ordered following a hierarchy criterion. The concept of annealing is applied to derive a generic particle filtering scheme able to perform sequential filtering over the models contained in the SHBM, leading to a structural annealing process. This scheme is applied to human motion capture in a multi-camera environment. Finally, the effectiveness of the proposed system is assessed by comparing its performance with the standard and annealed particle filtering approaches over an annotated database.
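
An illustrative structural-annealing loop, assuming NumPy: filtering proceeds over a hierarchy of models from coarse (torso only) to fine (full body), seeding each layer with the previous layer's estimate. The model list, degrees of freedom, and likelihood are placeholders, not the SHBM itself.

```python
import numpy as np

rng = np.random.default_rng(2)
hierarchy = [("torso", 3), ("torso+head", 6), ("full body", 12)]  # (name, DoF)

def likelihood(poses):
    """Stand-in likelihood favoring the zero pose; a real system would
    compare each pose hypothesis against the multi-camera observations."""
    return np.exp(-0.5 * np.sum(poses ** 2, axis=1))

N = 300
estimate = np.zeros(0)
for name, dof in hierarchy:
    # Extend the coarser estimate with zeros for the newly added joints.
    mean = np.concatenate([estimate, np.zeros(dof - estimate.size)])
    particles = mean + rng.normal(0, 0.5, (N, dof))
    w = likelihood(particles)
    w /= w.sum()
    estimate = w @ particles                   # weighted-mean pose
    print(f"{name}: {dof}-DoF estimate, norm = {np.linalg.norm(estimate):.3f}")
```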


EURASIP Journal on Advances in Signal Processing | 2011

Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities

Taras Butko; Cristian Canton-Ferrer; Carlos Segura; Xavier Giró; Climent Nadeu; Javier Hernando; Josep R. Casas

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information produces a large number of errors, mostly due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both the audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed. The experimental results show that information from both the microphone array and the video cameras is useful for improving the detection rate of isolated as well as spontaneously generated acoustic events.
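
A sketch of the feature-level fusion and the parallel detector bank, assuming the hmmlearn package; the per-frame features, event classes, and training sequences are synthetic stand-ins for the real spectrotemporal and video cues.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(3)

def fused_sequence(offset, T=80):
    audio = rng.normal(offset, 1.0, (T, 13))   # MFCC-like audio features
    video = rng.normal(offset, 1.0, (T, 4))    # motion/tracking-based cues
    return np.hstack([audio, video])           # feature-level fusion

classes = {"applause": 0.0, "keyboard": 1.0, "speech": 2.0}
detectors = {}
for name, offset in classes.items():
    train = [fused_sequence(offset) for _ in range(5)]
    detectors[name] = GaussianHMM(n_components=3).fit(
        np.vstack(train), lengths=[len(s) for s in train])

# Parallel structure: run every detector on the test sequence and pick
# the class with the highest log-likelihood.
test = fused_sequence(classes["keyboard"])
scores = {name: m.score(test) for name, m in detectors.items()}
print("detected event:", max(scores, key=scores.get))
```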


Computer Vision and Pattern Recognition | 2009

Audiovisual event detection towards scene understanding

Cristian Canton-Ferrer; Taras Butko; Carlos Segura; Xavier Giró; Climent Nadeu; Javier Hernando; Josep R. Casas

Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multi-person tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion at the score level is carried out using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows assessing the performance of the presented algorithms. This dataset is made publicly available for research purposes.
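
A small sketch contrasting the two score-level fusion rules on a single event hypothesis; the per-modality scores, the weights, and the fuzzy measure g over source coalitions are made-up placeholders.

```python
scores = {"audio": 0.7, "video": 0.4}          # per-modality event scores

# 1) Weighted mean average.
weights = {"audio": 0.6, "video": 0.4}
weighted_mean = sum(weights[s] * scores[s] for s in scores)

# 2) Sugeno fuzzy integral: sort sources by score (descending) and take
#    max_i min(score_i, g(coalition of the i best sources)).
g = {frozenset(["audio"]): 0.6,                # fuzzy measure (placeholder)
     frozenset(["video"]): 0.5,
     frozenset(["audio", "video"]): 1.0}
coalition, fused = set(), 0.0
for src in sorted(scores, key=scores.get, reverse=True):
    coalition.add(src)
    fused = max(fused, min(scores[src], g[frozenset(coalition)]))

print(f"weighted mean: {weighted_mean:.2f}, fuzzy integral: {fused:.2f}")
```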


Lecture Notes in Computer Science | 2008

Multi-person Tracking Strategies Based on Voxel Analysis

Cristian Canton-Ferrer; Jordi Salvador; Josep R. Casas; Montse Pardàs

This paper presents two approaches to simultaneously tracking several people in low-resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited to generate a discrete 3D binary representation of the foreground objects in the scene, and color information obtained from a zenithal camera view is added to this 3D information. The first tracking approach implements heuristic association rules between blobs labelled according to spatiotemporal connectivity criteria; association rules are based on a cost function that considers blob placement and color histogram. In the second approach, a particle filtering scheme adapted to the incoming 3D discrete data is proposed: a volume likelihood function and a discrete 3D re-sampling procedure are introduced to evaluate and drive the particles. Multiple targets are tracked by means of multiple particle filters, and interaction among them is modeled through a 3D blocking scheme. Evaluation over the CLEAR 2007 database yields quantitative results assessing the performance of the proposed algorithm for indoor scenarios.
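
The blob-labelling step maps naturally onto connected-component analysis of the 3D binary grid. Below is a sketch assuming SciPy, with a synthetic occupancy grid in place of the real multi-camera reconstruction.

```python
import numpy as np
from scipy import ndimage

# Synthetic 3D binary voxel grid with two person-like columns.
grid = np.zeros((40, 40, 20), dtype=bool)
grid[5:10, 5:10, 0:12] = True        # "person" 1
grid[25:30, 25:30, 0:11] = True      # "person" 2

structure = np.ones((3, 3, 3), dtype=int)      # 26-connected neighborhood
labels, n_blobs = ndimage.label(grid, structure=structure)

print(f"{n_blobs} blobs found")
for blob_id in range(1, n_blobs + 1):
    centroid = ndimage.center_of_mass(grid, labels, blob_id)
    print(f"blob {blob_id}: centroid at voxel {np.round(centroid, 1)}")
```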


EURASIP Journal on Advances in Signal Processing | 2008

Audiovisual head orientation estimation with particle filtering in multisensor scenarios

Cristian Canton-Ferrer; Carlos Segura; Josep R. Casas; Montse Pardàs; Javier Hernando

This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing setups. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices, and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for estimating head orientation in both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Furthermore, two particle filter schemes for fusing the audio and video streams are analyzed in terms of accuracy and robustness: in the first, fusion is performed at the decision level by combining the monomodal head pose estimates, while the second uses a joint estimation system that combines information at the data level. Experimental results conducted over the CLEAR 2006 evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
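
A toy comparison of the two fusion schemes on a single pan angle, assuming NumPy; the Gaussian observation models for the audio and video cues are stand-ins chosen only to make the contrast concrete.

```python
import numpy as np

rng = np.random.default_rng(4)
true_pan = 0.4
video_obs = true_pan + rng.normal(0, 0.15)     # appearance-based cue
audio_obs = true_pan + rng.normal(0, 0.25)     # voice-directivity cue

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Decision level: fuse the two monomodal estimates directly
# (inverse-variance weighted mean).
w_v, w_a = 1 / 0.15 ** 2, 1 / 0.25 ** 2
decision_fused = (w_v * video_obs + w_a * audio_obs) / (w_v + w_a)

# Data level: a single particle set weighted by the joint likelihood
# of both modalities.
particles = rng.uniform(-np.pi, np.pi, 500)
w = gauss(particles, video_obs, 0.15) * gauss(particles, audio_obs, 0.25)
data_fused = np.sum(w * particles) / w.sum()

print(f"decision-level: {decision_fused:.3f}, data-level: {data_fused:.3f}")
```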


CLEaR | 2006

Head pose detection based on fusion of multiple viewpoint information

Cristian Canton-Ferrer; Josep R. Casas; Montse Pardàs

This paper presents a novel approach to estimating the head pose and 3D face orientation of several people in low-resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited, and each head in the scene is detected and geometrically approximated by an ellipsoid. Skin patches from each detected head are located in each camera view. Data fusion is performed by back-projecting skin patches from single images onto the estimated 3D head model, thus providing a synthetic reconstruction of the head appearance. Finally, these data are processed in a pattern analysis framework, giving an estimation of face orientation. Tracking over time is performed by Kalman filtering. Results of the proposed algorithm are provided for the SmartRoom scenario of the CLEAR Evaluation.
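
The Kalman tracking step admits a compact sketch: a constant-velocity filter on the estimated angle, assuming NumPy; the noise levels and measurements are arbitrary illustrative values.

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition: [angle, rate]
H = np.array([[1.0, 0.0]])               # only the angle is observed
Q = np.eye(2) * 1e-3                     # process noise covariance
R = np.array([[0.05]])                   # measurement noise covariance

x = np.zeros((2, 1))                     # initial state
P = np.eye(2)                            # initial state covariance

for z in [0.10, 0.12, 0.18, 0.22]:       # per-frame orientation measurements
    x, P = F @ x, F @ P @ F.T + Q                # predict
    S = H @ P @ H.T + R                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ (np.array([[z]]) - H @ x)        # update
    P = (np.eye(2) - K @ H) @ P
    print(f"filtered angle: {x[0, 0]:.3f}")
```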

Collaboration


Dive into Cristian Canton-Ferrer's collaborations.

Top Co-Authors

Josep R. Casas, Polytechnic University of Catalonia
Montse Pardàs, Polytechnic University of Catalonia
Carlos Segura, Polytechnic University of Catalonia
Javier Hernando, Polytechnic University of Catalonia
Climent Nadeu, Polytechnic University of Catalonia
Taras Butko, Polytechnic University of Catalonia