Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Josep R. Casas is active.

Publication


Featured research published by Josep R. Casas.


IEEE Transactions on Image Processing | 1996

Morphological operators for image and video compression

Philippe Salembier; Patrick Brigger; Josep R. Casas; Montse Pardàs

This paper deals with the use of some morphological tools for image and video coding. Mathematical morphology can be considered as a shape-oriented approach to signal processing, and some of its features make it very useful for compression. Rather than describing a coding algorithm, the purpose of this paper is to describe some morphological tools that have proved attractive for compression. Four sets of morphological transformations are presented: connected operators, the region-growing version of the watershed, the geodesic skeleton, and a morphological interpolation technique. The authors discuss their implementation, and show how they can be used for image and video segmentation, contour coding, and texture coding.
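As an illustration of the basic building blocks underlying such morphological tools, here is a minimal NumPy sketch of flat erosion, dilation, and the opening they compose (illustrative only, not the authors' implementation; function names are hypothetical):

```python
import numpy as np

def erode(img, size=3):
    """Flat grayscale erosion: minimum over a size x size neighborhood."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + size, j:j + size].min()
    return out

def dilate(img, size=3):
    """Flat grayscale dilation: maximum over a size x size neighborhood."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + size, j:j + size].max()
    return out

def opening(img, size=3):
    """Opening = erosion then dilation; removes bright details smaller
    than the structuring element while preserving larger regions."""
    return dilate(erode(img, size), size)
```

Opening suppresses bright structures smaller than the structuring element while leaving larger regions and their contours intact, which is one reason morphological simplification is attractive before coding.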


IEEE Transactions on Multimedia | 2012

Real-Time Head and Hand Tracking Based on 2.5D Data

Xavier Suau; Javier Ruiz-Hidalgo; Josep R. Casas

A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on data from a range camera, which is exploited to resolve ambiguities and overlaps. The position of the head is estimated with a depth-based template matching, its robustness being reinforced with an adaptive search zone. Hands are detected in a bounding box attached to the head estimate, so that the user may move freely in the scene. A simple method to decide whether the hands are open or closed is also included in the proposal. Experimental results show high robustness against partial occlusions and fast movements. Accurate hand trajectories may be extracted from the estimated hand positions, and may be used for interactive applications as well as for gesture classification purposes.
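The depth-based template matching constrained to a search zone around the previous estimate can be sketched roughly as follows (a minimal sum-of-squared-differences matcher in NumPy; not the authors' implementation, and `match_head` and its parameters are hypothetical):

```python
import numpy as np

def match_head(depth, template, center, zone=8):
    """Slide a depth template inside a search zone around the previous
    head estimate and return the top-left position with the smallest
    sum of squared differences (SSD).  `center` is the (row, col) of
    the previous estimate."""
    th, tw = template.shape
    best, best_pos = np.inf, center
    r0, c0 = center
    for r in range(max(0, r0 - zone), min(depth.shape[0] - th, r0 + zone) + 1):
        for c in range(max(0, c0 - zone), min(depth.shape[1] - tw, c0 + zone) + 1):
            ssd = np.sum((depth[r:r + th, c:c + tw] - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```

Restricting the search to a zone around the last estimate is what makes per-frame matching cheap enough for real time; an adaptive zone, as in the paper, would grow or shrink with tracking confidence.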


Computer Vision and Pattern Recognition | 2008

TOF imaging in Smart room environments towards improved people tracking

Sigurjón Árni Guðmundsson; Rasmus Larsen; Henrik Aanæs; Montse Pardàs; Josep R. Casas

In this paper we present the use of time-of-flight (TOF) cameras in smart rooms and how this leads to improved results in segmenting the people in the room from the background and, consequently, to better 3D reconstruction of the people. A calibrated rig of one Swissranger SR3100 time-of-flight range camera and a high resolution standard camera is set in a smart room consisting of 5 other standard cameras. A probabilistic background model is used to segment each view and a shape-from-silhouette 3D volume is constructed. It is shown that the presence of the range camera gives ways of eliminating regional artifacts and therefore provides a more robust input for higher-level applications such as people tracking or human motion analysis.
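A per-pixel probabilistic background test of the kind described, with a naive fusion of color and range segmentations, might look like this (a hedged sketch under simplified single-Gaussian assumptions; function names and the threshold are assumptions, not the paper's model):

```python
import numpy as np

def segment_foreground(frame, mean, var, k=2.5):
    """Per-pixel Gaussian background test: a pixel is foreground when it
    deviates from the learned background mean by more than k standard
    deviations."""
    return np.abs(frame - mean) > k * np.sqrt(var)

def fuse(fg_color, fg_depth):
    """Naive fusion: keep only foreground confirmed by the range camera,
    suppressing color-only regional artifacts such as shadows."""
    return fg_color & fg_depth
```

The intersection is the crudest possible fusion rule; its point here is only to show why a range channel can veto color artifacts that a single-modality background model lets through.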


Computer Vision and Image Understanding | 2008

Shape from inconsistent silhouette

José Luis Landabaso; Montse Pardàs; Josep R. Casas

Shape from silhouette (SfS) is the general term used to refer to the techniques that obtain a volume estimate from a set of binary images. In a first step, a number of images are taken from different positions around the scene of interest. Later, each image is segmented to produce binary masks, also called silhouettes, to delimit the objects of interest. Finally, the volume estimate is obtained as the maximal one which yields the silhouettes. The set of silhouettes is usually considered to be consistent, which means that there exists at least one volume which completely explains them. However, silhouettes are normally inconsistent due to inaccurate calibration or erroneous silhouette extraction techniques. In spite of that, SfS techniques reconstruct only that part of the volume which projects consistently in all the silhouettes, leaving the rest unreconstructed. In this paper, we extend the idea of SfS to be used with sets of inconsistent silhouettes. We propose a fast technique for estimating that part of the volume which projects inconsistently and propose a criterion for classifying it by minimizing the probability of misclassification, taking into account the 2D error detection probabilities of the silhouettes. A number of theoretical and empirical results are given, showing that the proposed method reduces the reconstruction error.
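A crude stand-in for the relaxed criterion is majority voting over views instead of the classical unanimous intersection (illustrative sketch only; the paper's actual criterion minimizes misclassification probability using per-silhouette 2D error probabilities rather than a fixed vote threshold):

```python
import numpy as np

def carve(votes, min_votes):
    """Relaxed shape from silhouette.  `votes[v]` counts the views whose
    silhouette contains voxel v's projection.  Classical SfS keeps a voxel
    only when votes equals the number of views; tolerating a few missing
    votes absorbs calibration and segmentation errors."""
    return votes >= min_votes
```

With `min_votes` equal to the number of cameras this reduces to the standard consistent reconstruction; lowering it recovers the voxels that project inconsistently in a small number of views.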


British Machine Vision Conference | 2012

Metric learning from poses for temporal clustering of human motion

Adolfo López-Méndez; Juergen Gall; Josep R. Casas; Luc Van Gool

Temporal clustering of human motion into semantically meaningful behaviors is a challenging task. While unsupervised methods do well to some extent, the obtained clusters often lack a semantic interpretation. In this paper, we propose to learn what makes a sequence of human poses different from others such that it should be annotated as an action. To this end, we formulate the problem as weakly supervised temporal clustering for an unknown number of clusters. Weak supervision is attained by learning a metric from the implicit semantic distances derived from already annotated databases. Such a metric contains some low-level semantic information that can be used to effectively segment a human motion sequence into distinct actions or behaviors. The main advantage of our approach is that metrics can be successfully used across datasets, making our method a compelling alternative to unsupervised methods. Experiments on publicly available mocap datasets show the effectiveness of our approach.
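The learned metric can be thought of as a Mahalanobis-style distance between pose vectors (a minimal sketch; in the paper the matrix would be learned from the semantic distances of annotated databases, and the function name here is hypothetical):

```python
import numpy as np

def metric_dist_sq(x, y, M):
    """Squared distance under a learned metric M (positive semi-definite):
    d^2(x, y) = (x - y)^T M (x - y).  With M = I this reduces to the
    ordinary squared Euclidean distance between pose vectors."""
    d = x - y
    return float(d @ M @ d)
```

Because `M` encodes which pose dimensions matter semantically, the same matrix can be reused to cluster sequences from a different dataset, which is the cross-dataset transfer the paper argues for.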


International Conference on Computational Science | 2005

Towards a Bayesian approach to robust finding correspondences in multiple view geometry environments

Cristian Canton-Ferrer; Josep R. Casas; Montse Pardàs

This paper presents a new Bayesian approach to the problem of finding correspondences of moving objects in a multiple calibrated camera environment. Moving objects are detected and segmented in multiple cameras using a background learning technique. A Point-Based Feature (PBF) of each foreground region is extracted, in our case the top. These features will be the support to establish the correspondences. A reliable, efficient and fast computable distance, the symmetric epipolar distance, is proposed to measure the closeness of sets of points belonging to different views. Finally, matching the features from different cameras originating from the same object is achieved by selecting the most likely PBF in each view under a Bayesian framework. Results are provided showing the effectiveness of the proposed algorithm even in the case of severe occlusions or incorrectly segmented foreground regions.
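The symmetric epipolar distance between matched points in two views, given a fundamental matrix F, can be sketched as follows (assuming homogeneous 2D points with unit last coordinate; function names are hypothetical):

```python
import numpy as np

def point_line_dist(x, line):
    """Distance from homogeneous 2D point x = (u, v, 1) to line (a, b, c)."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c * x[2]) / np.hypot(a, b)

def symmetric_epipolar_distance(x1, x2, F):
    """Average of the distances from each point to the epipolar line
    induced by its match in the other view."""
    l2 = F @ x1        # epipolar line of x1 in image 2
    l1 = F.T @ x2      # epipolar line of x2 in image 1
    return 0.5 * (point_line_dist(x2, l2) + point_line_dist(x1, l1))
```

Symmetrizing over both views avoids the bias of measuring the residual in only one image, and the distance is zero exactly when each point lies on its partner's epipolar line.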


Multimodal Technologies for Perception of Humans | 2008

Head Orientation Estimation Using Particle Filtering in Multiview Scenarios

Cristian Canton-Ferrer; Josep R. Casas; Montse Pardàs

This paper presents a novel approach to the problem of determining head pose estimation and face 3D orientation of several people in low resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited and the head in the scene is approximated by an ellipsoid. Skin patches from each detected head are located in each camera view. Data fusion is performed by back-projecting skin patches from single images onto the estimated 3D head model, thus providing a synthetic reconstruction of the head appearance. A particle filter is employed to perform the estimation of the head pan angle of the person under study. A likelihood function based on the face appearance is introduced. Experimental results proving the effectiveness of the proposed algorithm are provided for the SmartRoom scenario of the CLEAR Evaluation 2007 Head Orientation dataset.
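One predict-weight-resample cycle of such a particle filter over a scalar pan angle might be sketched as follows (the likelihood below is a stand-in for the paper's face-appearance term; function names and noise levels are assumptions):

```python
import numpy as np

def pf_step(particles, weights, likelihood, noise=0.1, rng=None):
    """One predict-weight-resample cycle of a particle filter over a
    scalar head pan angle.  `likelihood(angles)` scores how well each
    hypothesized angle explains the current observation."""
    if rng is None:
        rng = np.random.default_rng(0)
    particles = particles + rng.normal(0.0, noise, size=particles.shape)  # predict
    weights = weights * likelihood(particles)                             # weight
    weights = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)      # resample
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

Iterating the cycle concentrates the particle cloud around the angle best supported by the appearance likelihood, while the diffusion noise keeps the filter able to follow head turns.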


Ubiquitous Computing | 2009

Integration of audiovisual sensors and technologies in a smart room

Joachim Neumann; Josep R. Casas; Dusan Macho; Javier Ruiz Hidalgo

At the Technical University of Catalonia (UPC), a smart room has been equipped with 85 microphones and 8 cameras. This paper describes the setup of the sensors, gives an overview of the underlying hardware and software infrastructure, and indicates possibilities for high- and low-level multi-modal interaction. An example of usage of the information collected from the distributed sensor network is explained in detail: the system supports a group of students who have to solve a problem related to a lab assignment.


International Conference on Image Processing | 1994

A new approach to texture coding using stochastic vector quantization

D. Gimeno; Luis Torres; Josep R. Casas

A new method for texture coding which combines 2-D linear prediction and stochastic vector quantization is presented in this paper. To encode a texture, a linear predictor is computed first. Next, a codebook following the prediction error model is generated and the prediction error is encoded with VQ, using an algorithm which takes into account the pixels surrounding the block being encoded. In the decoder, the error image is decoded first and then filtered as a whole, using the prediction filter. Hence, correlation between pixels is not lost from one block to another and a good reproduction quality can be achieved.
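The vector-quantization step, mapping each prediction-error block to its nearest codeword, reduces to a nearest-neighbor search (a minimal sketch; codebook generation from the prediction-error model and the context-dependent encoding are omitted):

```python
import numpy as np

def vq_encode(blocks, codebook):
    """Map each flattened prediction-error block (rows of `blocks`,
    shape (n, d)) to the index of its nearest codeword in `codebook`
    (shape (k, d)) under Euclidean distance."""
    d2 = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

The decoder then replaces each index by its codeword and, as the abstract notes, applies the prediction filter to the whole reconstructed error image so that inter-block correlation is preserved.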


Image and Vision Computing | 2012

Model-based recognition of human actions by trajectory matching in phase spaces

Adolfo López-Méndez; Josep R. Casas

This paper presents a human action recognition framework based on the theory of nonlinear dynamical systems. The ultimate aim of our method is to recognize actions from multi-view video. We estimate and represent human motion by means of a virtual skeleton model providing the basis for a view-invariant representation of human actions. Actions are modeled as a set of weighted dynamical systems associated to different model variables. We use time-delay embeddings on the time series resulting from the evolution of model variables over time to reconstruct phase portraits of appropriate dimensions. These phase portraits characterize the underlying dynamical systems. We propose a distance to compare trajectories within the reconstructed phase portraits. These distances are used to train SVM models for action recognition. Additionally, we propose an efficient method to learn a set of weights reflecting the discriminative power of a given model variable in a given action class. Our approach behaves well on noisy data, even in cases where action sequences last just for a few frames. Experiments with marker-based and markerless motion capture data show the effectiveness of the proposed method. To the best of our knowledge, this contribution is the first to apply time-delay embeddings on data obtained from multi-view video.
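A time-delay embedding of a scalar model-variable series is straightforward to sketch (standard Takens-style embedding; the function name is hypothetical):

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Time-delay embedding: map a scalar series x(t) to the points
    (x(t), x(t+tau), ..., x(t+(dim-1)*tau)), reconstructing a phase
    portrait of dimension `dim` with delay `tau`."""
    n = len(series) - (dim - 1) * tau
    return np.stack([series[i * tau:i * tau + n] for i in range(dim)], axis=1)
```

A series of length n yields n - (dim-1)*tau embedded points; trajectories through this reconstructed phase space are what the proposed distance compares.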

Collaboration


Dive into Josep R. Casas's collaborations.

Top Co-Authors

Montse Pardàs (Polytechnic University of Catalonia)
Cristian Canton-Ferrer (Polytechnic University of Catalonia)
Luis Torres (Polytechnic University of Catalonia)
Javier Ruiz-Hidalgo (Polytechnic University of Catalonia)
Adolfo López-Méndez (Polytechnic University of Catalonia)
Xavier Suau (Polytechnic University of Catalonia)
Marcel Alcoverro (Polytechnic University of Catalonia)
Carlos Segura (Polytechnic University of Catalonia)
Javier Hernando (Polytechnic University of Catalonia)