Verónica Vilaplana
Polytechnic University of Catalonia
Publications
Featured research published by Verónica Vilaplana.
IEEE Transactions on Image Processing | 2008
Verónica Vilaplana; Ferran Marqués; Philippe Salembier
This paper discusses the use of binary partition trees (BPTs) for object detection. BPTs are hierarchical region-based representations of images. They define a reduced set of regions that covers the image support and that spans various levels of resolution. They are attractive for object detection as they tremendously reduce the search space. In this paper, several issues related to the use of BPTs for object detection are studied. Concerning the tree construction, we analyze the compromise between computational complexity reduction and accuracy. This leads us to define two parts in the BPT: one providing accuracy and one representing the search space for the object detection task. We then analyze and objectively compare various similarity measures for the tree construction. We conclude that different similarity criteria should be used for the part providing accuracy and for the part defining the search space, and specific criteria are proposed for each case. We then discuss the object detection strategy based on the BPT. The notion of node extension is proposed and discussed. Finally, several object detection examples illustrating the generality and efficiency of the approach are reported.
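To make the construction step concrete, the following is a minimal sketch of bottom-up BPT building by iteratively merging the most similar neighbouring regions. The colour-distance criterion, the unweighted mean update and the simplified adjacency handling are illustrative assumptions, not the similarity measures proposed in the paper.

```python
# Minimal sketch, assuming the image is already split into initial regions
# described by mean colours and an adjacency set. Region ids, the colour-distance
# criterion and the unweighted mean update are illustrative only.
import heapq
import numpy as np

def build_bpt(region_means, adjacency):
    """region_means: {region_id: mean colour vector}; adjacency: set of frozenset pairs."""
    parent = {}                                  # child node id -> merged (parent) node id
    next_id = max(region_means) + 1
    heap = [(np.linalg.norm(region_means[a] - region_means[b]), a, b)
            for a, b in (tuple(p) for p in adjacency)]
    heapq.heapify(heap)
    alive = set(region_means)
    while len(alive) > 1 and heap:
        _, a, b = heapq.heappop(heap)
        if a not in alive or b not in alive:
            continue                             # stale pair: a region was already merged
        merged = next_id                         # merge the two most similar regions
        next_id += 1
        region_means[merged] = (region_means[a] + region_means[b]) / 2.0
        parent[a] = parent[b] = merged
        alive -= {a, b}
        alive.add(merged)
        # Simplification: treat the new node as a candidate pair with every remaining region.
        for c in alive - {merged}:
            heapq.heappush(heap, (np.linalg.norm(region_means[merged] - region_means[c]),
                                  merged, c))
    return parent                                # encodes the full merging hierarchy

# Toy usage: four initial regions, two dark and two reddish.
means = {0: np.array([10., 10., 10.]), 1: np.array([12., 11., 10.]),
         2: np.array([200., 30., 30.]), 3: np.array([205., 28., 33.])}
adj = {frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 3}), frozenset({2, 3})}
print(build_bpt(means, adj))                     # e.g. {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}
```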
International Conference on Image Processing | 2009
Miriam Leon; Verónica Vilaplana; Antoni Gasull; Ferran Marqués
This paper presents a technique for detecting caption text for indexing purposes. The technique is to be included in a generic indexing system dealing with other semantic concepts. The various object detection algorithms are required to share a common image description, which, in our case, is a hierarchical region-based image model. Caption text objects are detected by combining texture and geometric features, which are estimated using wavelet analysis and by taking advantage of the region-based image model, respectively. Analysis of the region hierarchy provides the final caption text objects.
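As a rough illustration of the geometric side of such a verification, the sketch below checks bounding-box features of a candidate region. The specific features (aspect ratio, fill ratio, minimum height) and their thresholds are assumptions for illustration, not the paper's actual feature set.

```python
# Minimal sketch of geometric verification on one candidate region mask.
# Features and thresholds are illustrative assumptions.
import numpy as np

def looks_like_caption(region_mask):
    """region_mask: boolean 2-D array marking one candidate region."""
    ys, xs = np.nonzero(region_mask)
    if len(xs) == 0:
        return False
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    aspect_ratio = w / h                     # captions tend to be wide and short
    fill_ratio = len(xs) / float(h * w)      # how much of the bounding box is filled
    return aspect_ratio > 2.0 and fill_ratio > 0.3 and h >= 8

# Toy usage: a wide 10x60 solid block passes the checks.
block = np.zeros((100, 100), bool)
block[40:50, 20:80] = True
print(looks_like_caption(block))             # True
```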
Pattern Recognition | 2002
Ferran Marqués; Verónica Vilaplana
A new technique for segmenting and tracking human faces in video sequences is presented. The algorithm relies on the concepts of connected operators and partition projection to tackle the problems of face segmentation and tracking, respectively. It uses a connected operator to extract the connected component that most likely belongs to a face. Such a connected operator is implemented by means of a Binary Partition Tree. A set of connected regions (a node in the tree) is selected by maximizing an estimate of the likelihood of being part of a face. In this way, a first estimate of the face is obtained. Final face segmentation is carried out by a face extraction step that applies the previous maximization in the partition space. Faces are tracked through the sequence based on the partition projection approach. Partition projection adapts the partition of the previous image to the information of the current frame. The current image is classified into face, non-face, and uncertain regions by projecting the previous face partition. The projection therefore yields the first estimate of the face. The face core component is then grown to obtain the final face segmentation using the previous face extraction process. The technique has been successfully assessed using several test sequences in raw format (MPEG-4 database) as well as in MPEG-1 format (from the MPEG-7 database).
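The node-selection idea can be sketched as a search over BPT nodes for the one maximizing a face-likelihood estimate. The flat node dictionary and the crude normalized-RGB skin score below are illustrative stand-ins for the paper's likelihood model.

```python
# Minimal sketch: pick the BPT node with the highest face-likelihood estimate.
# The flat node dictionary and the skin-colour score are illustrative stand-ins.
import numpy as np

def skin_fraction(pixels_rgb):
    """Crude skin test in normalized RGB, used only as a stand-in likelihood."""
    rgb = pixels_rgb.astype(float)
    s = rgb.sum(axis=1, keepdims=True) + 1e-6
    r, g = (rgb / s)[:, 0], (rgb / s)[:, 1]
    skin = (r > 0.36) & (r < 0.47) & (g > 0.28) & (g < 0.36)
    return skin.mean()

def best_face_node(nodes):
    """nodes: {node_id: (N, 3) array of RGB pixels covered by that BPT node}."""
    return max(nodes, key=lambda n: skin_fraction(nodes[n]))

# Toy usage: node 1 is skin-like, node 0 is green.
nodes = {0: np.tile([30, 200, 40], (50, 1)), 1: np.tile([200, 140, 110], (50, 1))}
print(best_face_node(nodes))                 # 1
```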
CLEaR | 2006
Jordi Luque; Ramon Morros; Ainara Garde; Jan Anguita; Mireia Farrús; Dusan Macho; Ferran Marqués; Claudi Martinez; Verónica Vilaplana; Javier Hernando
In this paper, we address the issue of modality integration in a smart-room environment, aiming to enable person identification by combining acoustic features and 2D face images. First we introduce the monomodal audio and video identification techniques, and then we present the use of combined input speech and face images for person identification. The sensory modalities, speech and faces, are processed both individually and jointly. It is shown that the multimodal approach results in improved performance in the identification of the participants.
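As a hedged illustration of combining monomodal outputs, the sketch below performs a weighted score-level fusion; the weight value, the score ranges and the sum rule are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of score-level fusion; the weighted-sum rule, the weight value
# and the toy scores are assumptions, not the paper's combination strategy.
def fuse_scores(audio_scores, face_scores, audio_weight=0.6):
    """Both inputs: {person_id: matching score in [0, 1]} from the monomodal systems."""
    people = set(audio_scores) & set(face_scores)
    fused = {p: audio_weight * audio_scores[p] + (1 - audio_weight) * face_scores[p]
             for p in people}
    return max(fused, key=fused.get), fused

# Toy usage: audio slightly prefers "anna", faces strongly prefer "joan".
audio = {"anna": 0.55, "joan": 0.50}
faces = {"anna": 0.40, "joan": 0.90}
print(fuse_scores(audio, faces))             # ('joan', {...})
```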
International Conference on Pattern Recognition | 2000
Ferran Marqués; Verónica Vilaplana
A new technique for segmenting and tracking human faces in video sequences is presented. The technique relies on morphological tools: connected operators are used to extract the connected component that most likely belongs to a face, and partition projection is used to track this component through the sequence. A binary partition tree (BPT) is used to implement the connected operator. The BPT is constructed based on a chrominance criterion, and its nodes are analyzed so that the selected node maximizes an estimate of the likelihood of being part of a face. The tracking is performed using a partition projection approach. Images are divided into face and non-face parts, which are tracked through the sequence. The technique has been successfully assessed using several test sequences from the MPEG-4 (raw format) and MPEG-7 (MPEG-1 format) databases.
International Conference on Image Processing | 2004
Oreste Salerno; Montse Pardàs; Verónica Vilaplana; Ferran Marqués
This paper presents an object recognition method that exploits the representation of images obtained by means of a binary partition tree (BPT). The shape matching technique on which it is based was first presented in F. Marques et al. (2002). This method compares a transformed version of an object shape model (reference contour) to the contours of a partition of the image. The comparison is based on a distance map that measures the Euclidean distance from any point in the image to the partition contours. In F. Marques et al. (2002), this algorithm was applied using a colour-based segmentation of the image, and a full search was performed to find the best match between the searched object and the contours of this segmentation. Here, the information of the binary partition tree is used both to obtain the segmentation and to guide and reduce the search for the optimum match between the shape and the objects of the image.
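The distance-map comparison can be sketched as follows: a Euclidean distance transform of the partition contours is sampled at the (transformed) reference-contour points and averaged. The toy contour image is an assumption, and the transformation search and BPT-guided pruning described in the paper are omitted.

```python
# Minimal sketch of the distance-map comparison, assuming the partition contours
# are given as a boolean image and the transformed reference contour as a point list.
import numpy as np
from scipy.ndimage import distance_transform_edt

def contour_match_cost(partition_contours, reference_points):
    """partition_contours: boolean image of contour pixels;
       reference_points: (N, 2) integer array of (row, col) model-contour positions."""
    # Distance from every pixel to the nearest partition-contour pixel.
    dist_map = distance_transform_edt(~partition_contours)
    rows, cols = reference_points[:, 0], reference_points[:, 1]
    return dist_map[rows, cols].mean()       # average chamfer-like distance

# Toy usage: a square partition contour matched against a nearby vertical segment.
contours = np.zeros((64, 64), bool)
contours[20, 20:41] = contours[40, 20:41] = True
contours[20:41, 20] = contours[20:41, 40] = True
model = np.array([(r, 22) for r in range(22, 39)])   # segment 2 pixels away
print(contour_match_cost(contours, model))           # 2.0
```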
International Conference on Image Processing | 1999
Ferran Marqués; Verónica Vilaplana; Anabel Buxes
A new technique for segmenting and tracking human faces in video sequences is presented. The algorithm uses a connected operator to extract the connected component that most likely belongs to a face. Such a connected operator is implemented by means of a binary partition tree. A set of connected regions (a node in the tree) is selected by maximizing an estimate of the likelihood of being part of a face. Faces are tracked through the sequence based on the partition projection approach. A face and a non-face core component are obtained in the current image by projecting the previous partition. The technique has been successfully assessed using several test sequences from the MPEG-4 database (raw format) as well as from the MPEG-7 database (MPEG-1 format).
International Conference on Image Processing | 2008
Verónica Vilaplana; Ferran Marqués
We present a new technique for object tracking that is an extension of the mean shift tracking algorithm. The proposed technique relies on a segmentation of the area under analysis into a set of color-homogeneous regions. The use of regions allows a robust estimation of the likelihood distributions that form the object and background models, as well as a precise definition of the shape of the object being tracked. Thanks to this accurate object definition, the object model can be updated throughout the tracking process, handling variations in the object representation. These concepts have been tested in the case of tracking human faces.
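A minimal sketch of the region-based modelling idea follows: colour histograms of the object and background act as likelihood models, and each colour-homogeneous region is assigned to the object when its histogram fits the object model better. The binning, the log-likelihood-ratio test and the toy data are assumptions; the mean shift iteration and the model-update rule of the paper are not shown.

```python
# Minimal sketch: colour histograms as object/background likelihood models and a
# per-region log-likelihood-ratio test. Binning, threshold and toy data are assumptions.
import numpy as np

def colour_histogram(pixels, bins=8):
    """pixels: (N, 3) RGB array -> normalized joint colour histogram."""
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist / hist.sum()

def classify_regions(regions, object_hist, background_hist, eps=1e-9):
    """regions: {region_id: (N, 3) pixel array}. Returns ids judged to belong to the object."""
    selected = []
    for rid, pix in regions.items():
        h = colour_histogram(pix)
        # Expected log-likelihood ratio between object and background colour models.
        ratio = np.sum(h * np.log((object_hist + eps) / (background_hist + eps)))
        if ratio > 0:
            selected.append(rid)
    return selected

# Toy usage: region 0 matches a reddish object model, region 1 matches the background.
obj = colour_histogram(np.tile([200, 60, 60], (100, 1)))
bg = colour_histogram(np.tile([40, 120, 200], (100, 1)))
regions = {0: np.tile([198, 63, 58], (50, 1)), 1: np.tile([42, 118, 205], (50, 1))}
print(classify_regions(regions, obj, bg))    # [0]
```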
Workshop on Image Analysis for Multimedia Interactive Services | 2010
Miriam Leon; Verónica Vilaplana; Antoni Gasull; Ferran Marqués
This paper presents a method for caption text detection. The proposed method will be included in a generic indexing system dealing with other semantic concepts, which are to be automatically detected as well. To obtain a coherent detection system, the various object detection algorithms use a common image description; in our framework, this is a hierarchical region-based image model. The proposed method takes advantage of texture and geometric features to detect caption text. Texture features are estimated using wavelet analysis and are mainly applied for text candidate spotting. In turn, text characteristics verification is carried out relying mainly on geometric features, which are estimated by exploiting the region-based image model. Analysis of the region hierarchy provides the final caption text objects. The final step, consistency analysis for output, is performed by a binarization algorithm that robustly estimates the thresholds on the caption text area of support.
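The texture cue can be sketched with a one-level Haar decomposition: text areas tend to concentrate energy in the high-frequency subbands. The hand-rolled Haar filters and the single energy threshold below are illustrative assumptions, not the paper's wavelet analysis or classifier.

```python
# Minimal sketch of the texture cue: one-level Haar decomposition and a
# high-frequency energy test. The filters are hand-rolled and the threshold
# is an arbitrary illustrative value.
import numpy as np

def haar_highfreq_energy(gray):
    """gray: 2-D float array with even dimensions. Returns mean high-band energy."""
    a = gray[0::2, 0::2]; b = gray[0::2, 1::2]
    c = gray[1::2, 0::2]; d = gray[1::2, 1::2]
    lh = (a + b - c - d) / 4.0               # horizontal detail
    hl = (a - b + c - d) / 4.0               # vertical detail
    hh = (a - b - c + d) / 4.0               # diagonal detail
    return float(np.mean(lh**2 + hl**2 + hh**2))

def is_text_candidate(gray_block, energy_threshold=50.0):
    return haar_highfreq_energy(gray_block) > energy_threshold

# Toy usage: a high-contrast striped block vs. a flat block.
stripes = np.tile([0.0, 255.0], (16, 8))     # 16x16 alternating columns
flat = np.full((16, 16), 128.0)
print(is_text_candidate(stripes), is_text_candidate(flat))   # True False
```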
Content-Based Multimedia Indexing | 2013
Carles Ventura; Xavier Giro-i-Nieto; Verónica Vilaplana; Daniel Giribet; Eusebio Carasusan
This paper addresses the problem of video summarization through the automatic selection of a single representative keyframe. The proposed solution is based on the mutual reinforcement paradigm, where a keyframe is selected thanks to its high and frequent similarity to the rest of the considered frames. Two variations of the algorithm are explored: a first one where only frames within the same video are used (intra-clip mode), and a second one where the decision also depends on the previously selected keyframes of related videos (inter-clip mode). The two algorithms were evaluated by a set of professional documentalists from a broadcaster's archive, and the results showed that the proposed techniques outperform the semi-manual solution previously adopted by the company.
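The intra-clip variant can be sketched as a power iteration over a frame-similarity matrix, so that frames strongly and frequently similar to the rest accumulate the highest score. The colour-histogram signature, the histogram-intersection similarity and the iteration count are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of intra-clip mutual reinforcement: frames score highly when they
# are similar to many other frames, and the top-scoring frame is the keyframe.
# The histogram signature, similarity and iteration count are assumptions.
import numpy as np

def frame_signature(frame_rgb, bins=8):
    hist, _ = np.histogramdd(frame_rgb.reshape(-1, 3), bins=(bins,) * 3,
                             range=[(0, 256)] * 3)
    return (hist / hist.sum()).ravel()

def select_keyframe(frames, iterations=50):
    """frames: list of HxWx3 RGB arrays. Returns the index of the selected keyframe."""
    sigs = np.stack([frame_signature(f) for f in frames])
    # Pairwise similarity: histogram intersection between frame signatures.
    sim = np.array([[np.minimum(a, b).sum() for b in sigs] for a in sigs])
    scores = np.ones(len(frames)) / len(frames)
    for _ in range(iterations):              # mutual reinforcement by power iteration
        scores = sim @ scores
        scores /= np.linalg.norm(scores)
    return int(np.argmax(scores))

# Toy usage: two reddish frames and one bluish outlier; a reddish frame is selected.
red = np.full((32, 32, 3), (200, 50, 50), dtype=np.uint8)
red2 = np.full((32, 32, 3), (198, 52, 55), dtype=np.uint8)
blue = np.full((32, 32, 3), (40, 60, 210), dtype=np.uint8)
print(select_keyframe([red, blue, red2]))    # 0
```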