Publication


Featured research published by Xavier Giró.


EURASIP Journal on Advances in Signal Processing | 2011

Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities

Taras Butko; Cristian Canton-Ferrer; Carlos Segura; Xavier Giró; Climent Nadeu; Javier Hernando; Josep R. Casas

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information produces a large number of errors, most of which are due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed in our work. The experimental results show that information from both the microphone array and video cameras is useful to improve the detection rate of isolated as well as spontaneously generated acoustic events.
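
The feature-level fusion mentioned in this abstract can be illustrated with a minimal sketch, assuming that fusion means concatenating time-aligned audio and video feature vectors before they reach a detector. The function name `fuse_features`, the frame rates, the toy feature values, and the alignment rule are all illustrative assumptions and are not taken from the paper.

```python
def fuse_features(audio_frames, video_frames, audio_rate, video_rate):
    """Concatenate each audio feature vector with the video feature vector
    that covers the same instant (hypothetical alignment rule)."""
    fused = []
    for i, audio_vec in enumerate(audio_frames):
        # Integer arithmetic maps audio frame i to the video frame that
        # contains its start time; clamp to the last available video frame.
        j = min(i * video_rate // audio_rate, len(video_frames) - 1)
        fused.append(list(audio_vec) + list(video_frames[j]))
    return fused


# Toy data (assumed): 4 audio frames at 100 Hz, 2 video frames at 50 Hz.
audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
video = [[1.0], [2.0]]
fused = fuse_features(audio, video, audio_rate=100, video_rate=50)
```

Each fused vector could then be passed to a single classifier, in contrast with score-level fusion, where each modality is classified separately and only the scores are combined.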


Computer Vision and Pattern Recognition | 2009

Audiovisual event detection towards scene understanding

Cristian Canton-Ferrer; Taras Butko; Carlos Segura; Xavier Giró; Climent Nadeu; Javier Hernando; Josep R. Casas

Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multiperson tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. A multimodal data fusion at score level is carried out using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows the performance of the presented algorithms to be assessed. This dataset is made publicly available for research purposes.
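
The weighted-mean variant of score-level fusion named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `fuse_scores`, the event labels, the scores, and the weight are hypothetical and do not come from the paper, and the fuzzy integral approach is not shown.

```python
def fuse_scores(audio_scores, video_scores, audio_weight=0.5):
    """Combine per-class scores from two modalities with a weighted mean.

    audio_weight is the (assumed) weight of the audio modality; the video
    modality receives 1 - audio_weight so that the weights sum to one.
    """
    fused = {}
    for label in audio_scores:
        fused[label] = (audio_weight * audio_scores[label]
                        + (1.0 - audio_weight) * video_scores[label])
    return fused


# Hypothetical per-class scores produced by separate audio and video detectors.
audio_scores = {"door_slam": 0.8, "applause": 0.2}
video_scores = {"door_slam": 0.4, "applause": 0.6}

fused = fuse_scores(audio_scores, video_scores, audio_weight=0.75)
best = max(fused, key=fused.get)  # class with the highest fused score
```

In practice the weight would be tuned on held-out data; a fixed value is used here only to keep the example short.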


International Conference on Image Processing | 2005

Detection of semantic objects using description graphs

Xavier Giró; Ferran Marqués

This paper presents a technique to detect instances of classes (objects) according to their semantic definition in the form of a description graph. Classes are defined as combinations of instances of lower level semantic classes and allow the definition of a semantic tree that organizes classes in semantic levels. At the bottom level of the semantic tree, classes are defined by a perceptual model containing a list of low-level descriptors. The proposed detection algorithm follows a bottom-up/top-down approach, building semantic trees on a region-based representation of the media. The flexibility of the approach is assessed on different examples of planar objects, such as frontal faces, groups of islands, flags and traffic signs.


Workshop on Image Analysis for Multimedia Interactive Services | 2007

Composite Object Detection in Video Sequences: Application to Controlled Environments

Xavier Giró; Ferran Marqués

This paper presents a set of techniques for the detection of composite objects in video recordings of a controlled environment. First, a selective region-based analysis is performed by tuning the algorithm to the perceptual characteristics of the object in the environment. Second, the controlled perceptual and semantic variabilities of the object are addressed through a frame-by-frame update of the object models and by allowing multiple models for a single object. The proposed techniques are illustrated in the detection of laptops from a zenithal view in a smart room.


ACM Multimedia | 2006

From partition trees to semantic trees

Xavier Giró; Ferran Marqués

This paper proposes a solution to bridge the gap between semantic and visual information, formulated as a structural pattern recognition problem. Instances of semantic classes expressed by Description Graphs are detected on a region-based representation of visual data expressed with a Binary Partition Tree. The detection process builds instances of Semantic Trees on top of the Binary Partition Tree using an encyclopedia of models organised as a hierarchy. At the leaves of the Semantic Tree, classes are defined by perceptual models containing a list of low-level descriptors. The proposed solution is assessed in different environments to show its flexibility.


Journal of Universal Computer Science | 2003

Unified Access to Heterogeneous Audiovisual Archives

Yannis S. Avrithis; Giorgos B. Stamou; Manolis Wallace; Ferran Marqués; Philippe Salembier; Xavier Giró; Werner Haas; Heribert Vallant; Michael Zufferey


Archive | 2005

Automatic Extraction and Analysis of Visual Objects Information

Xavier Giró; Verónica Vilaplana; Ferran Marqués; Philippe Salembier


Lecture Notes in Computer Science | 2006

BPT enhancement based on syntactic and semantic criteria

C. Ferran; Xavier Giró; Ferran Marqués; Josep R. Casas


Semantics and Digital Media Technologies | 2007

Region-based Annotation Tool using Partition Trees

Xavier Giró; Neus Camps; Ferran Marqués


Archive | 2016

Bachelor's Degree in Telecommunications Technologies and Services Engineering (2015 syllabus, elective teaching unit); Bachelor's Degree in Audiovisual Systems Engineering (2009 syllabus, elective teaching unit). ECTS credits: 6. Teaching languages: Catalan, Spanish, English. Responsible unit: 230 - ETSETB - Escuela Técnica Superior de Ingeniería de Telecomunicación de Barcelona

Ferran Marqués; Xavier Giró; Josep R. Casas

Collaboration


Top Co-Authors

Ferran Marqués, Polytechnic University of Catalonia
Josep R. Casas, Polytechnic University of Catalonia
Carlos Segura, Polytechnic University of Catalonia
Climent Nadeu, Polytechnic University of Catalonia
Cristian Canton-Ferrer, Polytechnic University of Catalonia
Javier Hernando, Polytechnic University of Catalonia
Philippe Salembier, Polytechnic University of Catalonia
Taras Butko, Polytechnic University of Catalonia
C. Ferran, Polytechnic University of Catalonia