Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Alberto Abad is active.

Publications


Featured research published by Alberto Abad.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Non-speech audio event detection

José Portelo; Miguel Bugalho; Isabel Trancoso; João Paulo Neto; Alberto Abad; António Joaquim Serralheiro

Audio event detection is one of the tasks of the European project VIDIVIDEO. This paper focuses on the detection of non-speech events, and as such only searches for events in audio segments that have been previously classified as non-speech. Preliminary experiments with a small corpus of sound effects have shown the potential of this type of corpus for training purposes. This paper describes our experiments with SVM and HMM-based classifiers, using a 290-hour corpus of sound effects. Although we have only built detectors for 15 semantic concepts so far, the method seems easily portable to other concepts. The paper reports experiments with multiple features, different kernels and several analysis windows. Preliminary experiments on documentaries and films yielded promising results, despite the difficulties posed by the mixtures of audio events that characterize real sounds.
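As a rough illustration of the SVM branch of these experiments, the sketch below trains one binary detector for a single semantic concept on clip-level feature vectors. The data and all names are ours, not the VIDIVIDEO system's; a real detector would summarize each clip with acoustic features such as MFCC statistics.

```python
# Illustrative only: one SVM detector per semantic audio concept,
# trained on clip-level feature vectors (synthetic stand-ins here).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 40 "event" clips and 40 "background" clips in a toy 12-D feature space.
pos = rng.normal(loc=1.0, scale=0.3, size=(40, 12))
neg = rng.normal(loc=-1.0, scale=0.3, size=(40, 12))
X = np.vstack([pos, neg])
y = np.array([1] * 40 + [0] * 40)

# The paper compares multiple kernels; that choice maps directly onto
# SVC's `kernel` parameter.
detector = SVC(kernel="rbf").fit(X, y)
print(detector.predict(X[:3]))  # the first three clips are "event" clips
```

A detector bank for many concepts would simply hold one such classifier per concept, each scored independently on the non-speech segments.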


International Conference on Biometrics | 2013

The 2013 speaker recognition evaluation in mobile environment

Elie Khoury; B. Vesnicer; Javier Franco-Pedroso; Ricardo Paranhos Velloso Violato; Z. Boulkcnafet; L. M. Mazaira Fernandez; Mireia Diez; J. Kosmala; Houssemeddine Khemiri; T. Cipr; Rahim Saeidi; Manuel Günther; J. Zganec-Gros; R. Zazo Candil; Flávio Olmos Simões; M. Bengherabi; A. Alvarez Marquina; Mikel Penagarikano; Alberto Abad; M. Boulayemen; Petr Schwarz; D.A. van Leeuwen; J. Gonzalez-Dominguez; M. Uliani Neto; E. Boutellaa; P. Gómez Vilda; Amparo Varona; Dijana Petrovska-Delacrétaz; Pavel Matejka; Joaquin Gonzalez-Rodriguez

This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER), half total error rate (HTER) and detection error trade-off (DET) confirm that the best performing systems are based on total variability modeling, and are the fusion of several sub-systems. Nevertheless, the good old UBM-GMM based systems are still competitive. The results also show that the use of additional data for training as well as gender-dependent features can be helpful.
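For concreteness, the EER used to rank such submissions can be computed from genuine (same-speaker) and impostor trial scores as below; this tiny threshold-sweep implementation is our sketch, not the evaluation's scoring code.

```python
# Equal error rate: the operating point where the false-accept rate
# (impostors accepted) meets the false-reject rate (genuine rejected).
import numpy as np

def eer(genuine, impostor):
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))  # closest crossing point
    return (far[i] + frr[i]) / 2.0

print(eer([1.0, 2.0, 3.0], [-3.0, -2.0, -1.0]))  # 0.0: perfectly separated
```

The HTER also reported in the paper differs in that it averages the false-accept and false-reject rates at a threshold fixed beforehand on development data, rather than at the crossing point.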


Computer Speech & Language | 2013

Automatic word naming recognition for an on-line aphasia treatment system

Alberto Abad; Anna Pompili; Angela Costa; Isabel Trancoso; José Manuel Fonseca; Gabriela Leal; Luísa Farrajota; Isabel Pavão Martins

One of the most common effects among aphasia patients is difficulty in recalling names or words. Typically, word retrieval problems can be treated through word naming therapeutic exercises. In fact, the frequency and intensity of speech therapy are key factors in the recovery of lost communication functionality. In this sense, speech and language technology can make a relevant contribution to the development of automatic therapy methods. In this work, we present an on-line system designed to behave as a virtual therapist, incorporating automatic speech recognition technology that allows aphasia patients to perform word naming training exercises. We focus on the study of the automatic word naming detector module and on its utility for both global evaluation and treatment. For that purpose, a database consisting of word naming therapy sessions of aphasic Portuguese native speakers has been collected. In spite of the varying patient characteristics and speech quality conditions of the collected data, encouraging results have been obtained thanks to a calibration method that uses each patient's word naming ability to adapt automatically to that patient's speech particularities.
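The detector-plus-calibration idea can be caricatured as follows; the function names, the confidence-threshold rule, and the adaptation formula are all hypothetical, since the paper's actual detector and calibration method are more elaborate.

```python
# Hypothetical sketch: accept a naming attempt when the recognizer's
# confidence for the target word clears a threshold, and pull that
# threshold toward each patient's typical score level.
import numpy as np

def adapted_threshold(global_t, patient_ref_scores, weight=0.5):
    # Interpolate between the global operating point and the median
    # score this patient achieves on a few reference trials.
    return (1 - weight) * global_t + weight * float(np.median(patient_ref_scores))

def detect_naming(confidence, threshold):
    return confidence >= threshold

t = adapted_threshold(0.6, [0.3, 0.35, 0.4])  # low-scoring patient -> about 0.475
print(detect_naming(0.5, t))  # accepted under the adapted threshold
```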


International Conference on Multimedia and Expo | 2009

Audio contributions to semantic video search

Isabel Trancoso; Thomas Pellegrini; José Portelo; Hugo Meinedo; Miguel Bugalho; Alberto Abad; João Paulo Neto

This paper summarizes the contributions to semantic video search that can be derived from the audio signal. Because of space restrictions, the emphasis will be on non-linguistic cues. The paper thus covers what is generally known as audio segmentation, as well as audio event detection. Using machine learning approaches, we have built detectors for over 50 semantic audio concepts.


International Conference on Acoustics, Speech, and Signal Processing | 2007

Multimodal Head Orientation Towards Attention Tracking in Smartrooms

Carlos Segura; Cristian Canton-Ferrer; Alberto Abad; Josep R. Casas; Javier Hernando

This paper presents a multimodal approach to head pose estimation and 3D gaze orientation of individuals in a SmartRoom environment equipped with multiple cameras and microphones. We first introduce the two monomodal approaches as reference. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker making use of the directivity characteristics of the head radiation pattern. Two multimodal information fusion schemes working at data and decision levels are analyzed in terms of accuracy and robustness of the estimation. Experimental results conducted over the CLEAR evaluation database are reported and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
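One simple form of decision-level fusion for this task (our sketch, not necessarily the paper's exact scheme) is a confidence-weighted circular mean of the per-modality orientation estimates, which correctly handles the angular wrap-around:

```python
# Confidence-weighted circular mean of orientation estimates from the
# audio and video modalities (illustrative decision-level fusion).
import numpy as np

def fuse_orientations(angles_deg, weights):
    a = np.radians(np.asarray(angles_deg, dtype=float))
    w = np.asarray(weights, dtype=float)
    x = float(np.sum(w * np.cos(a)))  # fuse on the unit circle, not on
    y = float(np.sum(w * np.sin(a)))  # raw angles, to respect wrap-around
    return np.degrees(np.arctan2(y, x)) % 360.0

# Video says 350 deg, audio says 10 deg: a naive average would give 180.
fused = fuse_orientations([350.0, 10.0], [0.5, 0.5])
print(fused)  # close to 0 (mod 360)
```

The weights would come from each modality's estimated reliability, e.g. audio confidence dropping when the subject is silent.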


International Conference on Acoustics, Speech, and Signal Processing | 2014

Accounting for the residual uncertainty of multi-layer perceptron based features

Ramón Fernández Astudillo; Alberto Abad; Isabel Trancoso

Multi-Layer Perceptrons (MLPs) are often interpreted as modeling a posterior distribution over classes given input features, using the mean field approximation. This approximation is fast but neglects the residual uncertainty of inference at each layer, making inference less robust. In this paper we introduce a new approximation of MLP inference that takes this residual uncertainty into account. The proposed algorithm propagates not only the mean, but also the variance of inference through the network. At the current stage, the proposed method cannot be used with soft-max layers. Therefore, we illustrate the benefits of this algorithm in a tandem scheme. We use the residual uncertainty of inference of MLP-based features to compensate a GMM-HMM backend with uncertainty decoding. Experiments on the Aurora4 corpus show consistent performance improvements over conventional MLPs in all scenarios, in particular for clean speech and multi-style training.
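The core propagation step is easy to state for a single linear layer: assuming independent input components, both the mean and the variance of the output have closed forms. This is a sketch of the general idea only; the paper's full algorithm also handles the nonlinear activations between layers.

```python
# Propagating mean and variance through one linear layer. For inputs
# with independent components this is exact; nonlinear activations and
# (as the paper notes) soft-max layers need further approximations.
import numpy as np

def linear_forward(mean, var, W, b):
    out_mean = W @ mean + b
    out_var = (W ** 2) @ var  # Var(sum_i w_i x_i) = sum_i w_i^2 Var(x_i)
    return out_mean, out_var

W = np.array([[1.0, 2.0], [0.5, -1.0]])
b = np.array([0.0, 1.0])
m, v = linear_forward(np.array([1.0, 1.0]), np.array([0.1, 0.2]), W, b)
print(m, v)  # mean [3.0, 0.5], variance [0.9, 0.225]
```

In a tandem scheme, the propagated variance `v` can then serve as the feature observation uncertainty consumed by a GMM-HMM backend with uncertainty decoding.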


European Signal Processing Conference | 2015

Multi-room speech activity detection using a distributed microphone network in domestic environments

Panagiotis Giannoulis; Alessio Brutti; Marco Matassoni; Alberto Abad; Athanasios Katsamanis; Miguel Matos; Gerasimos Potamianos; Petros Maragos

Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically degrade the performance of standard speech processing algorithms. In this application scenario, a crucial task is the detection and localization of speech events generated by users within the various rooms. A specific challenge of multi-room environments is the inter-room interference that negatively affects speech activity detectors. In this paper, we present and compare different solutions for the multi-room speech activity detection task. The combination of a model-based room-independent speech activity detection module with a room-dependent inside/outside classification stage, based on specific features, provides satisfactory performance. The proposed methods are evaluated on a multi-room, multi-channel corpus, where spoken commands and other typical acoustic events occur in different rooms.
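To make the task concrete, the simplest possible baseline is a frame-energy detector. This sketch is ours; the paper's detectors are model-based and add a room-dependent inside/outside classification stage on top to reject speech leaking in from adjacent rooms.

```python
# Deliberately simple frame-energy speech activity detection baseline.
import numpy as np

def frame_energy_sad(signal, frame_len, threshold):
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = np.mean(frames ** 2, axis=1)   # mean-square energy per frame
    return energy > threshold               # True = activity in that frame

rng = np.random.default_rng(1)
silence = 0.01 * rng.standard_normal(800)   # low-level background noise
speech = np.sin(0.2 * np.arange(800))       # stand-in for a voiced segment
decisions = frame_energy_sad(np.concatenate([silence, speech]), 160, 0.1)
print(decisions)  # first 5 frames inactive, last 5 active
```

In the multi-room setting this per-channel decision is only the first step; deciding *which* room the speech originated in is the harder part.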


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Multi-site heterogeneous system fusions for the Albayzin 2010 Language Recognition Evaluation

Luis Javier Rodriguez-Fuentes; Mikel Penagarikano; Amparo Varona; Mireia Diez; Germán Bordel; David Martinez; Jesús Villalba; Antonio Miguel; Alfonso Ortega; Eduardo Lleida; Alberto Abad; Oscar Koller; Isabel Trancoso; Paula Lopez-Otero; Laura Docio-Fernandez; Carmen García-Mateo; Rahim Saeidi; Mehdi Soufifar; Tomi Kinnunen; Torbjørn Svendsen; Pasi Fränti

The best language recognition performance is commonly obtained by fusing the scores of several heterogeneous systems. Regardless of the fusion approach, it is assumed that different systems may contribute complementary information, either because they are developed on different datasets, or because they use different features or different modeling approaches. Most authors apply fusion as a final resource for improving performance based on an existing set of systems. Though relative performance gains decrease as larger sets of systems are considered, the best performance is usually attained by fusing all the available systems, which may lead to high computational costs. In this paper, we aim to discover which technologies combine best through fusion and to analyse the factors (data, features, modeling methodologies, etc.) that may explain such good performance. Results are presented and discussed for a number of systems provided by the participating sites and the organizing team of the Albayzin 2010 Language Recognition Evaluation. We hope the conclusions of this work will help research groups make better decisions when developing language recognition technology.
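Score fusion in this field is typically a linear-logistic combination of the per-system scores. The sketch below fits such a combiner by plain gradient descent on synthetic scores; it illustrates the form of the fusion only, not any participant's actual setup.

```python
# Linear-logistic score fusion: fused = w1*s1 + w2*s2 + bias, with the
# weights fit by logistic regression (tiny gradient-descent version).
import numpy as np

def fit_fusion(S, y, lr=0.5, steps=1000):
    """S: (n_trials, n_systems) score matrix; y: 1 target / 0 non-target."""
    n, k = S.shape
    X = np.hstack([S, np.ones((n, 1))])  # append a bias column
    w = np.zeros(k + 1)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid
        w -= lr * (X.T @ (p - y)) / n      # logistic-loss gradient
    return w

rng = np.random.default_rng(2)
y = np.repeat([1, 0], 100)
# Two noisy systems whose scores both separate targets from non-targets.
s1 = np.where(y == 1, 1.0, -1.0) + 0.5 * rng.standard_normal(200)
s2 = np.where(y == 1, 0.8, -0.8) + 0.7 * rng.standard_normal(200)
S = np.column_stack([s1, s2])
w = fit_fusion(S, y)
fused = np.hstack([S, np.ones((200, 1))]) @ w
acc = float(np.mean((fused > 0) == (y == 1)))
print(acc)
```

The learned weights make the trade-off the paper studies explicit: a system that adds no complementary information receives a weight near zero, at which point dropping it saves computation at little cost.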


Processing of the Portuguese Language | 2008

Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data

Alberto Abad; Hugo Meinedo; João Paulo Neto

Automatic transcription of telephone speech involves additional challenges compared to wideband data processing, mainly due to channel limitations and to the particular characteristics of conversational telephone speech. While in TV speech recognition applications, such as automatic transcription of broadcast news, the presence of telephone data is nearly insignificant (less than 1%), in most radio broadcast stations the proportion of telephone speech is significantly higher. Thus, transcription of telephone speech data deserves special attention in radio broadcast applications. In this work, we describe our initial efforts to tackle this particular problem. First, a telephone channel classifier is proposed to automatically detect telephone segments. Then, some strategies for increasing the robustness of the automatic transcription system are investigated.
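One plausible cue for such a channel classifier, shown purely as our own illustration, is the missing energy above the telephone band: telephone speech is band-limited to roughly 300-3400 Hz, so in 16 kHz broadcast audio a very low ratio of spectral energy above 4 kHz suggests a narrowband segment.

```python
# Illustrative narrowband (telephone) channel cue: fraction of spectral
# energy above 4 kHz in 16 kHz audio.
import numpy as np

def highband_ratio(signal, sr=16000):
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return spec[freqs >= 4000].sum() / spec.sum()

def is_telephone(signal, sr=16000, threshold=0.01):
    return highband_ratio(signal, sr) < threshold

rng = np.random.default_rng(3)
wideband = rng.standard_normal(16000)         # flat-spectrum noise
t = np.arange(16000) / 16000
narrowband = np.sin(2 * np.pi * 1000 * t)     # energy only at 1 kHz
print(is_telephone(wideband), is_telephone(narrowband))  # False True
```

A real classifier would combine several such features and smooth decisions over time, since music and noise also shape the spectrum.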


CLEAR | 2006

UPC audio, video and multimodal person tracking systems in the clear evaluation campaign

Alberto Abad; Cristian Canton-Ferrer; Carlos Segura; José Luis Landabaso; Dusan Macho; Josep R. Casas; Javier Hernando; Montse Pardàs; Climent Nadeu

Reliable measures of person positions are needed for computational perception of human activities taking place in a smart-room environment. In this work, we present the person tracking systems developed at UPC for the audio, video and audio-video modalities, in the context of the research activities of the EU-funded CHIL project. The aim of the designed systems, and particularly of the new contributions proposed, is to perform robustly in both single- and multi-person localization tasks, independently of the environmental conditions. Besides the technology description, experimental results conducted for the CLEAR evaluation workshop are also reported.

Collaboration


Dive into Alberto Abad's collaboration.

Top Co-Authors


Isabel Trancoso

Carnegie Mellon University


Javier Hernando

Polytechnic University of Catalonia


Climent Nadeu

Polytechnic University of Catalonia


Carlos Segura

Polytechnic University of Catalonia
