
Publications


Featured research published by Taras Butko.


EURASIP Journal on Audio, Speech, and Music Processing | 2011

Audio segmentation of broadcast news in the Albayzin-2010 evaluation: overview, results, and discussion

Taras Butko; Climent Nadeu

Recently, audio segmentation has attracted research interest because of its usefulness in several applications such as audio indexing and retrieval, subtitling, and monitoring of acoustic scenes. Moreover, a preliminary audio segmentation stage may improve the robustness of speech technologies such as automatic speech recognition and speaker diarization. In this article, we present the evaluation of broadcast news audio segmentation systems carried out in the context of the Albayzín-2010 evaluation campaign. That evaluation consisted of segmenting audio from the 3/24 Catalan TV channel into five acoustic classes: music, speech, speech over music, speech over noise, and other. The evaluation results revealed the difficulty of this segmentation task. After presenting the database and metric, as well as the feature extraction methods and segmentation techniques used by the submitted systems, the experimental results are analyzed and compared, with the aim of gaining insight into the proposed solutions and identifying promising directions.


EURASIP Journal on Advances in Signal Processing | 2011

Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities

Taras Butko; Cristian Canton-Ferrer; Carlos Segura; Xavier Giró; Climent Nadeu; Javier Hernando; Josep R. Casas

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, mostly due to temporal overlaps; indeed, temporal overlaps accounted for more than 70% of the errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from the video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, together with a parallel structure of binary HMM-based detectors. The experimental results show that information from both the microphone array and the video cameras is useful for improving the detection rate of isolated as well as spontaneously generated acoustic events.


Computer Vision and Pattern Recognition | 2009

Audiovisual event detection towards scene understanding

Cristian Canton-Ferrer; Taras Butko; Carlos Segura; Xavier Giró; Climent Nadeu; Javier Hernando; Josep R. Casas

Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multiperson tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion at the score level is carried out using two approaches: weighted arithmetic mean and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows assessment of the performance of the presented algorithms. The dataset is made publicly available for research purposes.
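The two score-level fusion rules named in the abstract can be sketched compactly. The following is an illustrative Python sketch, not the paper's implementation: the weighted mean combines per-classifier scores linearly, while the Sugeno fuzzy integral combines sorted scores with a fuzzy measure defined over subsets of classifiers (all names and measure values here are made up for illustration).

```python
import numpy as np

def weighted_mean_fusion(scores, weights):
    """Weighted-mean score fusion across classifiers/modalities."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(scores, w / w.sum()))

def sugeno_integral(scores, measure):
    """Sugeno fuzzy integral of classifier scores with respect to a
    fuzzy measure given as {frozenset(classifier indices): g value},
    with g(all classifiers) = 1."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]           # classifiers, best first
    best = 0.0
    for i in range(len(scores)):
        subset = frozenset(int(j) for j in order[: i + 1])
        best = max(best, min(scores[order[i]], measure[subset]))
    return best
```

For scores [0.9, 0.4, 0.7] and weights [1, 1, 2], the weighted mean is 0.675; the fuzzy integral instead rewards agreement among subsets of classifiers that the measure deems reliable, which is what makes it attractive when modalities have unequal and interacting worth.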


IEEE Transactions on Biomedical Engineering | 2015

Acoustic Gaits: Gait Analysis With Footstep Sounds

M. Umair Bin Altaf; Taras Butko; Biing-Hwang Fred Juang

We describe acoustic gaits: quantitative characteristics of the natural human gait derived from the sound of footsteps as a person walks normally. We introduce the acoustic gait profile, obtained by temporal signal analysis of footstep sounds collected by microphones, and illustrate some of the spatio-temporal gait parameters that can be extracted from it using three temporal signal analysis methods: the squared energy estimate, the Hilbert transform, and the Teager-Kaiser energy operator. Based on a statistical analysis of the parameter estimates, we show that the spatio-temporal parameters and gait characteristics obtained from the acoustic gait profile can consistently and reliably estimate a subset of the clinical and biometric gait parameters currently used in standardized gait assessments. We conclude that the Teager-Kaiser energy operator provides the most consistent gait parameter estimates, showing the least variation across different sessions and zones. Acoustic gaits use an inexpensive set of microphones with a computing device as an accurate and unintrusive gait analysis system, in contrast to the expensive and intrusive systems currently used in laboratory gait analysis, such as force plates, pressure mats, and wearable sensors, some of which may alter the very gait parameters being measured.
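The Teager-Kaiser energy operator mentioned above has a simple discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1], which for a pure tone A*cos(w*n) yields the constant A^2 * sin(w)^2, an amplitude- and frequency-weighted energy. A minimal NumPy sketch (the function name and the pure-tone check are illustrative, not from the paper):

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator:
    psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]   # pad endpoints by replication
    return psi

# For a pure tone A*cos(w*n), psi is the constant A^2 * sin(w)^2.
n = np.arange(1000)
tone = 2.0 * np.cos(0.3 * n)
psi = teager_kaiser(tone)
```

Footstep transients show up as sharp localized peaks in psi, which is one reason an operator sensitive to both amplitude and frequency is attractive for segmenting a footstep-sound profile.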


International Conference on Acoustics, Speech, and Signal Processing | 2011

Audio segmentation of broadcast news: A hierarchical system with feature selection for the Albayzin-2010 evaluation

Taras Butko; Climent Nadeu

In this paper, we present an audio segmentation system for broadcast news, along with its results in the Albayzín-2010 evaluation. First, the Albayzín-2010 evaluation setup, developed by the authors, is presented; in particular, the database and the metric are described. The reported hierarchical HMM-GMM-based system is composed of one binary detector for each of the five considered classes (music, speech, speech over music, speech over noise, and other). A fast one-pass-training feature selection technique is adapted to the audio segmentation task to improve the results and reduce the dimensionality of the input feature vector.
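The parallel structure of class-wise binary detectors can be sketched schematically. In the sketch below (hypothetical names; the real system scores segments with HMM-GMM decoding rather than a single callable), each detector returns a class-versus-rest score and the segment takes the label of the best-scoring detector:

```python
# The five acoustic classes of the Albayzin-2010 segmentation task.
CLASSES = ["music", "speech", "speech_over_music",
           "speech_over_noise", "other"]

def detect(feature_vec, detectors):
    """detectors: {class name: callable mapping a feature vector to a
    class-versus-rest score, e.g. a log-likelihood ratio}.
    Returns the winning label and all per-class scores."""
    scores = {c: detectors[c](feature_vec) for c in CLASSES}
    return max(scores, key=scores.get), scores
```

Running one binary detector per class, rather than a single multi-class classifier, lets each detector use its own selected feature subset, which is the point of the feature selection step described above.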


EURASIP Journal on Audio, Speech, and Music Processing | 2014

Source ambiguity resolution of overlapped sounds in a multi-microphone room environment

Rupayan Chakraborty; Climent Nadeu; Taras Butko

When several acoustic sources are simultaneously active in a meeting-room scenario, and both the positions of the sources and the identities of the time-overlapped sound classes have been estimated, the problem of assigning each source position to one of the sound classes still remains. This problem arises in the real-time system implemented in our smart-room, where it is assumed that up to two acoustic events may overlap in time and that the source positions are relatively well separated in space. The position assignment system proposed in this work is based on the fusion of model-based log-likelihood ratios obtained after carrying out several different partial source separations in parallel. To perform the separation, frequency-invariant null-steering beamformers, which can work with a small number of microphones, are used. The experimental results using all six microphone arrays deployed in the room show a high assignment rate in our particular scenario.
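A null-steering beamformer of the kind mentioned above places unit gain on the target direction and a spatial null on the interferer. A minimal narrowband two-microphone sketch (spacing, frequency, and angles are made-up assumptions; the paper's beamformers are frequency-invariant and use more microphones):

```python
import numpy as np

c, f, d = 343.0, 1000.0, 0.1   # assumed: sound speed (m/s), Hz, mic spacing (m)

def steering(theta):
    """Far-field steering vector of a 2-mic linear array at angle theta."""
    delay = d * np.sin(theta) / c
    return np.array([1.0, np.exp(-2j * np.pi * f * delay)])

def null_steering_weights(theta_target, theta_null):
    """Solve d(target)^H w = 1 and d(null)^H w = 0 for the weights w."""
    A = np.vstack([steering(theta_target).conj(),
                   steering(theta_null).conj()])
    return np.linalg.solve(A, np.array([1.0, 0.0]))

w = null_steering_weights(np.deg2rad(0), np.deg2rad(60))
```

The two linear constraints determine the two complex weights exactly; with more microphones, the extra degrees of freedom can be spent flattening the response across frequency, which is what a frequency-invariant design does.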


Database and Expert Systems Applications | 2010

On Enhancing Acoustic Event Detection by Using Feature Selection and Audiovisual Feature-Level Fusion

Taras Butko; Climent Nadeu

The detection of the acoustic events (AEs) that are naturally produced in a meeting room may help describe the human and social activity that takes place in it. Even if the number of considered events is not large, detection becomes a difficult task in scenarios where the AEs are produced rather spontaneously and often overlap in time. In this work, we aim to improve the detection of AEs in two ways: first, we select the most discriminative spectro-temporal audio features using a hill-climbing wrapper method; second, we add new features derived from video signals as well as from an acoustic source localization system. A new metric is also proposed to guide the feature selection. Besides confirming the value of using video and source localization information, the results obtained from audiovisual data collected in our multimodal room show that improved accuracy can be obtained with an acoustic detection system based on a selected subset of features instead of the whole feature set.
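A hill-climbing wrapper of the kind described above greedily grows the feature subset, at each step adding the feature that most improves the wrapped detector's score and stopping when no candidate helps. An illustrative sketch (the function name and scoring callable are hypothetical, not from the paper):

```python
def hill_climb_select(features, evaluate):
    """Greedy forward hill-climbing wrapper: `evaluate(subset)` is the
    wrapped detector's performance on that feature subset."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        score, f = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best:          # no candidate improves: stop
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best
```

Because the detector is re-evaluated for every candidate extension, the choice of metric passed as `evaluate` directly shapes which features survive, which is why the paper proposes a metric tailored to the selection process.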


International Conference on Machine Learning | 2008

Inclusion of Video Information for Detection of Acoustic Events Using the Fuzzy Integral

Taras Butko; Andrey Temko; Climent Nadeu; Cristian Canton

When applied to interactive seminars, the detection of acoustic events from only audio information shows a large amount of errors, which are mostly due to the temporal overlaps of sounds. Video signals may be a useful additional source of information to cope with that problem for particular events. In this work, we aim at improving the detection of steps by using two audio-based Acoustic Event Detection (AED) systems, with SVM and HMM, and a video-based AED system, which employs the output of a 3D video tracking algorithm. The fuzzy integral is used to fuse the outputs of the three detection systems. Experimental results using the CLEAR 2007 evaluation data show that video information can be successfully used to improve the results of audio-based AED.


European Signal Processing Conference | 2011

Two-source acoustic event detection and localization: Online implementation in a Smart-room

Taras Butko; Fran González Pla; Carlos Segura; Climent Nadeu; Javier Hernando


Conference of the International Speech Communication Association | 2008

Fusion of audio and video modalities for detection of acoustic events

Taras Butko; Andrey Temko; Climent Nadeu; Cristian Canton-Ferrer

Collaboration


Dive into Taras Butko's collaborations.

Top Co-Authors

Climent Nadeu | Polytechnic University of Catalonia
Climent Nadeu Camprubí | Polytechnic University of Catalonia
Carlos Segura | Polytechnic University of Catalonia
Cristian Canton-Ferrer | Polytechnic University of Catalonia
Javier Hernando | Polytechnic University of Catalonia
Andrey Temko | University College Cork
M. Umair Bin Altaf | Georgia Institute of Technology
Josep R. Casas | Polytechnic University of Catalonia
Rupayan Chakraborty | Polytechnic University of Catalonia
Xavier Giró | Polytechnic University of Catalonia