João Lobato Oliveira
University of Porto
Publications
Featured research published by João Lobato Oliveira.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Andre Holzapfel; Matthew E. P. Davies; José R. Zapata; João Lobato Oliveira; Fabien Gouyon
In this paper, we propose a method that can identify challenging music samples for beat tracking without ground truth. Our method, motivated by the machine learning method “selective sampling,” is based on the measurement of mutual agreement between beat sequences. In calculating this mutual agreement we show the critical influence of different evaluation measures. Using our approach we demonstrate how to compile a new evaluation dataset comprised of difficult excerpts for beat tracking and examine this difficulty in the context of perceptual and musical properties. Based on tag analysis we indicate the musical properties where future advances in beat tracking research would be most profitable and where beat tracking is too difficult to be attempted. Finally, we demonstrate how our mutual agreement method can be used to improve beat tracking accuracy on large music collections.
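The mutual-agreement idea can be made concrete with a small sketch. The following Python snippet is an illustrative simplification, not the paper's implementation: it scores each pair of beat-tracker outputs with a toy F-measure (one of several evaluation measures the paper compares) and averages over all pairs, so that low mean agreement flags a likely difficult excerpt even without ground truth.

```python
import numpy as np
from itertools import combinations

def f_measure(est, ref, tol=0.07):
    """Simplified beat-tracking F-measure: a beat counts as a hit if the
    other sequence has a beat within +/- tol seconds (toy matching)."""
    hits_est = sum(np.min(np.abs(ref - t)) <= tol for t in est)
    hits_ref = sum(np.min(np.abs(est - t)) <= tol for t in ref)
    precision, recall = hits_est / len(est), hits_ref / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean_mutual_agreement(beat_sequences):
    """Mean pairwise agreement between the outputs of several beat trackers;
    low values flag excerpts that are likely hard, without any annotation."""
    return float(np.mean([f_measure(a, b)
                          for a, b in combinations(beat_sequences, 2)]))

# Hypothetical outputs (beat times in seconds) of three beat trackers:
trackers = [np.array([0.50, 1.00, 1.50, 2.00]),
            np.array([0.52, 1.01, 1.49, 2.02]),
            np.array([0.75, 1.50, 2.25, 3.00])]  # a disagreeing tracker
print(mean_mutual_agreement(trackers))
```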
Portuguese Conference on Artificial Intelligence | 2011
Paulo S. A. Sousa; João Lobato Oliveira; Luís Paulo Reis; Fabien Gouyon
Expressiveness and naturalness in robotic motions and behaviors can be replicated by using captured human movements. Considering dance as a complex and expressive type of motion, in this paper we propose a method for generating humanoid dance motions transferred from human motion capture (MoCap) data. Motion data of samba dance, synchronized to samba music and manually annotated by experts, was used to build a spatiotemporal representation of the dance movement, with variability, in relation to the respective musical temporal structure (musical meter). This enabled the determination and generation of variable dance key-poses according to the captured human body model. In order to retarget these key-poses from the original human model onto the considered humanoid morphology, we propose methods for resizing and adapting the original trajectories to the robot joints, overcoming their varied kinematic constraints. Finally, a method for generating the angles for each robot joint is presented, enabling the reproduction of the desired poses on a simulated humanoid robot NAO. The achieved results validated our approach, suggesting that our method can generate poses from motion capture data and reproduce them on a humanoid robot with a good degree of similarity.
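As a rough illustration of the retargeting step, the sketch below (Python, with hypothetical limb lengths and joint limits; not the paper's actual procedure) rescales a captured segment to the robot's limb proportions, recovers a rotational joint angle from adjacent limb segments, and clamps it to the robot's joint limits.

```python
import numpy as np

def resize_to_robot(points, human_limb_len, robot_limb_len):
    """Uniformly rescale a captured limb segment (relative to its parent
    joint) to the robot's limb length."""
    return points * (robot_limb_len / human_limb_len)

def joint_angle(parent, joint, child):
    """Angle at `joint` between the two adjacent limb segments, i.e. the
    value a rotational joint would be commanded to reach."""
    u, v = parent - joint, child - joint
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def clamp_to_limits(angle, lo, hi):
    """Respect the robot's kinematic constraints (joint limits)."""
    return float(np.clip(angle, lo, hi))

# Hypothetical key-pose: shoulder, elbow, and wrist positions (metres).
shoulder = np.array([0.0, 0.0, 0.0])
elbow = np.array([0.0, -0.25, 0.0])
wrist = np.array([0.20, -0.25, 0.10])
wrist_on_robot = resize_to_robot(wrist - elbow, 0.25, 0.06) + elbow
theta = joint_angle(shoulder, elbow, wrist_on_robot)
print(clamp_to_limits(theta, lo=0.03, hi=1.54))  # illustrative limits
```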
IEEE Transactions on Audio, Speech, and Language Processing | 2012
João Lobato Oliveira; Matthew E. P. Davies; Fabien Gouyon; Luís Paulo Reis
In this paper we propose an audio beat tracking system, IBT, for multiple applications. The proposed system integrates an automatic monitoring and state-recovery mechanism, which applies (re-)inductions of tempo and beats, into a multi-agent-based beat tracking architecture. This system sequentially processes a continuous onset detection function while propagating parallel hypotheses of tempo and beats. Beats can be predicted in either a causal or a non-causal mode, which makes the system suitable for diverse applications. We evaluate the performance of the system in both modes on two application scenarios: standard (using a relatively large database of audio clips) and streaming (using long audio streams made up of concatenated clips). We show experimental evidence of the usefulness of the automatic monitoring and state-recovery mechanism in the streaming scenario (i.e., improvements in beat tracking accuracy and reaction time). We also show that the system performs efficiently and at a level comparable to state-of-the-art algorithms in the standard scenario. IBT is multi-platform, open-source, and freely available, and it includes plugins for popular audio analysis, synthesis, and visualization platforms.
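To give a flavor of the monitoring and state-recovery mechanism, here is a heavily simplified Python sketch; the agent scoring, the toy induction, and all thresholds are hypothetical stand-ins, not IBT's actual logic. Competing agents propagate tempo/beat hypotheses along the onset detection function, and a periodic check re-induces the agents whenever the best hypothesis loses support.

```python
import numpy as np

class Agent:
    """One tempo/beat hypothesis: beat period, next predicted beat, support."""
    def __init__(self, period, phase):
        self.period, self.next_beat = period, phase
        self.score, self.beats = 0.0, []

def naive_induce(odf_slice, t0=0.0):
    """Toy (re-)induction: spawn one agent per candidate beat period."""
    return [Agent(period=p, phase=t0) for p in (0.4, 0.5, 0.6)]

def track(odf, hop, window=5.0, threshold=0.2):
    """Causal sketch of monitored multi-agent beat tracking with recovery."""
    n_win = int(window / hop)
    agents, committed = naive_induce(odf[:n_win]), []
    for i, value in enumerate(odf):
        t = i * hop
        for a in agents:
            if t >= a.next_beat:            # agent predicts a beat now
                a.score += value            # a strong odf here supports it
                a.beats.append(a.next_beat)
                a.next_beat += a.period
        if i > 0 and i % n_win == 0:        # periodic monitoring checkpoint
            best = max(agents, key=lambda a: a.score)
            committed.extend(best.beats)    # commit the winner's beats
            expected = window / best.period # beats expected in one window
            if best.score / expected < threshold:  # weak support: recover
                agents = naive_induce(odf[i - n_win:i], t0=t)
            else:
                for a in agents:
                    a.score, a.beats = 0.0, []
    committed.extend(max(agents, key=lambda a: a.score).beats)
    return committed

# Toy onset detection function: impulses every 0.5 s at a 10 ms hop.
odf = np.zeros(3000)
odf[::50] = 1.0
print(track(odf, hop=0.01)[:6])
```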
Robot and Human Interactive Communication | 2012
João Lobato Oliveira; Gökhan Ince; Keisuke Nakamura; Kazuhiro Nakadai; Hiroshi G. Okuno; Luís Paulo Reis; Fabien Gouyon
In this paper we propose a general active audition framework for auditory-driven Human-Robot Interaction (HRI). The proposed framework simultaneously processes speech and music on-the-fly, integrates perceptual models for robot audition, and supports verbal and non-verbal interactive communication by means of (pro)active behaviors. To ensure a reliable interaction, on top of the framework a behavior decision mechanism based on active audition polices the robot's actions according to the reliability of the acoustic signals for auditory processing. To validate the framework's application to general auditory-driven HRI, we propose the implementation of an interactive robot dancing system. This system integrates three preprocessing robot audition modules: sound source localization, sound source separation, and ego-noise suppression; two modules for auditory perception: live audio beat tracking and automatic speech recognition; and multi-modal behaviors for verbal and non-verbal interaction: music-driven dancing and speech-driven dialoguing. To fully assess the system, we set up experimental and interactive real-world scenarios with highly dynamic acoustic conditions, and defined a set of evaluation criteria. The experimental tests revealed accurate and robust beat tracking and speech recognition, and convincing dance beat-synchrony. The interactive sessions confirmed the fundamental role of the behavior decision mechanism in actively maintaining a robust and natural human-robot interaction.
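The behavior decision mechanism can be pictured as a small policy over the estimated reliability of the acoustic inputs. The Python sketch below is purely illustrative: the action names, thresholds, and input features are hypothetical stand-ins for the paper's active audition policies.

```python
def decide_behavior(music_snr_db, speech_detected, beats_locked):
    """Hedged sketch of an active-audition behavior policy: choose the
    robot's next action from the reliability of its acoustic inputs.
    All thresholds and action names are invented for illustration."""
    if speech_detected:
        return "pause_dance_and_dialogue"   # favor ASR: stop ego-motion noise
    if music_snr_db < 6.0:
        return "move_towards_sound_source"  # proactively improve the signal
    if not beats_locked:
        return "listen_and_reinduce_beats"  # wait for a stable beat estimate
    return "dance_on_beat"

print(decide_behavior(music_snr_db=12.0, speech_detected=False,
                      beats_locked=True))
```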
Intelligent Robots and Systems | 2012
João Lobato Oliveira; Gökhan Ince; Keisuke Nakamura; Kazuhiro Nakadai; Hiroshi G. Okuno; Luís Paulo Reis; Fabien Gouyon
In this paper we propose the integration of an online audio beat tracking system into the general framework of robot audition, to enable its application in musically interactive robotic scenarios. To this end, we introduced a state-recovery mechanism into our beat tracking algorithm for handling continuous musical stimuli, and applied different multi-channel preprocessing algorithms (e.g., beamforming, ego-noise suppression) to enhance noisy auditory signals captured live in a real environment. We assessed and compared the robustness of our audio beat tracker through a set of experimental setups, under live acoustic conditions of incremental complexity. These included the presence of continuous musical stimuli, composed of a set of concatenated musical pieces; the presence of noises of different natures (e.g., robot motion, speech); and the simultaneous processing of different audio sources on-the-fly, for music and speech. We successfully tackled all these challenging acoustic conditions and improved the beat tracking accuracy and reaction time to music transitions, while simultaneously achieving robust automatic speech recognition.
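A minimal sketch of the preprocessing chain, assuming a delay-and-sum beamformer and a magnitude-domain spectral subtraction for ego-noise suppression (both toy versions; the paper's multi-channel algorithms are more sophisticated). The enhanced signal would then feed the beat tracker's onset detection function.

```python
import numpy as np

def delay_and_sum(frames, delays, fs):
    """Toy delay-and-sum beamformer over a (channels x samples) block:
    align each channel by its steering delay and average. np.roll wraps
    around, which is acceptable only for short illustrative blocks."""
    out = np.zeros(frames.shape[1])
    for channel, delay in zip(frames, delays):
        out += np.roll(channel, -int(round(delay * fs)))
    return out / len(frames)

def spectral_subtract(mag, noise_mag, floor=0.05):
    """Magnitude-domain ego-noise suppression: subtract a noise estimate
    per frequency bin and keep a small spectral floor."""
    return np.maximum(mag - noise_mag, floor * mag)
```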
International Conference on Acoustics, Speech, and Signal Processing | 2012
Andre Holzapfel; Matthew E. P. Davies; José R. Zapata; João Lobato Oliveira; Fabien Gouyon
In this paper, an approach is presented that identifies music samples which are difficult for current state-of-the-art beat trackers. In order to estimate this difficulty even for examples without ground truth, a method motivated by selective sampling is applied. This method assigns a degree of difficulty to a sample based on the mutual disagreement between the output of various beat tracking systems. On a large beat annotated dataset we show that this mutual agreement is correlated with the mean performance of the beat trackers evaluated against the ground truth, and hence can be used to identify difficult examples by predicting poor beat tracking performance. Towards the aim of advancing future beat tracking systems, we demonstrate how our method can be used to form new datasets containing a high proportion of challenging music examples.
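In code, the dataset-compilation step reduces to ranking excerpts by their mean mutual agreement (see the sketch above) and keeping the lowest-ranked fraction. The numbers below are invented purely to illustrate the correlation check, not results from the paper.

```python
import numpy as np

def select_difficult(excerpt_ids, agreement, fraction=0.2):
    """Keep the lowest-agreement fraction of excerpts as the challenging
    subset (fraction is an illustrative choice)."""
    order = np.argsort(agreement)
    k = max(1, int(fraction * len(excerpt_ids)))
    return [excerpt_ids[i] for i in order[:k]]

# On an annotated set, agreement should correlate with true performance:
agreement = np.array([0.92, 0.35, 0.80, 0.15, 0.60])      # hypothetical
mean_accuracy = np.array([0.95, 0.40, 0.85, 0.20, 0.70])  # hypothetical
r = np.corrcoef(agreement, mean_accuracy)[0, 1]
print(f"correlation: {r:.2f}")
print(select_difficult(["a", "b", "c", "d", "e"], agreement))
```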
International Conference on Robotics and Automation | 2012
João Lobato Oliveira; Gökhan Ince; Keisuke Nakamura; Kazuhiro Nakadai
This paper presents the design and implementation of a real-time real-world beat tracking system which runs on a dancing robot. The main problem of such a robot is that, while it is moving, ego noise is generated due to its motors, and this directly degrades the quality of the audio signal features used for beat tracking. Therefore, we propose to incorporate ego noise reduction as a pre-processing stage prior to our tempo induction and beat tracking system. The beat tracking algorithm is based on an online strategy of competing agents sequentially processing a continuous musical input, while considering parallel hypotheses regarding tempo and beats. This system is applied to a humanoid robot processing the audio from its embedded microphones on-the-fly, while performing simplistic dancing motions. A detailed and multi-criteria based evaluation of the system across different music genres and varying stationary/non-stationary noise conditions is presented. It shows improved performance and noise robustness, outperforming our conventional beat tracker (i.e., without ego noise suppression) by 15.2 points in tempo estimation and 15.0 points in beat-times prediction.
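One way to picture the ego-noise reduction stage is a template database keyed by the robot's quantized joint state, as in the hypothetical Python sketch below; the class, keying scheme, and floor value are illustrative assumptions, not the paper's method.

```python
import numpy as np

class EgoNoiseTemplates:
    """Hedged sketch: ego-noise magnitude spectra stored per quantized
    motor/joint state and subtracted from the input before the onset
    detection function is computed."""
    def __init__(self):
        self.templates = {}  # quantized joint-state tuple -> mean |spectrum|

    def learn(self, joint_state, noise_frames):
        """Average noise spectra (frames x bins) recorded for this motion."""
        key = tuple(np.round(joint_state, 1))
        self.templates[key] = np.mean(np.abs(noise_frames), axis=0)

    def suppress(self, joint_state, mag, floor=0.05):
        """Subtract the template matching the current motion, if any."""
        key = tuple(np.round(joint_state, 1))
        template = self.templates.get(key, 0.0)
        return np.maximum(mag - template, floor * mag)
```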
International Journal of Humanoid Robotics | 2015
João Lobato Oliveira; Gökhan Ince; Keisuke Nakamura; Kazuhiro Nakadai; Hiroshi G. Okuno; Fabien Gouyon; Luís Paulo Reis
Dance movement is intrinsically connected to the rhythm of music and is a fundamental form of nonverbal communication present in daily human interactions. In order to enable robots to interact with humans in natural real-world environments through dance, these robots must be able to listen to music while robustly tracking the beat of continuous musical stimuli and simultaneously responding to human speech. In this paper, we propose the integration of a real-time beat tracking system with state recovery with different preprocessing solutions used in robot audition for its application to interactive dancing robots. The proposed system is assessed under different real-world acoustic conditions of increasing complexity, which consider multiple audio sources of different kinds, multiple noise sources of different natures, continuous musical and speech stimuli, and the effects of beat-synchronous ego-motion noise and of jittering in ego noise (EN). The overall results suggest improved beat tracking accuracy with lower reaction times to music transitions, while still enhancing automatic speech recognition (ASR) run in parallel in the most challenging conditions. These results corroborate the application of the proposed system for interactive dancing robots.
International Journal of Computational Intelligence Systems | 2012
Catarina B. Santiago; João Lobato Oliveira; Luís Paulo Reis; Armando Sousa; Fabien Gouyon
We propose an online sensorimotor architecture for controlling a low-cost humanoid robot to perform dance movements synchronized with musical stimuli. The proposed architecture attempts to overcome the robot's motor constraints by adjusting the velocity of its actuators and interchanging the attended beat metrical level on-the-fly. Moreover, we propose quantitative metrics for measuring the level of beat-synchrony of the generated robot dancing motion, and complement them with a qualitative survey on several aspects of the demonstrated robot dance performances. Tests with different dance movements and musical pieces demonstrated satisfactory beat-synchrony results despite the physical limitations of the robot. The comparison against robot dance sequences generated without interchanging the attended metrical level validated our sensorimotor approach for controlling beat-synchronous robot dancing motions using different dance movements and under distinct musical tempo conditions.
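The metrical-level switching can be illustrated with a few lines of Python (a hedged sketch with made-up numbers): if a movement cannot complete within one inter-beat interval at the robot's maximum actuator velocity, the controller attends a slower metrical level instead.

```python
def attended_metrical_level(move_duration, beat_period, levels=(1, 2, 4)):
    """Pick the smallest metrical level (in beats per movement; factors
    are illustrative) that gives the movement enough time to complete."""
    for level in levels:
        if move_duration <= level * beat_period:
            return level
    return levels[-1]

# A 1.1 s arm swing against 120 BPM music (0.5 s beats) needs to attend
# every fourth beat to finish in time:
print(attended_metrical_level(move_duration=1.1, beat_period=0.5))  # -> 4
```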
EURASIP Journal on Audio, Speech, and Music Processing | 2012
João Lobato Oliveira; Luiz Alberto Naveda; Fabien Gouyon; Luís Paulo Reis; Paulo S. A. Sousa; Marc Leman
Dance movements are a complex class of human behavior which convey forms of non-verbal and subjective communication that are performed as cultural vocabularies in all human cultures. The singularity of dance forms imposes fascinating challenges to computer animation and robotics, which in turn presents outstanding opportunities to deepen our understanding about the phenomenon of dance by means of developing models, analyses and syntheses of motion patterns. In this article, we formalize a model for the analysis and representation of popular dance styles of repetitive gestures by specifying the parameters and validation procedures necessary to describe the spatiotemporal elements of the dance movement in relation to its music temporal structure (musical meter). Our representation model is able to precisely describe the structure of dance gestures according to the structure of musical meter, at different temporal resolutions, and is flexible enough to convey the variability of the spatiotemporal relation between music structure and movement in space. It results in a compact and discrete mid-level representation of the dance that can be further applied to algorithms for the generation of movements in different humanoid dancing characters. The validation of our representation model relies upon two hypotheses: (i) the impact of metric resolution and (ii) the impact of variability towards fully and naturally representing a particular dance style of repetitive gestures. We numerically and subjectively assess these hypotheses by analyzing solo dance sequences of Afro-Brazilian samba and American Charleston, captured with a MoCap (Motion Capture) system. From these analyses, we build a set of dance representations modeled with different parameters, and re-synthesize motion sequence variations of the represented dance styles. For specifically assessing the metric hypothesis, we compare the captured dance sequences with repetitive sequences of a fixed dance motion pattern, synthesized at different metric resolutions for both dance styles. In order to evaluate the hypothesis of variability, we compare the same repetitive sequences with others synthesized with variability, by generating and concatenating stochastic variations of the represented dance pattern. The observed results validate the proposition that different dance styles of repetitive gestures might require a minimum and sufficient metric resolution to be fully represented by the proposed representation model. Yet, these also suggest that additional information may be required to synthesize variability in the dance sequences while assuring the naturalness of the performance. Nevertheless, we found evidence that supports the use of the proposed dance representation for flexibly modeling and synthesizing dance sequences from different popular dance styles, with potential developments for the generation of expressive and natural movement profiles onto humanoid dancing characters.
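To make the representation concrete, the following Python sketch (hypothetical class and parameter names) stores, for each discrete metric position, the mean key-pose and its variance across repetitions, and re-synthesizes a measure of poses by sampling stochastic variations around the means.

```python
import numpy as np

class DancePatternModel:
    """Hedged sketch of a meter-anchored dance representation: for each
    discrete metric position (e.g. 8 per measure), keep the mean key-pose
    and its per-dimension spread across captured repetitions."""
    def __init__(self, resolution=8):
        self.resolution = resolution
        self.mean = {}  # metric position -> mean pose vector
        self.std = {}   # metric position -> per-dimension std deviation

    def fit(self, poses_by_position):
        """poses_by_position: {position: [pose vectors, one per repetition]}"""
        for pos, poses in poses_by_position.items():
            arr = np.stack(poses)            # repetitions x pose dimensions
            self.mean[pos] = arr.mean(axis=0)
            self.std[pos] = arr.std(axis=0)

    def sample_measure(self, rng=None):
        """One measure of key-poses with stochastic variability."""
        if rng is None:
            rng = np.random.default_rng()
        return [rng.normal(self.mean[p], self.std[p])
                for p in range(self.resolution)]

# Toy usage: two repetitions of a 2-position pattern, 3-D poses.
model = DancePatternModel(resolution=2)
model.fit({0: [np.array([0.0, 0.1, 0.0]), np.array([0.1, 0.1, 0.0])],
           1: [np.array([0.5, 0.4, 0.2]), np.array([0.4, 0.5, 0.2])]})
print(model.sample_measure())
```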