Leonardo O. Nunes
Federal University of Rio de Janeiro
Publication
Featured research published by Leonardo O. Nunes.
IEEE Transactions on Mobile Computing | 2016
Diego B. Haddad; Wallace Alves Martins; Maurício V. M. Costa; Luiz W. P. Biscainho; Leonardo O. Nunes; Bowon Lee
Self-localization of smart portable devices serves as a foundation for several novel applications. This work proposes a set of algorithms that enable a mobile device to passively determine its position relative to a known reference with centimeter precision, based exclusively on the capture of acoustic signals emitted by controlled sources around it. The proposed techniques tackle typical practical issues such as reverberation, unknown speed of sound, line-of-sight obstruction, clock skew, and the need for asynchronous operation. After theoretical development and off-line simulations, the methods are assessed as real-time applications embedded into off-the-shelf mobile devices operating in real scenarios. When line of sight is available, position estimation errors are at most 4 cm using recorded signals.
IEEE Signal Processing Letters | 2015
Markus V. S. Lima; Wallace Alves Martins; Leonardo O. Nunes; Luiz W. P. Biscainho; Tadeu N. Ferreira; Maurício V. M. Costa; Bowon Lee
This paper proposes an efficient method based on the steered-response power (SRP) technique for sound source localization using microphone arrays: the volumetric SRP (V-SRP). Compared to the SRP, by deploying a sparser volumetric grid, the V-SRP achieves a significant reduction in computational complexity without sacrificing the accuracy of the location estimates. By appending a fine search step to the V-SRP, its refined version (RV-SRP) improves on the compromise between complexity and accuracy. Experiments conducted in both simulated- and real-data scenarios demonstrate the benefits of the proposed approaches. Specifically, the RV-SRP is shown to outperform the SRP in accuracy at a computational cost about ten times lower. This letter proposes an efficient method based on the steered-response power (SRP) technique for sound source localization using microphone arrays: the refined volumetric SRP (RV-SRP). By deploying a sparser volumetric grid, the RV-SRP achieves a significant reduction in computational complexity without sacrificing the accuracy of location estimates. In addition, a refinement step improves on the compromise between complexity and accuracy. Experiments conducted in both simulated- and real-data scenarios show that the RV-SRP outperforms state-of-the-art methods in accuracy with lower computational cost.
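The two-stage idea in this abstract (a sparse grid followed by a fine search) can be illustrated with a minimal sketch. This is not the V-SRP or RV-SRP itself: the abstract does not specify how the SRP is evaluated over volumes, so `objective` here is a hypothetical stand-in for any power map evaluated at a candidate position, and the function names and grid sizes are assumptions chosen for illustration only.

```python
import itertools
import numpy as np

def grid_argmax(objective, lo, hi, n):
    """Evaluate `objective` on an n-per-axis grid over the box [lo, hi]
    and return the best grid point and its value."""
    axes = [np.linspace(l, h, n) for l, h in zip(lo, hi)]
    best_point, best_value = None, -np.inf
    for p in itertools.product(*axes):
        p = np.array(p)
        v = objective(p)
        if v > best_value:
            best_point, best_value = p, v
    return best_point, best_value

def coarse_to_fine(objective, lo, hi, coarse_n=6, fine_n=21):
    """Coarse search over the whole box, then a fine search restricted to
    one coarse step around the coarse winner (clipped to the box)."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    coarse_best, _ = grid_argmax(objective, lo, hi, coarse_n)
    step = (hi - lo) / (coarse_n - 1)
    fine_lo = np.maximum(coarse_best - step, lo)
    fine_hi = np.minimum(coarse_best + step, hi)
    return grid_argmax(objective, fine_lo, fine_hi, fine_n)

# Hypothetical stand-in for a power map: a single smooth peak at `peak`.
peak = np.array([1.3, 2.7, 0.4])
estimate, _ = coarse_to_fine(lambda p: -np.sum((p - peak) ** 2),
                             lo=[0, 0, 0], hi=[5, 5, 5])
```

With these defaults the search evaluates 6³ + 21³ points, far fewer than a single dense grid at the fine resolution over the whole box. A plain point-sampled coarse grid can miss a narrow peak, which is presumably the concern the volumetric formulation addresses.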
International Conference on Acoustics, Speech, and Signal Processing | 2007
Leonardo O. Nunes; Ricardo Merched; Luiz W. P. Biscainho
Classic methods for sinusoidal analysis rely on partial tracking, a technique where successive sets of spectral peaks of an audio signal must be properly associated in time. The resulting tracks describe, in terms of amplitude and frequency, the continuous evolution of the so-called partials which, combined, model the complex sounds emitted by a given instrument. A well-known challenge in this context is preserving amplitude and frequency coherence in the tracking mechanism, especially in cases where failure in peak detection may occur, or in the event of crossing partials. This paper presents a new decision-directed recursive least-squares (RLS) estimation method for frequency and amplitude tracking in sinusoidal analysis. Different performance measurements show that the proposed deterministic algorithm outperforms some procedures currently found in the literature.
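For readers unfamiliar with RLS, the following is a minimal sketch of the standard exponentially weighted RLS recursion, applied to a synthetic frequency track. It is not the decision-directed variant proposed in the paper, whose details the abstract does not give. The class name `RLS`, the forgetting factor, the initialization constant `delta`, and the linear model f ≈ a + b·t are all assumptions made for illustration.

```python
import numpy as np

class RLS:
    """Standard exponentially weighted recursive least squares."""

    def __init__(self, order, forgetting=0.98, delta=1e-6):
        self.w = np.zeros(order)          # current weight estimate
        self.P = np.eye(order) / delta    # inverse correlation matrix estimate
        self.lam = forgetting             # forgetting factor, 0 < lam <= 1

    def update(self, x, d):
        """Process one regressor x and one observation d; return the a priori error."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        err = d - self.w @ x              # error before the weights are updated
        self.w = self.w + gain * err
        # P is symmetric, so x^T P equals (P x)^T.
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return err

# Synthetic, noise-free track of one partial: 440 Hz drifting at 50 Hz per unit time.
tracker = RLS(order=2)
for n in range(100):
    t = n / 100.0
    tracker.update([1.0, t], 440.0 + 50.0 * t)
# tracker.w now approximates the intercept and slope of the track.
```

In a tracking context, the prediction `tracker.w @ [1.0, t_next]` could be used to decide which detected spectral peak continues a given partial; how that decision is made in the paper is not stated in the abstract.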
Workshop on Applications of Signal Processing to Audio and Acoustics | 2013
Diego B. Haddad; Leonardo O. Nunes; Wallace Alves Martins; Luiz W. P. Biscainho; Bowon Lee
This paper deals with the localization of acoustic sensors based on signals emitted by loudspeakers at known positions. In particular, a model for distortions in time-of-flight (TOF) estimates applicable to the sensor localization problem is presented along with closed-form solutions with low computational cost. The proposed techniques are able to approximate the sensor position even when the TOFs are corrupted by an unknown delay, there is a sampling frequency mismatch between the A/D and D/A converters associated with sensor and loudspeakers, and the speed of sound is unknown. Simulations and an experiment on real data demonstrate that the proposed methods are able to estimate sensor positions with less than 2 cm of error in the evaluated scenarios.
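As a rough illustration of what a closed-form, low-cost solution can look like, here is a minimal sketch of a linearized least-squares position estimate from TOFs when the speed of sound is unknown. It is a generic textbook-style linearization, not the authors' method: it assumes ideal TOFs (no unknown delay, no sampling frequency mismatch, no noise model), and the function name `locate_sensor` and the loudspeaker layout are hypothetical. Squaring the range equation ‖x − sᵢ‖² = c²tᵢ² and treating ‖x‖² and c² as extra unknowns gives a system that is linear in (x, ‖x‖², c²).

```python
import numpy as np

def locate_sensor(sources, tofs):
    """Estimate a 3-D sensor position and the speed of sound from TOFs.

    sources: (N, 3) known loudspeaker positions, N >= 5, in a non-degenerate
             geometry (for instance, not all on one plane or one sphere).
    tofs:    (N,) times of flight in seconds, assumed free of offsets.
    """
    sources = np.asarray(sources, dtype=float)
    tofs = np.asarray(tofs, dtype=float)
    # -2 s_i . x + ||x||^2 - c^2 t_i^2 = -||s_i||^2, one row per source.
    A = np.hstack([-2.0 * sources,
                   np.ones((len(tofs), 1)),
                   -(tofs ** 2)[:, None]])
    b = -np.sum(sources ** 2, axis=1)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    position = theta[:3]
    speed_of_sound = np.sqrt(theta[4])
    return position, speed_of_sound

# Hypothetical setup: six loudspeakers in a 4 m x 5 m x 3 m room.
speakers = np.array([[0, 0, 0], [4, 0, 0], [0, 5, 0],
                     [0, 0, 3], [4, 5, 3], [2, 2.5, 1.0]], dtype=float)
sensor = np.array([1.2, 3.4, 0.9])
tof = np.linalg.norm(speakers - sensor, axis=1) / 343.0
estimated_position, estimated_speed = locate_sensor(speakers, tof)
```

The sketch ignores the constraint that the fourth unknown must equal ‖x‖², which keeps it linear at the cost of some statistical efficiency when the TOFs are noisy.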
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Leonardo O. Nunes; Flávio R. Avila; Alan Freihof Tygel; Luiz W. P. Biscainho; Bowon Lee; Amir Said; Ronald W. Schafer
This paper discusses the automatic quality assessment of echo-degraded speech in the context of teleconference systems. Subjective listening tests conducted over a carefully designed database of signals degraded by acoustic echo have been used to assess how this impairment is perceived and to determine which parameters have a significant impact on speech quality. The results have shown that, similarly to electric transmission line echo, acoustic echo is mainly influenced by echo delay and echo gain. Based on this observation, a mapping between these two parameters and the mean subjective score is devised. Moreover, a signal-based algorithm for the estimation of these parameters is described, and its performance is evaluated. The complete system comprising both the parameter estimators and the mapping function achieves a correlation of 94% between predicted and actual subjective scores, and can be employed as a non-intrusive monitoring tool for in-service quality evaluation of teleconference systems. Further validation indicates the operating range of the proposed quality assessment tool can be extended by proper retraining.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Leonardo O. Nunes; Luiz W. P. Biscainho; Bowon Lee; Amir Said; Ton Kalker; Ronald W. Schafer
This paper addresses the problem of identifying impairment types that might be present in a speech signal. In particular, three acoustically induced degradation types that occur in teleconference systems are considered: acoustic echo, reverberation, and broadband noise, as well as combinations among them. The proposed system is double-ended (full reference) and is developed using a database of degraded full-band speech signals created according to a model for teleconference systems. A set of features obtained from both the degraded and non-degraded signals is proposed and shown to adequately capture information associated with each degradation type. A random forest classifier and a support vector machine are successfully employed, achieving a classification error below 2%. Such classifiers can be used to select an appropriate quality assessment tool for a given degraded signal.
Multimedia Signal Processing | 2009
Luiz W. P. Biscainho; Paulo A. A. Esquef; Fabio P. Freeland; Leonardo O. Nunes; Alan Freihof Tygel; Bowon Lee; Amir Said; Ton Kalker; Ronald W. Schafer
Modern telepresence systems can deliver multimedia signals of unprecedentedly high quality of experience to the user. Setting up and maintaining such services calls for reliable and automatic tools for multimedia quality probing, in particular those targeted at speech data along the transmission path. Most of the objective methods for sound quality assessment (QA) in the literature are intended for either speech signals of 4- to 8-kHz bandwidth or general audio up to 24 kHz, but are not specifically designed for speech at high sampling rates. This work approaches quality evaluation of full-band (24 kHz) high-quality speech corrupted by echo. A simple metric singled out from a standardized double-ended tool for audio QA is proposed as a solution for the problem at hand. Quality measures from a set of speech stimuli corrupted by echo under controlled conditions were obtained via listening tests to allow calibration and evaluation of the proposed method. Experimental results reveal an overall correlation of 0.94 between objective and subjective scores, even in the presence of moderate additive noise.
Multimedia Signal Processing | 2009
Paulo A. A. Esquef; Luiz W. P. Biscainho; Leonardo O. Nunes; Bowon Lee; Amir Said; Ton Kalker; Ronald W. Schafer
This paper reports the design of a double-ended (intrusive) diagnostic tool for identifying five types of degradation in audio signals. The impairment types considered are additive contamination with pink noise, occurrences of signal mutes, distortion by magnitude clipping, and the latter two types each mixed with pink noise. As a simple solution, a threshold-based hierarchical classification system is proposed and fully specified, from pre-processing of the input signals, through the estimation of a few characteristic features, to the data clustering criteria. Performance evaluation of the classifier is carried out via a validation database containing 60 impaired signals for each type of impairment, with five distinct degradation intensity levels. For the types and range of degradation levels considered in this work, excellent results are achieved, with more than 96% of the data correctly classified in the worst case. System performance in identifying mixed impairment types tends to deteriorate as the strength of the noise component increases.
Wireless Communications and Mobile Computing | 2017
Diego B. Haddad; Markus V. S. Lima; Wallace Alves Martins; Luiz W. P. Biscainho; Leonardo O. Nunes; Bowon Lee
The wide availability of mobile devices with embedded microphones opens up opportunities for new applications based on acoustic sensor localization (ASL). Among them, this paper highlights mobile device self-localization relying exclusively on acoustic signals, but with prior knowledge of reference signals and source positions. The problem of finding the sensor position is stated as a function of estimated times-of-flight (TOFs) or time-differences-of-flight (TDOFs) from the sound sources to the target microphone, and the main practical issues involved in TOF estimation are discussed. Least-squares ASL solutions are introduced, followed by other strategies inspired by sound source localization solutions: steered-response power, which improves localization accuracy, and a new region-based search, which alleviates complexity. A set of complementary techniques for further improvement of TOF/TDOF estimates is reviewed: sliding windows, matching pursuit, and TOF selection. The paper then proposes a novel ASL method that combines most of the previous material, whose performance is assessed in a real-world example: in a typical lecture room, the method achieves accuracy better than 20 cm.
IEEE International Telecommunications Symposium | 2014
Flávio R. Avila; Leonardo O. Nunes; Luiz W. P. Biscainho; Alan Freihof Tygel; Bowon Lee
This paper describes a double-ended objective quality assessment method for evaluating full-band (sampled at 48 kHz) speech signals impaired by acoustic echo and background noise. The proposed method is based on a single metric of the ITU-T standard PEAQ (originally developed for audio signals) that, along with an appropriate mapping function, is shown to be able to reliably predict the overall quality of speech signals concurrently degraded by acoustic echo and background noise. In order to train and validate the proposed method, three speech signal databases were developed and subjectively assessed. One database is used only for training and testing purposes whereas the other two are employed in validation. Using the validation databases, it has been shown that the proposed method can predict the quality of signals degraded with a wider scope of acoustic echo and noise characteristics than those considered in its development.