João Felipe Santos
Institut national de la recherche scientifique
Publications
Featured research published by João Felipe Santos.
IEEE Signal Processing Magazine | 2015
Tiago H. Falk; Vijay Parsa; João Felipe Santos; Kathryn H. Arehart; Oldooz Hazrati; Rainer Huber; James M. Kates; Susan Scollie
This article presents an overview of 12 existing objective speech quality and intelligibility prediction tools. Two classes of algorithms are covered, intrusive and nonintrusive, with the former requiring a reference signal while the latter does not. Investigated metrics include those developed for normal hearing (NH) listeners as well as those tailored particularly to hearing impaired (HI) listeners who use assistive listening devices [i.e., hearing aids (HAs) and cochlear implants (CIs)]. Representative examples of those optimized for HI listeners include the speech-to-reverberation modulation energy ratio (SRMR), tailored to HAs (SRMR-HA) and to CIs (SRMR-CI); the modulation spectrum area (ModA); the HA speech quality (HASQI) and perception (HASPI) indices; and the perception-model-based quality prediction method for hearing impairments (PEMO-Q-HI). The objective metrics are tested on three subjectively rated speech data sets covering reverberation-alone, noise-alone, and reverberation-plus-noise degradation conditions, as well as degradations resulting from nonlinear frequency compression and different speech enhancement strategies. The advantages and limitations of each measure are highlighted, and recommendations are given for suggested uses of the different tools under specific environmental and processing conditions.
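The intrusive/non-intrusive split is easy to make concrete. Below is a minimal Python sketch, not any of the 12 tools evaluated in the article: a toy intrusive metric that requires the clean reference, and a toy SRMR-flavored non-intrusive score built from the envelope modulation spectrum (the 4-20 Hz and 20-128 Hz band edges are assumptions loosely based on SRMR's usual modulation range).

```python
import numpy as np
from scipy.signal import hilbert, welch

def intrusive_snr(reference, degraded):
    """Toy intrusive metric: signal-to-distortion ratio in dB (needs a reference)."""
    distortion = degraded - reference
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(distortion ** 2) + 1e-12))

def nonintrusive_srmr_like(signal, fs):
    """Toy SRMR-style score: ratio of low (4-20 Hz) to high (20-128 Hz)
    modulation energy of the temporal envelope; no reference needed."""
    envelope = np.abs(hilbert(signal))  # temporal envelope
    freqs, psd = welch(envelope, fs=fs, nperseg=min(len(envelope), fs))
    low = psd[(freqs >= 4) & (freqs < 20)].sum()
    high = psd[(freqs >= 20) & (freqs < 128)].sum()
    return low / (high + 1e-12)
```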
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge | 2014
Mohammed Senoussaoui; Milton Sarria-Paja; João Felipe Santos; Tiago H. Falk
Audio-visual emotion and mood disorder cues have recently been explored to develop tools that assist psychologists and psychiatrists in evaluating a patient's level of depression. In this paper, we present a number of different multimodal depression level predictors using a model fusion approach, in the context of the AVEC14 challenge. We show that an i-vector based representation of short-term audio features contains useful information for depression classification and prediction. We also employ a classification step prior to regression, allowing different regression models depending on the presence or absence of depression. Our experiments show that a combination of our audio-based model and two other models based on the LGBP-TOP video features leads to an improvement of 4% over the baseline model proposed by the challenge organizers.
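A hedged sketch of the classification-before-regression idea described above, assuming precomputed features X (i-vector and LGBP-TOP extraction are out of scope); the cutoff value, model choices, and helper names here are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def fit_two_stage(X, y_score, cutoff=10.0):
    """Stage 1: detect depression presence; stage 2: one regressor per branch."""
    y_class = (y_score >= cutoff).astype(int)  # assumed score cutoff, for illustration
    clf = LogisticRegression(max_iter=1000).fit(X, y_class)
    regs = {c: Ridge().fit(X[y_class == c], y_score[y_class == c]) for c in (0, 1)}
    return clf, regs

def predict_two_stage(clf, regs, X):
    labels = clf.predict(X)  # presence/absence decision gates the regressor
    return np.array([regs[c].predict(x[None, :])[0] for c, x in zip(labels, X)])
```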
Speech Communication | 2013
João Felipe Santos; Stefano Cosentino; Oldooz Hazrati; Philipos C. Loizou; Tiago H. Falk
Objective intelligibility measurement allows for reliable, low-cost, and repeatable assessment of innovative speech processing technologies, thus dispensing with costly and time-consuming subjective tests. To date, existing objective measures have focused on normal-hearing models and have found limited use with restorative hearing instruments such as cochlear implants (CIs). In this paper, we evaluate the performance of five existing objective measures and propose two refinements to one particular measure to better emulate CI hearing under complex listening conditions involving noise only, reverberation only, and noise plus reverberation. Performance is assessed against subjectively rated data. Experimental results show that the proposed CI-inspired objective measures outperformed all existing measures; gains of as much as 22% in rank correlation could be achieved.
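The validation protocol implied above, objective scores checked against listener ratings via rank correlation, can be sketched in a few lines; the numbers below are purely illustrative, not from the paper's data sets.

```python
import numpy as np
from scipy.stats import spearmanr

subjective = np.array([82.1, 64.3, 40.0, 91.5, 55.2])  # e.g., % words correct (illustrative)
objective = np.array([4.1, 3.0, 1.9, 4.6, 2.7])        # metric scores per condition (illustrative)

rho, pval = spearmanr(subjective, objective)  # rank correlation, as reported in the paper
print(f"rank correlation rho = {rho:.3f} (p = {pval:.3g})")
```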
IEEE Transactions on Audio, Speech, and Language Processing | 2014
João Felipe Santos; Tiago H. Falk
When compared to intrusive speech intelligibility metrics, non-intrusive ones show a stronger dependency on speech content, given the lack of a reference signal for distortion level computation. Reducing this dependency is an important step toward reliable metrics. In this paper, two updates to SRMR-CI, a recently proposed speech intelligibility metric tailored to cochlear implant users, are applied. First, modulation energy thresholding is proposed to reduce the variability caused by differences in the modulation spectral representations of different phonemes and speakers, as well as by speech enhancement algorithm artifacts. Second, a narrower range of modulation filters is employed to reduce fundamental frequency effects. Experimental results show that the updated metric outperforms two benchmark metrics, ModA and ANIQUE+, by as much as 15% in terms of correlation between objective and subjective ratings, and achieves a relative decrease of 47% in root mean square error compared to the previously proposed SRMR-CI metric.
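A rough sketch of the two updates, with assumed threshold and band edges (the paper's actual values may differ): (1) floor low-energy modulation bins to suppress speaker- and content-driven variability, and (2) restrict the modulation frequency range to avoid F0-dominated bands.

```python
import numpy as np

def threshold_modulation_energy(mod_energy, rel_floor=0.01):
    """Zero out modulation-spectrum bins below a fraction of the peak energy."""
    return np.where(mod_energy >= rel_floor * mod_energy.max(), mod_energy, 0.0)

def srmr_ci_like(mod_energy, mod_freqs, lo=(4, 20), hi=(20, 64)):
    """Low/high modulation-energy ratio over a narrowed range (edges assumed)."""
    e = threshold_modulation_energy(mod_energy)
    low = e[(mod_freqs >= lo[0]) & (mod_freqs < lo[1])].sum()
    high = e[(mod_freqs >= hi[0]) & (mod_freqs < hi[1])].sum()
    return low / (high + 1e-12)
```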
International Conference on Acoustics, Speech, and Signal Processing | 2017
Mohammed Senoussaoui; João Felipe Santos; Tiago H. Falk
Reverberation and noise are known to be the two most important culprits behind poor performance in far-field speech applications, such as automatic speech recognition. Recent research has suggested that reverberation-aware speech enhancement (or speech technologies in general) could be used to improve performance. However, recent results also show that existing blind room acoustics characterization algorithms are not robust under ambient noise, leaving room for improvement in such settings. In this paper, several fusion approaches are proposed for noise-robust reverberation time estimation. More specifically, feature- and score-level fusion of short- and long-term speech temporal dynamics features are proposed. With noise-aware feature-level fusion, root mean square error improvements of up to 15.4% were observed. Score-level fusion, in turn, showed further improvements of up to 9.8%. Relative to a recently proposed noise-robust benchmark algorithm, improvements of 30% could be seen, thus showing the advantages of speech temporal dynamics fusion for noise-robust reverberation time estimation.
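The two fusion levels named above can be sketched as follows, assuming the short- and long-term temporal-dynamics features are already extracted; the regressor choice and weighting scheme are illustrative, not the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def feature_level_fusion(X_short, X_long, rt60):
    """Concatenate both feature sets and train a single regressor."""
    X = np.hstack([X_short, X_long])
    return RandomForestRegressor(random_state=0).fit(X, rt60)

def score_level_fusion(models, feature_sets, weights=None):
    """Combine per-feature-set predictions, e.g., with noise-aware weights."""
    preds = np.stack([m.predict(X) for m, X in zip(models, feature_sets)])
    return np.average(preds, axis=0, weights=weights)
```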
5th ISCA/DEGA Workshop on Perceptual Quality of Systems (PQS 2016) | 2016
João Felipe Santos; Rachel E. Bouserhal; Jérémie Voix; Tiago H. Falk
Speech captured from an in-ear microphone (IEM) under an intra-aural device is beneficial in extremely noisy environments, as it maintains a relatively high signal-to-noise ratio. Due to its limited bandwidth, however, speech enhancement is required to obtain more natural-sounding speech. Consequently, quick and practical measurement of speech quality is important. In this paper, we compare the performance of intrusive and non-intrusive objective quality metrics on IEM speech, and propose an adaptation of a non-intrusive metric (SRMR) to IEM speech signals. We show that the updated SRMR metric, SRMR-IEM, significantly reduces the performance gap between non-intrusive and intrusive metrics.
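One plausible reading of such an adaptation, sketched under assumptions (the 2 kHz cutoff, filter design, and function names below are guesses, not the paper's SRMR-IEM definition): restrict the analysis to the band an occluded-ear microphone actually captures before scoring.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def iem_band_limit(signal, fs, cutoff_hz=2000):
    """Keep only the low-frequency band available under the occluded ear."""
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfilt(sos, signal)

def score_iem(signal, fs, metric):
    """Apply any non-intrusive metric to the band-limited IEM signal."""
    return metric(iem_band_limit(signal, fs), fs)
```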
Journal of the Acoustical Society of America | 2013
João Felipe Santos; Nils Peters; Tiago H. Falk
Reverberation time (RT) is an important parameter for room acoustics characterization, for intelligibility and quality assessment of reverberant speech, and for dereverberation. Commonly, RT is estimated from the room impulse response (RIR). In practice, however, RIRs are often unavailable or continuously changing. As such, blind estimation of RT based only on the recorded reverberant signals is of great interest. To date, blind RT estimation has focused on reverberant speech signals. Here, we propose to blindly estimate RT from non-speech signals, such as solo instrument recordings and music ensembles. To this end, we propose a blind estimator based on an auditory-inspired modulation spectrum signal representation, which measures the modulation frequency content of temporal envelopes computed from a 23-channel gammatone filterbank. We show that the higher modulation frequency bands are more sensitive to reverberation than the modulation bands below 20 Hz. When tested on a database of non...
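The core observation, that reverberant smearing of temporal envelopes boosts modulation energy above 20 Hz, can be sketched with a single broadband envelope standing in for the paper's 23-channel gammatone analysis (a simplifying assumption):

```python
import numpy as np
from scipy.signal import hilbert, welch

def high_modulation_ratio(signal, fs, split_hz=20):
    """Fraction of envelope modulation energy above split_hz; grows with reverberation."""
    envelope = np.abs(hilbert(signal))  # temporal envelope
    freqs, psd = welch(envelope, fs=fs, nperseg=min(len(envelope), 4 * fs))
    return psd[freqs >= split_hz].sum() / (psd.sum() + 1e-12)

# A blind RT estimate can then be obtained by regressing measured RT60 values
# against this ratio on signals convolved with RIRs of known reverberation time.
```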
arXiv: Sound | 2016
Bob L. Sturm; João Felipe Santos; Oded Ben-Tal
Conference of the International Speech Communication Association | 2012
João Felipe Santos; Stefano Cosentino; Oldooz Hazrati; Philipos C. Loizou; Tiago H. Falk
International Conference on Learning Representations | 2018
Chiheb Trabelsi; Olexa Bilaniuk; Ying Zhang; Dmitriy Serdyuk; Sandeep Subramanian; João Felipe Santos; Soroush Mehri; Negar Rostamzadeh; Yoshua Bengio; Chris Pal