Nilesh Madhu
Katholieke Universiteit Leuven
Publication
Featured research published by Nilesh Madhu.
international conference on acoustics, speech, and signal processing | 2008
Nilesh Madhu; Colin Breithaupt; Rainer Martin
This contribution details the development of a mask-based post-processor to improve the interference suppression in speech signals separated using linear deconvolution algorithms like independent component analysis (ICA). The proposed post-filter is designed in two stages: in the first stage, the disjointness of the separated signals in the time-frequency domain is used to obtain binary masks that suppress the cross-talk that generally remains after separation. In the next stage, a novel smoothing of the masks is proposed that preserves the speech structure of the target source while eliminating the random peaks in the time-frequency plane that lead to fluctuating background noise. The result is an enhanced signal with reduced cross-talk and no musical noise.
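The first stage described above can be illustrated with a minimal sketch. It assumes two already-separated magnitude spectrograms and assigns each time-frequency bin to whichever output is larger; the function name `binary_masks` and the toy values are hypothetical, and the mask smoothing of the second stage is not reproduced here.

```python
def binary_masks(mag_a, mag_b):
    """Return complementary binary masks for two magnitude spectrograms.

    mag_a, mag_b: lists of frames, each a list of non-negative bin magnitudes
    (assumed to come from the two outputs of a separation stage).
    """
    # A bin is kept for output A only where A dominates B (disjointness assumption).
    mask_a = [[1.0 if a > b else 0.0 for a, b in zip(frame_a, frame_b)]
              for frame_a, frame_b in zip(mag_a, mag_b)]
    # Output B receives the complementary bins.
    mask_b = [[1.0 - m for m in frame] for frame in mask_a]
    return mask_a, mask_b


# Toy example: 2 frames x 3 frequency bins per separated output.
mag_a = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.7]]
mag_b = [[0.2, 0.6, 0.4], [0.3, 0.1, 0.9]]
mask_a, mask_b = binary_masks(mag_a, mag_b)
```

Multiplying each output spectrogram by its mask zeroes the bins where the other source dominates, which is where residual cross-talk would otherwise remain.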
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Nilesh Madhu; Ann Spriet; Sofie Jansen; Raphael Koning; Jan Wouters
Whereas state-of-the-art single-channel noise reduction algorithms for auditory prostheses demonstrate an appreciable suppression of the noise and improved speech quality, they are unable, thus far, to improve the intelligibility of noise-degraded speech signals. Alternative approaches to speech enhancement using a binary time-frequency mask have demonstrated substantial intelligibility improvements in low signal-to-noise-ratio (SNR) conditions under ideal settings, making this a promising research direction for auditory prostheses. These approaches exploit the sparsity and disjointness of speech spectra in their short-time-frequency representation to preserve only the target-dominant time-frequency regions in the processed output. State-of-the-art noise reduction algorithms in contrast are soft-decision approaches which weight each time-frequency region in proportion to the prevailing SNR. However, the potential for intelligibility improvement using these approaches has not been examined systematically vis-à-vis the binary mask alternative. This contribution compares the performance of an ideal soft-decision system, exemplified by the ideal Wiener filter (IWF), and the ideal binary mask (IBM) for single-channel speech enhancement for auditory prostheses. To obtain results relevant to this application area, a (relatively) low spectral resolution, modelled using the Bark-spectrum scale, is used for both the IWF and the IBM. This spectral resolution is comparable to that being used in commercial hearing instruments. The comparison is in terms of potential for intelligibility improvement and resulting signal quality. Intelligibility tests carried out under various noise conditions and SNRs show that the IWF leads to higher intelligibility scores than the IBM in low SNR conditions. Under non-ideal parameter estimates, it is demonstrated that the IWF approach is also much less sensitive to estimation errors. Quality-wise, a preference for the IWF exists. This was evaluated using a two-stage, pairwise preference-rating test.
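The difference between the two ideal gains can be sketched as follows, assuming the true speech and noise powers of a time-frequency region are known (the "ideal" setting). The function names and the 0 dB default threshold for the binary mask are assumptions for illustration; the Bark-scale band grouping used in the study is not modelled.

```python
def ideal_wiener_gain(speech_pow, noise_pow):
    """Soft gain SNR / (1 + SNR), written in terms of the two powers."""
    total = speech_pow + noise_pow
    return speech_pow / total if total > 0 else 0.0


def ideal_binary_gain(speech_pow, noise_pow, threshold_db=0.0):
    """Hard gain: 1 if the local SNR exceeds the threshold, otherwise 0."""
    threshold = 10.0 ** (threshold_db / 10.0)  # convert dB to a power ratio
    return 1.0 if speech_pow > threshold * noise_pow else 0.0


# Two regions: one speech-dominant (+6 dB), one noise-dominant (-6 dB).
regions = [(4.0, 1.0), (1.0, 4.0)]
iwf = [ideal_wiener_gain(s, n) for s, n in regions]
ibm = [ideal_binary_gain(s, n) for s, n in regions]
```

In the noise-dominant region the Wiener gain attenuates the region without discarding it, whereas the binary mask removes it entirely. This is the soft-decision versus hard-decision contrast that the abstract describes.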
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Nilesh Madhu; Rainer Martin
We build upon our speaker localization framework developed in a previous work (N. Madhu and R. Martin, “A scalable framework for multiple speaker localization and tracking,” in Proc. Int. Workshop Acoustic Echo Noise Control (IWAENC), Sep. 2008) to perform source separation. The proposed approach, exploiting the supplementary information from the mixture of Gaussians-based localization model, allows for the incorporation of a wide class of separation algorithms, from the nonlinear time-frequency mask-based approaches to a fully adaptive beamformer in the generalized sidelobe canceller (GSC) structure. We propose, in addition, a generalized estimation of the blocking matrix based on subspace projectors. The adaptive beamformer realized as proposed is insensitive to gain mismatches among the sensors, obviating the need for magnitude calibration of the microphones. It is also demonstrated that the proposed linear approach has a performance comparable to that of an optimal (oracle) GSC implementation. In comparison to ICA-based approaches, another advantage of the separation framework described herein is its robustness to ambient noise and scenarios with an unknown number of sources.
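As a rough illustration of a projector-based blocking matrix, the sketch below builds the orthogonal-complement projector for a single, assumed-known propagation vector `d` in one frequency bin. This is only the simplest rank-one case and is not the generalized estimator proposed in the paper; the names `blocking_projector` and `mat_vec` and the example vector are hypothetical.

```python
def blocking_projector(d):
    """Return B = I - d d^H / (d^H d) for a propagation vector d (complex list)."""
    norm = sum(abs(x) ** 2 for x in d)  # d^H d, a real scalar
    n = len(d)
    return [[(1.0 if i == j else 0.0) - d[i] * d[j].conjugate() / norm
             for j in range(n)]
            for i in range(n)]


def mat_vec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


# Assumed propagation vector for a 3-microphone array in one frequency bin.
d = [1 + 0j, 0.5j, -0.5 + 0.5j]
B = blocking_projector(d)
blocked = mat_vec(B, d)  # the target component should be cancelled
```

Any signal component that arrives along `d` is mapped to (numerically) zero, so the outputs of `B` contain only interference and noise, which is the role of the blocking branch in a GSC.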
IEEE Transactions on Biomedical Engineering | 2015
Raphael Koning; Nilesh Madhu; Jan Wouters
Hearing-impaired listeners using cochlear implants (CIs) suffer from a decrease in speech intelligibility (SI) in adverse listening conditions. Time-frequency masks are often applied to perform noise suppression in an attempt to increase SI. Two important masks are the so-called ideal binary mask (IBM) with its binary weights and the ideal Wiener filter (IWF) with its continuous weights. It is unclear which of the masks has the highest potential for SI and speech quality enhancement in CI users. In this study, both approaches for SI and quality enhancement were compared. The investigations were conducted in normal-hearing (NH) subjects listening to noise vocoder CI simulations and in CI users. The potential for SI improvement was assessed in a sentence recognition task with ideal mask estimates in multitalker babble and with an interfering talker. The robustness of the approaches was evaluated with simulated estimation errors. CI users assessed the speech quality in a preference rating. The IWF outperformed the IBM in NH listeners. In contrast, no significant difference was obtained in CI users. Estimation errors degraded SI in CI users for both approaches. In terms of quality, the IWF-processed signals slightly outperformed the IBM-processed signals. The outcomes of this study suggest that the mask pattern is not that crucial for CIs. Results of speech enhancement algorithms obtained with NH subjects listening to vocoded or normally processed stimuli do not translate to CI users. This outcome means that the effect of new strategies has to be quantified with the user group considered.
Medical & Biological Engineering & Computing | 2012
Nilesh Madhu; Radu Ranta; Louis Maillard; Laurent Koessler
The starting point of this paper is the analysis of the reference problem in intra-cerebral electroencephalographic (iEEG) recordings. It is well accepted that both surface and depth EEG signals are always recorded with respect to some unknown time-varying signal called reference. This article discusses different methods for determining and reducing the influence of the reference signal for the iEEG signals. In particular, we derive optimal approaches for the estimation of the reference signal in iEEG recording setups and demonstrate their relation to the well-known minimum power/variance distortionless response approaches derived for general array and antenna signal processing applications. We show that the proposed approaches achieve optimal performance in terms of estimation error and that they outperform other reference identification methods proposed in the literature. The developed algorithms are illustrated on simulated examples and on real iEEG signals.
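The link to minimum variance distortionless response (MVDR) processing can be sketched in its textbook form. The sketch assumes that every channel contains the same reference term with unit gain (sign conventions vary), so the weights minimise output power subject to summing to one: w = R⁻¹1 / (1ᵀR⁻¹1). The helper names, the small Gaussian-elimination solver, and the example covariance are assumptions for illustration, not the estimators derived in the article.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mvdr_reference_weights(cov):
    """Weights w = R^-1 1 / (1^T R^-1 1) for an all-ones constraint vector."""
    v = solve(cov, [1.0] * len(cov))
    norm = sum(v)
    return [vi / norm for vi in v]


# Assumed 3-channel spatial covariance of the recorded signals.
cov = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]
weights = mvdr_reference_weights(cov)
# With an identity covariance the weights reduce to a plain channel average.
avg_weights = mvdr_reference_weights([[1.0, 0.0, 0.0],
                                      [0.0, 1.0, 0.0],
                                      [0.0, 0.0, 1.0]])
```

The reference estimate at each time sample would then be the weighted sum of the channel samples. The identity-covariance case shows that the common average reference is the special case of spatially white, equal-power activity.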
international symposium on communications control and signal processing | 2010
Nilesh Madhu; Jan Wouters
Presented is a microphone-array based approach for the extraction of individual signals from a mixture of competing sources and background noise. Source separation is done using data-driven soft-masks, the parameters for the estimation of these masks being obtained from an extension of a recently proposed source localisation and tracking framework. The separation algorithm is applicable to any arbitrary array - allowing for its integration into a wide variety of applications. The advantage of the proposed mask generation over state-of-the-art mask-based algorithms is the implicit scalability with respect to the number of microphones (M), the number of sources (Q), spatial source spread, and reverberation - obviating the need for heuristic adaptation of the mask generation to different acoustical scenarios. The individual signals extracted using the soft-masks evince low amounts of musical noise. Smoothing these masks in their cepstral representation further reduces the musical noise phenomenon whilst preserving the signal of interest, thereby improving the listening experience.
IEEE Signal Processing Letters | 2012
Radu Ranta; Nilesh Madhu
This letter demonstrates the theoretical equivalence between the different ICA/beamforming based solutions proposed in the literature for the reference-estimation problem in electroencephalographic (EEG) recordings. By reference, we understand an unknown, non-null, time-varying potential, measured at the reference electrode situated sufficiently distant from the measuring electrodes. Despite the theoretical equivalence of the various approaches, they do not yield identical results in practice. This discrepancy is primarily due to the practical implementation of the underlying approach. We show in this context that the most reliable solution avoids blind source separation and montage transformation in addition to making full use of available a priori knowledge.
international conference on signal processing | 2012
Sebastian Gergen; Christian Borss; Nilesh Madhu; Rainer Martin
In 2011, Borß introduced a parametric model for the design of virtual acoustics, which creates a natural-sounding virtual environment for applications requiring virtualization, e.g., in teleconferencing systems and computer games. In this work we refine this model to make it applicable for the simulation of room acoustics and reverberation to aid in the development of single- and multi-channel audio signal enhancement systems. The model takes early reflections with a frequency-dependent attenuation and the diffuse character of late reverberation with its coherence characteristics into account and provides predefined rooms and reverberation times according to a norm defined by the German Institute for Standardization, to ensure a high degree of realism and usability. Compared to the standard image source model for generating virtual acoustics, the proposed system generates a more realistic virtual acoustic environment.
international conference on acoustics, speech, and signal processing | 2008
Nilesh Madhu; Ivan Tashev; Alex Acero
This paper introduces a new acoustic echo suppression (AES) algorithm for suppressing the residual echo after the acoustic echo canceller (AEC). By temporally segmenting the frequency bins of the residual signal spectrum into blocks and modelling the data in each block and each frequency bin as realizations of a random variable, we can compute the probability of presence of residual echo and derive an appropriate ML suppression rule based on this probability. The computation of the probabilities is based on the Expectation Maximization algorithm. The proposed method shows better performance than state-of-the-art methods for residual echo suppression while producing no audible degradation in the near-end signal and no musical noise. Test results indicate that the proposed approach provides an echo return loss enhancement (ERLE) up to 3 dB higher than that of the state-of-the-art echo suppressor while yielding a comparable mean opinion score (MOS) for the near-end speech quality. Furthermore, the proposed method is independent of the double-talk detector, which makes it robust to misclassifications on the part of the AEC algorithm.
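For readers unfamiliar with the metric, ERLE is commonly computed as the ratio, in dB, of the echo power at the microphone to the residual power after processing. The sketch below assumes an echo-only segment (no near-end speech) and a non-zero residual; the function name `erle_db` and the sample values are hypothetical and do not reflect the paper's evaluation setup.

```python
import math


def erle_db(mic, residual):
    """Echo return loss enhancement in dB over one echo-only segment."""
    power_in = sum(x * x for x in mic)        # echo power before processing
    power_out = sum(x * x for x in residual)  # residual power (assumed non-zero)
    return 10.0 * math.log10(power_in / power_out)


# Toy segment: the residual is the microphone echo scaled down by a factor of 10.
mic = [1.0, -2.0, 0.5, 1.5]
residual = [0.1 * x for x in mic]
value = erle_db(mic, residual)  # amplitude ratio 10 corresponds to about 20 dB
```

On this scale, a 3 dB improvement corresponds to roughly halving the residual echo power.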
european signal processing conference | 2017
Mehdi Zohourian; Rainer Martin; Nilesh Madhu
In this work we evaluate the effects of the head radius on binaural localization algorithms. We employ a spherical head model and the null-steering beamforming localization method. The model characterizes the binaural cues in the form of HRTFs. One of the main parameters in this model is the head radius. We propose to optimize jointly for both the source location and the head radius. In contrast to the free-field configuration, where it is difficult to estimate the source location and microphone distance simultaneously, the binaural algorithm yields a unique solution for the head radius. Moreover, for real recordings we show that the commonly assumed head size achieves fairly reliable performance. For applications with an atypical head size, e.g., hearing-impaired children, adapting the head radius using the proposed algorithm would improve the accuracy of the binaural localization algorithm.