Karan Nathwani
Indian Institute of Technology Kanpur
Publications
Featured research published by Karan Nathwani.
Speech Communication | 2015
Ayush Jain; Sanchit Goel; Karan Nathwani; Rajesh M. Hegde
Highlights: Kalman filtering framework; joint acoustic echo and noise cancellation; double-talk detector; linear prediction coding analysis; expectation-maximisation algorithm. In this work, a novel Kalman filtering framework is developed for joint acoustic echo and noise cancellation in a double-talk scenario. The efficiency of echo cancellation algorithms is reduced when signals other than the echoed far-end signal are present, since the echo path cannot be modelled accurately in such cases. A double-talk detector is therefore usually used in conjunction with an acoustic echo canceller to handle such a scenario. The method presented in this work models both the near-end speech signal and the background noise, which makes it robust in double-talk scenarios. Apart from jointly cancelling echo and noise, another advantage of this framework is that it does not require a double-talk detector. Additionally, an expectation-maximisation based algorithm is proposed in this work to estimate the linear prediction coefficients of the near-end signal. Extensive performance evaluation over the NOIZEUS corpus demonstrates that the proposed framework performs reasonably better than other speech enhancement methods in terms of misalignment of the estimated echo path and perceptual quality of the reconstructed near-end speech signal.
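As a rough illustration of the underlying idea (not the paper's algorithm), a scalar Kalman filter can track a single hypothetical echo-path coefficient from the far-end signal and the microphone observation; all signal parameters below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a single-tap echo path h_true; the far-end
# signal x leaks back as the echo h_true * x plus near-end noise v.
h_true = 0.6
n = 500
x = rng.standard_normal(n)          # far-end signal
v = 0.1 * rng.standard_normal(n)    # near-end noise (variance 0.01)
y = h_true * x + v                  # microphone signal

# Scalar Kalman filter tracking the echo-path coefficient h.
h_est, P = 0.0, 1.0                 # state estimate and its variance
q, r = 1e-6, 0.01                   # process / measurement noise variances
for k in range(n):
    P += q                                  # predict (random-walk state)
    K = P * x[k] / (x[k] * P * x[k] + r)    # Kalman gain
    h_est += K * (y[k] - h_est * x[k])      # correct with the innovation
    P *= (1.0 - K * x[k])                   # update variance
# h_est now approximates h_true
```

A full echo canceller would track a vector of filter taps and, as in the paper, jointly model the near-end speech and noise; this sketch only shows the predict/correct recursion.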
national conference on communications | 2013
Harish Padaki; Karan Nathwani; Rajesh M. Hegde
Clean speech acquisition from distant microphones is often affected by reverberation. In this paper, a blind single channel dereverberation method using the linear prediction (LP) residual of the reverberated speech signal is proposed. A relation between the LP residuals of the clean and the reverberated speech signals is derived using the acoustic room impulse response. In the proposed method, the clean LP residual is computed from its reverberated counterpart by cepstral subtraction; hence no estimation of the acoustic room impulse response (AIR) is required, making the method computationally simple. Experiments on speech dereverberation and distant speech recognition are conducted at various direct-to-reverberant ratios (DRR). The results are presented using objective measures, subjective measures, and word error rates (WER), and are compared to methods available in the literature to illustrate the significance of this method.
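The general idea of cepstral-domain subtraction can be sketched as follows: a slowly varying channel adds a near-constant term to each frame's log-magnitude spectrum, so subtracting the across-frame mean log spectrum (while keeping each frame's phase) removes the channel estimate. This is a minimal illustration of the principle, not the paper's exact algorithm, which operates on the LP residual:

```python
import numpy as np

def cepstral_subtraction(frames):
    """Per-frame log-spectral (cepstral-mean) subtraction: remove the
    channel's average log-spectral contribution, keep each frame's phase."""
    spec = np.fft.rfft(frames, axis=1)
    log_mag = np.log(np.abs(spec) + 1e-12)
    log_mag -= log_mag.mean(axis=0)          # subtract channel estimate
    clean = np.exp(log_mag) * np.exp(1j * np.angle(spec))
    return np.fft.irfft(clean, n=frames.shape[1], axis=1)
```

If all frames were identical, the mean removes everything and each output frame has a flat (unit) magnitude spectrum, which is a convenient sanity check.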
Signal Processing | 2015
Karan Nathwani; Rajesh M. Hegde
A novel method of joint source separation and dereverberation that minimizes the divergence between the observed and true spectral subband envelopes is discussed in this paper. This divergence minimization is carried out within the non-negative matrix factorization (NMF) framework by imposing certain non-negative constraints on the subband envelopes. Additionally, the joint source separation and dereverberation framework described herein utilizes the spectral subband envelope obtained from the group delay spectral magnitude (GDSM). In order to obtain the spectral subband envelope from the GDSM, the equivalence of the magnitude and the group delay spectrum via the weighted cepstrum is used. Since the subband envelope of the group delay spectral magnitude is robust and has a high spectral resolution, less error is noted in the NMF decomposition. Late reverberation components present in the separated signals are then removed using a modified spectral subtraction technique. The quality of the separated and dereverberated speech signal is evaluated using several objective and subjective criteria. Experiments on distant speech recognition are then conducted at various direct-to-reverberant ratios (DRR) on the GRID corpus. Experimental results indicate significant improvements over existing methods in the literature. Highlights: A novel method for joint source separation and dereverberation in an NMF framework is proposed. The method uses constrained spectral divergence minimization by imposing non-negative constraints on sub-band envelopes. The group delay spectrum is utilized for source separation and dereverberation. Accurate NMF decompositions are obtained due to the robustness and high spectral resolution of the group delay spectrum.
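For reference, the standard NMF multiplicative updates for the generalised Kullback-Leibler divergence (the classical form of the divergence minimization the paper builds on, without the paper's additional envelope constraints) can be sketched as:

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, seed=0):
    """Minimise the generalised KL divergence between a non-negative
    matrix V and the product W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3
    H = rng.random((rank, T)) + 1e-3
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + 1e-12)
        WH = W @ H + 1e-12
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + 1e-12)
    return W, H
```

In a separation setting, the columns of W would hold (subband-envelope) basis spectra and the rows of H their activations; the constrained variant in the paper additionally restricts the subband envelopes.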
IEEE Transactions on Multimedia | 2013
Karan Nathwani; Pranav Pandit; Rajesh M. Hegde
A novel method of single channel speaker segregation using the group delay cross correlation function is proposed in this paper. The group delay function, which is the negative derivative of the phase spectrum, yields robust spectral estimates. Hence the group delay spectral estimates are first computed over frequency sub-bands after passing the speech signal through a bank of filters. The filter bank spacing is based on a multi-pitch algorithm that computes the pitch estimates of the competing speakers. An affinity matrix is then computed from the group delay spectral estimates of each frequency sub-band. This affinity matrix represents the correlations of the different sub-bands in the mixed broadband speech signal. The grouping of correlated harmonics present in the mixed speech signal is then carried out by using a new iterative graph cut method. The signals are reconstructed from the respective harmonic groups which represent individual speakers in the mixed speech signal. Spectrographic masks are then applied on the reconstructed signals to refine their perceptual quality. The quality of separated speech is evaluated using several objective and subjective criteria. Experiments on multi-speaker automatic speech recognition are conducted using mixed speech data from the GRID corpus. A cell phone based multimedia information retrieval system (MIRS) for multi-source meeting environments is also developed.
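The group delay function mentioned above has a standard direct computation that avoids explicit phase unwrapping: with X the DFT of x[n] and Y the DFT of n·x[n], the group delay is tau(w) = (X_R·Y_R + X_I·Y_I) / |X(w)|^2. A minimal sketch of that identity (not the paper's full sub-band pipeline):

```python
import numpy as np

def group_delay(x):
    """Group delay spectrum of x via the identity
    tau(w) = (X_R*Y_R + X_I*Y_I) / |X(w)|^2, with Y = DFT(n * x[n])."""
    n = np.arange(len(x))
    X = np.fft.rfft(x)
    Y = np.fft.rfft(n * x)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)
```

A useful sanity check: for a pure impulse delayed by d samples, the group delay is d at every frequency.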
Circuits Systems and Signal Processing | 2015
M. S. Reddy; Karan Nathwani; Rajesh M. Hegde
Acoustic surveillance is gaining importance given the pervasive nature of multimedia sensors deployed in all environments. In this paper, novel probabilistic detection methods using audio histograms are proposed for acoustic event detection in a multimedia surveillance environment. The proposed methods use audio histograms to classify events in a well-defined acoustic space. They belong to the category of novelty detection methods, since audio data corresponding to the event is not used in the training process; they hence alleviate the problem of collecting large amounts of audio data for training statistical models. The methods are also computationally efficient, since a conventional audio feature set like the Mel-frequency cepstral coefficients is used in tandem with audio histograms to perform acoustic event detection. Experiments on acoustic event detection are conducted on the SUSAS database available from the Linguistic Data Consortium. The performance is measured in terms of false detection rate and true detection rate, and receiver operating characteristic curves are obtained for the proposed probabilistic detection methods to evaluate their performance. The proposed methods perform significantly better than the acoustic event detection methods available in the literature. A cell phone-based alert system for an assisted living environment is also discussed as future scope of the proposed method, with performance presented as the number of successful cell phone transactions. The results are motivating enough for the system to be used in practice.
asilomar conference on signals, systems and computers | 2012
Karan Nathwani; Rajesh M. Hegde
In this paper, a method to combine adaptive beamforming microphone arrays with acoustic echo cancellation is proposed. A non-reference array framework is used herein, where an auxiliary microphone array captures the interfering speech sources in addition to the primary microphone array that captures the source of interest. An adaptive linearly constrained minimum variance (ALCMV) beamformer, developed in the context of a multi-source speech environment, is combined with an acoustic echo canceller by appropriately anchoring the auxiliary microphone array. Experimental results on clean speech acquisition on cell-phone-like devices indicate reasonable improvement over various beamforming methods.
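For context, the single-constraint case of an LCMV beamformer is the MVDR solution: minimise the output power w^H R w subject to the distortionless constraint d^H w = 1 on the look-direction steering vector d, giving w = R^{-1} d / (d^H R^{-1} d). A minimal sketch of that closed form (the paper's adaptive, echo-cancelling variant is more involved):

```python
import numpy as np

def mvdr_weights(R, d):
    """MVDR (single-constraint LCMV) weights: minimise w^H R w
    subject to the distortionless constraint d^H w = 1."""
    Rinv_d = np.linalg.solve(R, d)      # R^{-1} d without explicit inverse
    return Rinv_d / (d.conj() @ Rinv_d)
```

Whatever the noise covariance R, the returned weights always pass the look direction with unit gain, which is the defining property of the constraint.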
advances in multimedia | 2012
Arpit Shukla; Karan Nathwani; Rajesh M. Hegde
Distant speech recognition over microphone arrays is challenging, especially in multi-source environments. In this paper, a non-reference anchor array (NRA) framework for distant speech recognition is proposed. The NRA framework uses a non-reference anchor array to capture the interfering speech sources, in addition to the primary array that captures the speech source of interest. The framework uses a linearly constrained minimum variance (LCMV) beamformer such that the signal coming from the look direction is preserved while rejecting correlated interferences coming from the same direction as the source of interest. The performance of the proposed method is evaluated by conducting experiments on clean speech acquisition from distant microphones and on distant speech recognition on the TIMIT and MONC databases. Experimental results indicate a reasonable improvement over correlation, subspace, and standard minimum variance beamforming methods.
national conference on communications | 2014
Karan Nathwani; Shubham Khunteta; Piyush Nathwani; Rajesh M. Hegde
In this paper, a novel method of multi channel speech dereverberation using a modified generalized sidelobe canceller (MGSC) is proposed. The method combines adaptive beamforming (ABF) with the linear prediction residual cepstrum (LPRC) technique to perform dereverberation. The LPRC is applied at the beamformed output to generate a signal which still contains late reflection components. In addition, the LPRC is applied at each microphone output to produce reference signals. A relation between the LP residuals of the clean and reverberated speech signals, derived in earlier work, is used to remove early reflection components from each microphone and beamformer output. This effectively blocks the direct-path components, leaving the late reflected components in the reference signals. The power envelope (PE) is then obtained for each reference signal and for the joint ABF and LPRC output. The remaining reverberation components are estimated using these power envelopes. Post-processing is finally used to remove late reverberation components from the joint ABF and LPRC output. Experiments on speech dereverberation and distant speech recognition are conducted at various direct-to-reverberant ratios (DRR) and compared to methods available in the literature to illustrate the significance of this method.
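The final post-processing step above is in the spirit of spectral subtraction on power envelopes: treat a delayed, scaled copy of the envelope as the late-reverberation estimate and subtract it with a spectral floor. The delay, scaling, and floor below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def suppress_late_reverb(power_env, delay=5, alpha=0.4, floor=0.05):
    """Spectral-subtraction style envelope processing: a delayed,
    scaled copy of the power envelope serves as the late-reverberation
    estimate; the gain is floored to avoid over-suppression."""
    late = np.zeros_like(power_env)
    late[delay:] = alpha * power_env[:-delay]       # late-reverb estimate
    gain = np.maximum(1.0 - late / (power_env + 1e-12), floor)
    return gain * power_env
```

On a constant envelope, frames after the delay are attenuated by (1 - alpha) while the first frames, which have no late-reverb contribution yet, pass unchanged.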
asilomar conference on signals, systems and computers | 2014
Subhash C. Tanan; Karan Nathwani; Ayush Jain; Rajesh M. Hegde; Ruchi Rani; Abhijit Tripathy
In this paper, a novel method for acoustic echo and noise cancellation in a generalized sidelobe canceller framework is described. The primary contribution of this work is the development of a multichannel adaptive Kalman filter (MCAKF) in a modified generalized sidelobe canceller (MGSC) framework. Additionally, both the near-end speech signal and the noise are assumed to be unknown. In the proposed method, speech acquired by a microphone array is subject to adaptive beamforming using the MVDR method. In parallel, a blocking matrix filter is used to attenuate the near-end speech signal while passing both the noise and the residual echo. An MCAKF is developed in this context to estimate the noise and residual echo. Hence, the difference between the adaptive beamformer (ABF) output and the MCAKF output gives an estimate of the near-end speech signal. The performance of the proposed method is evaluated using subjective and objective measures on the ARCTIC database. Distant speech recognition experiments are also conducted on the ARCTIC database. The proposed method gives reasonable improvements both in terms of perceptual evaluation and distant speech recognition.
asilomar conference on signals, systems and computers | 2013
Karan Nathwani; Harish Padaki; Rajesh M. Hegde
In this work, a method for multi channel speech enhancement using the linear prediction (LP) residual cepstrum is proposed. The method performs deconvolution at each microphone output in the cepstral domain. Deconvolving the acoustic impulse response from the reverberated signal in each individual channel removes early reverberation. The dereverberated output from each channel is then spatially filtered using a delay and sum beamformer (DSB). The late reverberation components are then removed by temporal averaging of the glottal closure instants (GCI) computed using the dynamic programming projected phase-slope algorithm (DYPSA). The GCI obtained herein correspond to the LP residual peaks. These residual peaks are excluded from the averaging process, since they have a significant impact on speech quality and should remain unmodified. Subjective and objective evaluation experiments are conducted on the TIMIT and MONC databases, and the proposed method is compared with other methods. The experimental results on speech dereverberation and distant speech recognition indicate reasonable improvement over conventional methods.
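The delay and sum beamformer used for the spatial filtering step has a very simple form: advance each channel by its (here integer, known) delay so the target components align, then average across channels. The known-delay assumption is for illustration only; in practice the delays must be estimated:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum: remove each channel's integer delay (in samples)
    so the target aligns across channels, then average."""
    n = channels.shape[1]
    out = np.zeros(n)
    for sig, d in zip(channels, delays):
        out[: n - d] += sig[d:]
    return out / channels.shape[0]
```

When the delays are correct, the target adds coherently while uncorrelated noise averages down by the number of channels.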