Publication


Featured research published by Dushyant Sharma.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Non-intrusive estimation of the level of reverberation in speech

Pablo Peso Parada; Dushyant Sharma; Patrick A. Naylor

We show corroborating evidence that, among a set of common acoustic parameters, the clarity index C50 provides a measure of reverberation that is well correlated with speech recognition accuracy. We also present a data driven method for non-intrusive C50 parameter estimation from a single channel speech signal. The method extracts a number of features from the speech signal and uses a binary regression tree, trained on appropriate training data, to estimate the C50. Evaluation is carried out using speech utterances convolved with real and simulated room impulse responses, and additive babble noise. The new method outperforms a baseline approach in our evaluation.


IEEE Transactions on Audio, Speech, and Language Processing | 2016

A single-channel non-intrusive C50 estimator correlated with speech recognition performance

Pablo Peso Parada; Dushyant Sharma; Jose Lainez; Daniel A. Barreda; Toon van Waterschoot; Patrick A. Naylor

Several intrusive measures of reverberation can be computed from measured and simulated room impulse responses, over the full frequency band or for each individual mel-frequency subband. It is first shown that the full-band clarity index C50 is, on average, the measure most correlated with reverberant speech recognition performance. This corroborates previous findings, now for the dataset used in this study. We extend the previous findings to show that C50 also exhibits the highest mutual information on average. Motivated by these extended findings, a non-intrusive room acoustic (NIRA) estimation method is proposed to estimate C50 from only the reverberant speech signal. The NIRA method is a data-driven approach that computes a number of features from the speech signal and uses these features to train a model that performs the estimation. The choice of features and learning techniques is explored in this work using an evaluation set which comprises approximately 100 000 different reverberant signals (around 93 h of speech) including reverberation from measured and simulated room impulse responses. The importance of each feature with respect to the estimation of the target C50 is analysed following two different approaches. In both cases, the newly chosen set of features shows high importance for the target. The best C50 estimator provides a root-mean-square deviation of around 3 dB on average over all reverberant test environments.
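
For readers unfamiliar with the intrusive measure that serves as the estimation target, the sketch below computes a clarity index from a room impulse response as the ratio of early to late energy, expressed in dB. It is an illustration rather than the authors' code: the function names, the choice to locate the direct path at the largest-magnitude sample, and the toy exponentially decaying impulse response are all assumptions made here.

```python
import math


def clarity_index_db(rir, fs, early_ms=50.0):
    """Early-to-late energy ratio of a room impulse response, in dB.

    Assumption: the direct-path arrival is the largest-magnitude sample,
    and the early window spans `early_ms` milliseconds from that sample.
    """
    direct = max(range(len(rir)), key=lambda n: abs(rir[n]))
    split = direct + int(round(early_ms * 1e-3 * fs))
    early = sum(x * x for x in rir[direct:split])
    late = sum(x * x for x in rir[split:])
    if late == 0.0:
        return math.inf  # no late energy: clarity is unbounded
    return 10.0 * math.log10(early / late)


def synthetic_rir(fs, t60, length_s):
    """Toy impulse response (hypothetical): amplitude decays 60 dB in t60 seconds."""
    n_samples = int(round(length_s * fs))
    return [10.0 ** (-3.0 * n / (t60 * fs)) for n in range(n_samples)]


if __name__ == "__main__":
    fs = 16000
    for t60 in (0.3, 0.6, 1.0):
        c50 = clarity_index_db(synthetic_rir(fs, t60, 2.0), fs)
        print(f"T60 = {t60:.1f} s -> C50 = {c50:.2f} dB")
```

Longer decay times leave more energy after the 50 ms boundary, so the computed value falls as the toy T60 grows. A measured impulse response would need the same treatment of the direct-path location, which other implementations may handle differently.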


EURASIP Journal on Advances in Signal Processing | 2015

Reverberant speech recognition exploiting clarity index estimation

Pablo Peso Parada; Dushyant Sharma; Patrick A. Naylor; Toon van Waterschoot

We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that our method outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4 % relative word error rate reduction in comparison to the best baseline of the challenge.
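
The two uses of the C50 estimate described above (appending it to the feature vector and using it to pick an acoustic model) can be sketched as follows, together with the arithmetic behind a relative word error rate reduction. This is a minimal sketch under stated assumptions: the bin edges, model labels, and function names are hypothetical and do not come from the paper.

```python
from bisect import bisect_right

# Hypothetical C50 bin edges (dB) and model labels; not taken from the paper.
C50_EDGES_DB = [5.0, 15.0]
MODEL_NAMES = ["high_reverb", "medium_reverb", "low_reverb"]


def select_model(c50_db):
    """Pick the acoustic model whose (assumed) C50 range contains the estimate."""
    return MODEL_NAMES[bisect_right(C50_EDGES_DB, c50_db)]


def augment_features(frames, c50_db):
    """Append the utterance-level C50 estimate to every feature frame."""
    return [list(frame) + [c50_db] for frame in frames]


def relative_wer_reduction(baseline_wer, system_wer):
    """Relative WER reduction in percent: 100 * (baseline - system) / baseline."""
    if baseline_wer <= 0.0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    frames = [[0.1, 0.2], [0.3, 0.4]]  # made-up feature frames
    print(select_model(9.5))
    print(augment_features(frames, 9.5))
    # Made-up WER values, used only to show the arithmetic.
    print(relative_wer_reduction(50.0, 40.0))
```

A relative reduction is measured against the baseline error rate, so the same absolute improvement counts for more when the baseline is already low.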


Speech Communication | 2016

A data-driven non-intrusive measure of speech quality and intelligibility

Dushyant Sharma; Yu Wang; Patrick A. Naylor; Mike Brookes

Speech signals are often affected by additive noise and distortion which can degrade the perceived quality and intelligibility of the signal. We present a new measure, NISA, for estimating the quality and intelligibility of speech degraded by additive noise and distortions associated with telecommunications networks, based on a data driven framework of feature extraction and tree based regression. The new measure is non-intrusive, operating on the degraded signal alone without the need for a reference signal. This makes the measure applicable to practical speech processing applications operating in the single-ended mode. The new measure has been evaluated against the intrusive measures PESQ and STOI. The results indicate that the accuracy of the new non-intrusive method is around 90% of the accuracy of the intrusive measures, depending on the test scenario. The NISA measure therefore provides non-intrusive (single-ended) PESQ and STOI estimates with high accuracy.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Speaker change detection and speaker diarization using spatial information

Mathieu Hu; Dushyant Sharma; Simon Doclo; Mike Brookes; Patrick A. Naylor

In this paper, we present a novel speaker change detection and speaker diarization algorithm using spatial information in the form of features derived from estimated room impulse responses (RIRs). A blind system identification approach is used to obtain an estimate of the RIRs, from which the C5 feature is derived and used in the labeling algorithm. Experimental results using two speakers at different locations within a fixed room show that our approach achieves a higher hit rate in the speaker change detection task and a lower variance in the diarization error rate when compared with a baseline algorithm.


Workshop on Applications of Signal Processing to Audio and Acoustics | 2015

Single-channel speaker diarization based on spatial features

Mathieu Hu; Pablo Peso Parada; Dushyant Sharma; Simon Doclo; Toon van Waterschoot; Mike Brookes; Patrick A. Naylor

Speaker diarization has gained much importance over the past five years in helping overcome key challenges faced by automatic meeting transcription systems. Current state-of-the-art algorithms can only utilize spatial information when multi-microphone recordings are available. In this paper, we propose the novel use of reverberation as a source of spatial information obtained from single-channel recordings to perform speaker diarization. The proposed system is shown to reduce speaker classification errors by 34% when compared with current MFCC based single-channel systems.


International Workshop on Acoustic Signal Enhancement | 2014

A quantitative comparison of blind C50 estimators

Pablo Peso Parada; Dushyant Sharma; Jose Lainez; Daniel A. Barreda; Patrick A. Naylor; T. van Waterschoot

The problem of blind estimation of the room acoustic clarity index C50 from single-channel reverberant speech signals is presented in this paper. We analyze the performance of several machine learning methods for a regression task using 309 features derived from the speech signal and modeled with a Deep Belief Network (DBN), Classification And Regression Tree (CART) and Linear Regression (LR). These techniques are evaluated on a large test database (86 hours) that includes babble noise and reverberation using both artificial and real room impulse responses (RIRs). All methods are trained on a database which contains noise, speech and simulated RIRs different from the test set. The performance results show that the DBN model gives the lowest error for the simulated RIRs whereas the LR model gives the best generalization performance with the highest accuracy for real RIRs.


European Signal Processing Conference | 2015

The SAS project: Speech signal processing in high school education

Dushyant Sharma; Alankrit Poddar; Sumanna Manna; Patrick A. Naylor

We describe the Speech And Sound (SAS) outreach project, which aims to introduce high school students to speech signal processing through the real-life example of automatic speech recognition. The syllabus was designed to help students understand how the concepts they learn as part of the physics, mathematics and computing courses relate to real-life applications. The six-week project was organized into a mixture of informal lecture and practical sessions, and the students were encouraged to engage in informal discussions with the instructors about any questions and ideas. The project was piloted at an international high school in India with 10th, 11th and 12th grade students. By the end of the course, the students had gained a high-level understanding of the many technologies that make up such a complex system, as evidenced by the high overall scores in the final assessment.


European Signal Processing Conference | 2015

Noise robust blind system identification algorithms based on a Rayleigh quotient cost function

Mathieu Hu; Simon Doclo; Dushyant Sharma; Mike Brookes; Patrick A. Naylor

An important prerequisite for acoustic multi-channel equalization for speech dereverberation involves the identification of the acoustic channels between the source and the microphones. Blind System Identification (BSI) algorithms based on cross-relation error minimization are known to mis-converge in the presence of noise. Although algorithms have been proposed in the literature to improve robustness to noise, the estimated room impulse responses are usually constrained to have a flat magnitude spectrum. In this paper, noise robust algorithms based on a Rayleigh quotient cost function are proposed. Unlike the traditional algorithms, the estimated impulse responses are not always forced to have unit norm. Experimental results using simulated room impulse responses and several SNRs show that one of the proposed algorithms outperforms competing algorithms in terms of normalized projection misalignment.
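
The evaluation metric named above, normalized projection misalignment (NPM), compares an estimated impulse response with the true one while ignoring any overall scale factor, which blind identification cannot recover. A minimal sketch of its usual definition, NPM = 20 log10(||h - (h^T g / g^T g) g|| / ||h||) with h the true and g the estimated response, is given below. The function name and the short example vectors are assumptions for illustration and are not taken from the paper.

```python
import math


def npm_db(h_true, h_est):
    """Normalized projection misalignment in dB (lower is better).

    The true response is compared with its projection onto the estimate,
    so the result does not depend on the scale of `h_est`.
    """
    if len(h_true) != len(h_est):
        raise ValueError("impulse responses must have the same length")
    est_energy = sum(g * g for g in h_est)
    true_norm = math.sqrt(sum(h * h for h in h_true))
    if est_energy == 0.0 or true_norm == 0.0:
        raise ValueError("impulse responses must be non-zero")
    # Least-squares scale that best aligns the estimate with the truth.
    scale = sum(h * g for h, g in zip(h_true, h_est)) / est_energy
    err = math.sqrt(sum((h - scale * g) ** 2 for h, g in zip(h_true, h_est)))
    if err == 0.0:
        return -math.inf  # estimate is an exact scaled copy of the truth
    return 20.0 * math.log10(err / true_norm)


if __name__ == "__main__":
    h = [1.0, 0.5, 0.25, 0.125]       # made-up "true" response
    g = [0.9, 0.55, 0.2, 0.1]         # made-up estimate
    print(f"NPM = {npm_db(h, g):.2f} dB")
    print(f"NPM (estimate scaled by 3) = {npm_db(h, [3 * x for x in g]):.2f} dB")
```

Because of the projection step, an estimate that is forced to unit norm and one that is not can be compared on equal terms.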


IEEE Global Conference on Signal and Information Processing | 2014

A non-intrusive PESQ measure

Dushyant Sharma; Lisa Meredith; Jose Lainez; Daniel A. Barreda; Patrick A. Naylor

We present NISQ, a data-driven non-intrusive speech quality measure that has been trained to predict the PESQ score for a given speech signal. NISQ is based on feature extraction and a binary tree regression based model. A training method using the intrusive PESQ algorithm to automatically label large quantities of speech data is presented and utilized. Our method is shown to predict PESQ with an RMS error of 0.49 on our test database.

Collaboration


Top Co-Authors

Mike Brookes
Imperial College London

Toon van Waterschoot
Katholieke Universiteit Leuven

Mathieu Hu
Imperial College London

Simon Doclo
University of Oldenburg

Nikolay D. Gaubitch
Delft University of Technology