Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Jagannath H. Nirmal is active.

Publication


Featured research published by Jagannath H. Nirmal.


International Conference on Advances in Pattern Recognition | 2015

A unique approach in text independent speaker recognition using MFCC feature sets and probabilistic neural network

Khan Suhail Ahmad; Anil Thosar; Jagannath H. Nirmal; Vinay S. Pande

This paper motivates the use of a combination of mel frequency cepstral coefficients (MFCC) and their delta derivatives (DMFCC and DDMFCC), calculated using mel-spaced Gaussian filter banks, for text independent speaker recognition. MFCC, modeled on the human auditory system, is robust against noise and session changes and has therefore become synonymous with speaker recognition. Our main aim is to test the accuracy of the proposed feature set for different values of frame overlap and MFCC feature vector size, to identify the configuration with the highest accuracy. Principal component analysis (PCA) is applied before the training and testing stages for feature dimensionality reduction, which increases computing speed and lowers the memory required for processing. The use of a probabilistic neural network (PNN) in the modeling domain provides the advantage of lower operational times during the training stages. The experiments examined the percentage identification accuracy (PIA) of MFCC alone, of MFCC combined with DMFCC, and of all three feature sets (MFCC, DMFCC and DDMFCC) combined. The proposed feature set attains an identification accuracy of 94% for a frame overlap of 90% and an MFCC feature size of 18 coefficients, outperforming the identification rates of the other two feature sets. These speaker recognition experiments were tested on the Voxforge database.
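The delta-feature and PCA steps described above can be sketched in a few lines. This is an illustrative NumPy sketch on a toy feature matrix (the mel filter bank MFCC extraction itself is omitted, and the shapes are illustrative), not the authors' implementation:

```python
import numpy as np

def deltas(feat, N=2):
    """Regression-based delta features over a window of +/-N frames
    (the standard HTK-style formula); feat is (frames, coeffs)."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for t in range(feat.shape[0]):
        for n in range(1, N + 1):
            out[t] += n * (padded[t + N + n] - padded[t + N - n])
    return out / denom

def pca_reduce(X, k):
    """Project X (samples, dims) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]
    return Xc @ top

# Toy "MFCC" matrix: 100 frames x 18 coefficients
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 18))
d = deltas(mfcc)                      # DMFCC
dd = deltas(d)                        # DDMFCC
stacked = np.hstack([mfcc, d, dd])    # 54-dim combined feature vector
reduced = pca_reduce(stacked, 20)     # dimensionality reduction before the PNN
print(stacked.shape, reduced.shape)
```

The reduced matrix is what would feed the PNN training and testing stages in a pipeline of this shape.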


Applied Soft Computing | 2014

Voice conversion using General Regression Neural Network

Jagannath H. Nirmal; Mukesh A. Zaveri; Suprava Patnaik; Pramod H. Kachare

Highlights:
- Pitch residuals are modeled using wavelet packet decomposed coefficients, which alleviates the artifacts generated by direct ANN transformation.
- GRNN is proposed to modify the vocal tract and the wavelet packet decomposed pitch residuals.
- The GRNN mapping model performs slightly better than GMM and RBF mapping models.
- Fast convergence of GRNN reduces computation time and avoids the overtraining of a conventional ANN.

The objective of a voice conversion system is to formulate the mapping function which can transform the source speaker characteristics to those of the target speaker. In this paper, we propose a General Regression Neural Network (GRNN) based model for voice conversion. It is a single-pass learning network, which makes the training procedure fast and comparatively less time consuming. The proposed system uses the shape of the vocal tract, the shape of the glottal pulse (excitation signal) and long term prosodic features to carry out the voice conversion task. The shape of the vocal tract and the shape of the source excitation of a particular speaker are represented using Line Spectral Frequencies (LSFs) and the Linear Prediction (LP) residual, respectively. GRNN is used to obtain the mapping function between the source and target speakers. The direct transformation of the time domain residual using an Artificial Neural Network (ANN) causes phase changes and generates artifacts in consecutive frames. To alleviate this, wavelet packet decomposed coefficients are used to characterize the excitation of the speech signal. The long term prosodic parameters, namely the pitch contour (intonation) and the energy profile of the test signal, are also modified in relation to those of the target (desired) speaker using the baseline method. The relative performance of the proposed model is compared to voice conversion systems based on the state-of-the-art RBF and GMM models using objective and subjective evaluation measures. The evaluation measures show that the proposed GRNN based voice conversion system performs slightly better than the state-of-the-art models.
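A GRNN is essentially Nadaraya-Watson kernel regression, which is why its "training" is a single pass over the data. A minimal sketch on a toy source-to-target mapping (the variable names, data, and smoothing value are illustrative, not from the paper):

```python
import numpy as np

class GRNN:
    """General Regression Neural Network (Nadaraya-Watson kernel
    regression). 'Training' is a single pass: store the exemplars."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma
    def fit(self, X, Y):
        self.X, self.Y = np.asarray(X, float), np.asarray(Y, float)
        return self
    def predict(self, Xq):
        Xq = np.atleast_2d(Xq)
        # Squared distances: queries x stored exemplars
        d2 = ((Xq[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * self.sigma ** 2))
        w /= w.sum(axis=1, keepdims=True)      # normalized kernel weights
        return w @ self.Y                      # weighted average of targets

# Toy mapping: source "spectral" vectors -> target vectors (y = 2x here)
rng = np.random.default_rng(1)
Xsrc = rng.uniform(0, 1, (200, 4))
Ytgt = 2.0 * Xsrc
model = GRNN(sigma=0.1).fit(Xsrc, Ytgt)
pred = model.predict(Xsrc[:5])
print(np.max(np.abs(pred - Ytgt[:5])))   # small for a dense exemplar set
```

Because fitting is just storage, there is no iterative convergence to tune, which is the speed advantage the abstract refers to.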


International Scholarly Research Notices | 2014

Complex Cepstrum Based Voice Conversion Using Radial Basis Function

Jagannath H. Nirmal; Suprava Patnaik; Mukesh A. Zaveri; Pramod H. Kachare

The complex cepstrum vocoder is used to modify the speaker specific characteristics of the source speaker's speech to those of the target speaker. Low time and high time liftering are used to split the calculated cepstrum into the vocal tract and the source excitation parameters. The obtained mixed phase vocal tract and source excitation parameters with finite impulse response preserve the phase properties of the resynthesized speech frame. A radial basis function network is explored to capture the nonlinear mapping function for modifying the complex cepstrum based real and imaginary components of the vocal tract and source excitation of the speech signal. The state-of-the-art Mel cepstrum envelope and the fundamental frequency (F0) are considered to represent the vocal tract and the source excitation of the speech frame, respectively. A radial basis function network is used to capture and formulate the nonlinear relations between the Mel cepstrum envelopes of the source and target speakers, and a mean and standard deviation approach is employed to modify the fundamental frequency (F0). The Mel log spectral approximation filter is used to reconstruct the speech signal from the modified Mel cepstrum envelope and fundamental frequency. The proposed complex cepstrum based model is compared with the state-of-the-art Mel cepstrum envelope based voice conversion model using objective and subjective evaluations. The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximates the converted speech signal with better accuracy than the Mel cepstrum envelope based model.
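The cepstrum computation and the low time / high time liftering split can be sketched as follows. This is a generic complex cepstrum sketch on a synthetic frame, assuming simple FFT-based phase unwrapping rather than the paper's exact vocoder; the cutoff value and test frame are illustrative:

```python
import numpy as np

def complex_cepstrum(x):
    """Complex cepstrum via FFT: log magnitude plus unwrapped phase."""
    X = np.fft.fft(x)
    log_X = np.log(np.abs(X) + 1e-12) + 1j * np.unwrap(np.angle(X))
    return np.fft.ifft(log_X)

def lifter_split(ccep, cutoff):
    """Low-time lifter -> vocal-tract part, high-time -> excitation."""
    n = len(ccep)
    low = np.zeros(n, dtype=complex)
    low[:cutoff] = ccep[:cutoff]                 # keep both quefrency ends
    low[-(cutoff - 1):] = ccep[-(cutoff - 1):]   # for symmetry
    high = ccep - low
    return low, high

def resynthesize(low, high):
    """Invert the cepstrum: frame = IFFT(exp(FFT(low + high)))."""
    return np.fft.ifft(np.exp(np.fft.fft(low + high))).real

# Toy voiced-like frame: a pulse train through a short FIR filter
n = 256
frame = np.zeros(n); frame[::64] = 1.0
frame = np.convolve(frame, [1.0, 0.7, 0.3], mode="same")
ccep = complex_cepstrum(frame)
low, high = lifter_split(ccep, cutoff=20)
rec = resynthesize(low, high)
print(np.max(np.abs(rec - frame)))   # near-perfect reconstruction
```

Because the split is additive, the two liftered parts sum back to the full cepstrum, which is what preserves the phase of the resynthesized frame.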


EURASIP Journal on Audio, Speech, and Music Processing | 2013

A novel voice conversion approach using admissible wavelet packet decomposition

Jagannath H. Nirmal; Mukesh A. Zaveri; Suprava Patnaik; Pramod H. Kachare

The framework of a voice conversion system is expected to capture both the static and dynamic characteristics of the speech signal. Conventional approaches like Mel frequency cepstrum coefficients and linear predictive coefficients focus on spectral features limited to the lower frequency bands. This paper presents a novel wavelet packet filter bank approach to identify the non-uniformly distributed dynamic characteristics of the speaker. The contribution of this paper is threefold. First, in the feature extraction stage, the dyadic wavelet packet tree structure is optimized to involve less computation while preserving the speaker-specific features. Second, in the feature representation step, magnitude and phase attributes are treated separately, based on the fact that raw time-frequency traits are highly correlated yet carry salient speech information. Finally, an RBF mapping function is established to transform the speaker-specific features from the source to the target speaker. The results obtained by the proposed filter bank based voice conversion system are compared to the baseline multiscale voice morphing results using subjective and objective measures. Evaluation results reveal that the proposed method performs better by incorporating the speaker-specific dynamic characteristics and phase information of the speech signal.
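A wavelet packet filter bank of the kind described can be sketched with Haar analysis filters. This illustrative NumPy version splits every node at each level (the paper optimizes the tree structure and studies other wavelet filters, both omitted here):

```python
import numpy as np

# Haar analysis filters (orthonormal)
LO = np.array([1.0, 1.0]) / np.sqrt(2)
HI = np.array([1.0, -1.0]) / np.sqrt(2)

def analysis_step(x):
    """One wavelet packet split: filter then downsample by 2."""
    lo = np.convolve(x, LO)[1::2]
    hi = np.convolve(x, HI)[1::2]
    return lo, hi

def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node is split at each level,
    yielding 2**levels sub-bands ordered by tree position."""
    bands = [np.asarray(x, float)]
    for _ in range(levels):
        nxt = []
        for b in bands:
            lo, hi = analysis_step(b)
            nxt.extend([lo, hi])
        bands = nxt
    return bands

rng = np.random.default_rng(2)
signal = rng.standard_normal(512)
bands = wavelet_packet(signal, levels=3)   # 8 sub-bands of 64 samples
print(len(bands), [len(b) for b in bands])

# Haar is orthonormal, so total energy is preserved across the tree
e_in = np.sum(signal ** 2)
e_out = sum(np.sum(b ** 2) for b in bands)
print(abs(e_in - e_out) < 1e-8)
```

Energy preservation is what makes per-sub-band processing safe: modifying one band leaves the others' contribution intact.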


Archive | 2013

Voice Transformation Using Radial Basis Function

Jagannath H. Nirmal; Suprava Patnaik; Mukesh A. Zaveri

This paper presents a novel voice transformation (VT) technique, which transforms the individual acoustic characteristics of the source speaker so that the speech is perceived as if spoken by the target speaker. Spectral and glottal parameters of the source speaker, namely line spectral pairs (LSPs) and pitch, are transformed into the target speaker's parameters using a radial basis function (RBF) network. The results are evaluated using subjective and objective voice-quality measures. Listening tests confirm that the proposed algorithm converts speaker individuality while maintaining high speech quality.
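An RBF mapping of this kind amounts to Gaussian kernels centered at the training inputs with output weights solved by regularized least squares. A minimal sketch on toy data (the class name, the affine toy target, and all parameter values are illustrative, not the paper's setup):

```python
import numpy as np

class RBFMapper:
    """Radial basis function network: Gaussian kernels at the training
    inputs, linear output weights solved by regularized least squares."""
    def __init__(self, sigma=1.0, reg=1e-6):
        self.sigma, self.reg = sigma, reg
    def _kernel(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * self.sigma ** 2))
    def fit(self, X, Y):
        self.C = np.asarray(X, float)            # centers = training inputs
        K = self._kernel(self.C, self.C)
        self.W = np.linalg.solve(K + self.reg * np.eye(len(K)),
                                 np.asarray(Y, float))
        return self
    def predict(self, Xq):
        return self._kernel(np.atleast_2d(Xq), self.C) @ self.W

# Toy source->target mapping over "LSP-like" vectors (a known affine
# warp, so the fit can be checked)
rng = np.random.default_rng(3)
src = rng.uniform(0, np.pi, (150, 10))           # 10 parameters per frame
tgt = 0.9 * src + 0.1
rbf = RBFMapper(sigma=1.0).fit(src, tgt)
err = np.max(np.abs(rbf.predict(src) - tgt))
print(err)
```

The small ridge term keeps the kernel matrix invertible when training frames are near-duplicates, a common situation with consecutive speech frames.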


Advances in Computing and Communications | 2014

A comparison of Multi-Layer Perceptron and Radial Basis Function neural network in the voice conversion framework

Ankita N. Chadha; Jagannath H. Nirmal; Mukesh A. Zaveri

A voice conversion system modifies the speaker specific features of the source speaker so that the result sounds like the target speaker's speech. The voice individuality of the speech signal is characterized at various levels, such as the shape of the glottal excitation, the shape of the vocal tract and the long term prosodic features. In this work, Line Spectral Frequencies (LSFs) are used to represent the shape of the vocal tract, and the Linear Predictive (LP) residual represents the shape of the glottal excitation of a particular speaker. A Multi Layer Perceptron (MLP) and a Radial Basis Function (RBF) neural network are explored to formulate the nonlinear mapping for modifying the LSFs. The baseline residual selection method is used to modify the LP-residual of one speaker to that of another. A relative comparison between MLP and RBF is carried out using various objective and subjective measures for inter-gender and intra-gender voice conversion. The results reveal that an optimized RBF performs slightly better than the baseline MLP based voice conversion.
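For contrast with the single-solve RBF, a one-hidden-layer MLP is trained iteratively by gradient descent. A generic NumPy sketch on a toy nonlinear mapping (architecture, learning rate, and data are illustrative, not the paper's configuration):

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer perceptron (tanh) trained by full-batch
    gradient descent on mean squared error."""
    def __init__(self, n_in, n_hid, n_out, lr=0.3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_in, n_hid)) * 0.5
        self.b1 = np.zeros(n_hid)
        self.W2 = rng.standard_normal((n_hid, n_out)) * 0.5
        self.b2 = np.zeros(n_out)
        self.lr = lr
    def forward(self, X):
        self.H = np.tanh(X @ self.W1 + self.b1)
        return self.H @ self.W2 + self.b2
    def step(self, X, Y):
        P = self.forward(X)
        dP = 2 * (P - Y) / P.size                 # gradient of the MSE wrt P
        dH = (dP @ self.W2.T) * (1 - self.H ** 2) # backprop through tanh
        self.W2 -= self.lr * self.H.T @ dP
        self.b2 -= self.lr * dP.sum(0)
        self.W1 -= self.lr * X.T @ dH
        self.b1 -= self.lr * dH.sum(0)
        return float(np.mean((P - Y) ** 2))

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, (200, 3))
Y = np.sin(X)                                     # toy nonlinear target
mlp = TinyMLP(3, 16, 3)
losses = [mlp.step(X, Y) for _ in range(4000)]
print(losses[0], losses[-1])                      # loss decreases
```

The iterative loop is the practical trade-off the comparison is about: the MLP needs many passes where the RBF needs one linear solve.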


Neural Computing and Applications | 2016

Voice conversion system using salient sub-bands and radial basis function

Jagannath H. Nirmal; Mukesh A. Zaveri; Suprava Patnaik; Pramod H. Kachare

The objective of voice conversion is to replace the speaker-dependent characteristics of the source speaker so that the result is perceptually similar to the target speaker. The speaker-dependent spectral parameters are characterized using single-scale interpolation techniques such as linear predictive coefficients, formant frequencies, the mel cepstrum envelope and line spectral frequencies. These features provide a good approximation of the vocal tract but produce artifacts at the frame boundaries, which result in inaccurate parameter estimation and distortion in the re-synthesis of the speech signal. This paper presents a novel approach to voice conversion based on the multi-scale wavelet packet transform in a radial basis neural network framework. The basic idea is to split the acoustic space of the signal into different salient frequency sub-bands, finely tuned to capture the speaker identity conveyed by the speech signal. Characteristics of different wavelet filters are studied to determine the best filter for the proposed voice conversion system. The relative performance of the proposed algorithm is compared with state-of-the-art wavelet-based voice morphing using various subjective and objective measures. The results reveal that the proposed algorithm performs better than conventional wavelet-based voice morphing.


International Journal of Speech Technology | 2018

Multitaper perceptual linear prediction features of voice samples to discriminate healthy persons from early stage Parkinson diseased persons

Savitha S. Upadhya; Alice N. Cheeran; Jagannath H. Nirmal

The performance of multitaper perceptual linear prediction (PLP) features of speech samples in discriminating healthy subjects from early stage Parkinson diseased subjects is investigated in this paper. PLP features are conventionally obtained by computing the power spectrum using a single tapered Hamming window. This estimated spectrum exhibits large variance, which can be reduced by computing the weighted average of the power spectra obtained using a set of tapered windows, leading to multitaper spectral estimation. Two multitaper techniques, namely sine tapers and Thomson multitapers, are investigated along with conventional single taper windowing. An artificial neural network is then used to classify the PLP features extracted by applying the three types of window tapers to the speech signals of healthy and early stage Parkinson affected people, and their respective performances are compared. The results show higher accuracy for the multitaper techniques than for the conventional single taper technique. The accuracy obtained using sine tapers as well as Thomson multitapers is maximal for five tapers. Compared with the conventional method, the recognition accuracy improves by 7.5% using sine tapers and by 6.9% using Thomson tapers. An improvement in other performance measures, such as equal error rate, false positive rate, false negative rate, sensitivity and specificity, is also observed for the multitaper techniques.
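The variance reduction behind multitaper estimation is easy to demonstrate with sine tapers: average K orthonormal eigenspectra and the estimate's variance shrinks roughly by 1/K. An illustrative NumPy sketch on white noise (signal length and taper count are not from the paper):

```python
import numpy as np

def sine_tapers(N, K):
    """K orthonormal sine tapers of length N."""
    n = np.arange(1, N + 1)
    return np.array([np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * k * n / (N + 1))
                     for k in range(1, K + 1)])

def multitaper_psd(x, K):
    """Average the K eigenspectra -> lower-variance PSD estimate."""
    tapers = sine_tapers(len(x), K)
    specs = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return specs.mean(axis=0)

# White noise has a flat true PSD, so the spread of the estimate across
# bins is almost entirely estimation variance.
rng = np.random.default_rng(5)
x = rng.standard_normal(1024)
single = multitaper_psd(x, K=1)
multi = multitaper_psd(x, K=5)
print(single.var(), multi.var())   # multitaper variance is smaller
```

The lower-variance spectrum is then fed to the usual PLP computation in place of the single-taper estimate.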


Neurocomputing | 2017

Novel approach of MFCC based alignment and WD-residual modification for voice conversion using RBF

Jagannath H. Nirmal; Mukesh A. Zaveri; Suprava Patnaik; Pramod H. Kachare

The voice conversion system modifies the speaker specific characteristics of the source speaker to those of the target speaker, so that the result is perceived as the target speaker. The speaker specific characteristics of the speech signal are reflected at different levels, such as the shape of the vocal tract, the shape of the glottal excitation and the long term prosody. The shape of the vocal tract is represented by Line Spectral Frequencies (LSFs) and the shape of the glottal excitation by Linear Predictive (LP) residuals. In this paper, a fourth level wavelet packet transform is applied to the LP-residual to generate sixteen sub-bands. This approach not only reduces the computational complexity but also provides a genuine transformation model compared with state-of-the-art statistical prediction methods. In voice conversion, alignment is an essential process which pairs the features of the source and target speakers. In this paper, a Mel Frequency Cepstrum Coefficients (MFCC) based warping path is proposed to align the LSF and LP-residual sub-bands using the proposed constant source and constant target alignments. The conventional alignment technique is compared with the two proposed approaches, namely constant source and constant target. Analysis shows that constant source alignment using the MFCC warping path performs slightly better than the constant target alignment and the state-of-the-art alignment approach. Generalized mapping models are developed for each sub-band using a Radial Basis Function neural network (RBF) and are compared with the Gaussian Mixture mapping model (GMM) and the residual selection approach. Various subjective and objective evaluation measures indicate that the RBF based residual mapping approach performs significantly better than the state-of-the-art approaches.

Highlights:
- LSFs represent formant peaks well but fail to represent formant valleys, so the warping path calculated from LSFs does not yield a satisfactory alignment.
- A new alignment using an MFCC based warping path overcomes this limitation and improves the conversion performance of the proposed system.
- Existing techniques for mapping the LP-residual suffer from artifacts generated in consecutive frames, and the residual signal is quite complex to map.
- WPT and RBF pairs are employed to reduce the high dimensionality of the residual signal and the complexity of the model.
- Experimental results show that the proposed MFCC based warping path and the WPT-RBF based transformation of the residual signal outperform the state-of-the-art residual selection and GMM methods.
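The MFCC based warping path rests on dynamic time warping. A minimal DTW sketch that computes a frame-pairing path between two feature tracks; the toy sinusoidal "MFCC" tracks are illustrative (one is a time-stretched copy of the other, so the path should span both):

```python
import numpy as np

def dtw_path(A, B):
    """Dynamic time warping between feature sequences A (m, d) and
    B (n, d); returns the optimal frame-pairing path."""
    m, n = len(A), len(B)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost[i, j] = d[i - 1, j - 1] + min(cost[i - 1, j],
                                               cost[i, j - 1],
                                               cost[i - 1, j - 1])
    # Backtrack from the end to recover the frame pairs
    path, i, j = [], m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy "MFCC" tracks: B is A time-stretched from 40 to 60 frames
t = np.linspace(0, 1, 40)
A = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
t2 = np.linspace(0, 1, 60)
B = np.stack([np.sin(2 * np.pi * t2), np.cos(2 * np.pi * t2)], axis=1)
path = dtw_path(A, B)
print(path[0], path[-1], len(path))
```

The resulting path can then pair source and target frames of any other feature stream (e.g. LSFs or residual sub-bands) for training the mapping model, which is the role the MFCC warping path plays above.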


Biomedical Signal Processing and Control | 2018

Thomson Multitaper MFCC and PLP voice features for early detection of Parkinson disease

Savitha S. Upadhya; Alice N. Cheeran; Jagannath H. Nirmal

In this paper, MFCC and PLP voice features extracted using a Single Taper Smooth (STS) window and the Thomson Multitaper (TMT) windowing technique, together with a neural network classifier, are used to distinguish healthy people from early stage Parkinson diseased patients, and a performance comparison of the two techniques is reported. Parkinson's disease in its early stages affects not only the muscular movements of the human body but also the articulatory process of the speech production mechanism. This changes the shape of the vocal tract, which manifests itself in the short time power spectrum. The MFCC and PLP features used in this investigation, which represent the vocal tract parameters, are derived from the short time spectrum, so it is crucial to estimate this short time power spectrum accurately. Generally, the short time speech power spectrum is estimated using an STS window, but the resulting spectrum exhibits large variance in the spectral estimates. Hence, a variance reduced power spectrum is attained by computing the weighted average of the short time speech spectra obtained using a set of TMT windows. This spectrum is then used to compute the PLP and MFCC features. In this paper, extraction of both voice features using the STS window as well as the TMT technique with three different weightings, namely uniform, eigenvalue (EV) and adaptive weights, is implemented using the speech samples of healthy and Parkinson diseased individuals. The experiment was carried out for Thomson tapers ranging from 1 to 12, and the optimal number of tapers for the application and dataset is reported. A comparative performance analysis of the techniques implemented using both MFCC and PLP features is then carried out in terms of classification accuracy, equal error rate, sensitivity, selectivity and F1 score at the optimal taper value. The results show that, in comparison with the STS window, the maximum improvement in classification accuracy was 6.6% for nine tapers with adaptive weights using MFCC features and 6.9% for five tapers with EV weights using PLP features on experimental dataset 1, and 6.0% using MFCC and 6.4% using PLP on experimental dataset 2. A performance improvement in the other measures at the optimal taper value is also observed and reported for experimental dataset 1.
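Thomson multitapers are DPSS (Slepian) windows, and the uniform versus eigenvalue weighting difference can be sketched with SciPy's `dpss` routine. The adaptive weighting used in the paper is omitted, and the time-bandwidth product, taper count, and test signal here are illustrative:

```python
import numpy as np
from scipy.signal import windows

def thomson_psd(x, NW=3.0, K=5, weights="uniform"):
    """Thomson multitaper PSD: combine the K DPSS eigenspectra with
    uniform or eigenvalue weights (adaptive weighting omitted)."""
    tapers, ratios = windows.dpss(len(x), NW, Kmax=K, return_ratios=True)
    specs = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    w = ratios if weights == "eigen" else np.ones(K)
    return (w[:, None] * specs).sum(axis=0) / w.sum()

# Unit-variance white noise: the true PSD is flat, so the estimate's
# per-bin mean should sit near 1 for unit-energy tapers.
rng = np.random.default_rng(6)
x = rng.standard_normal(2048)
psd_u = thomson_psd(x, K=5, weights="uniform")
psd_e = thomson_psd(x, K=5, weights="eigen")
print(psd_u.shape, float(psd_u.mean()), float(psd_e.mean()))
```

The two weightings differ little at a well-chosen K (the concentration ratios are all near one), which is why taper count, rather than weighting, is the parameter the paper sweeps.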

Collaboration


Dive into Jagannath H. Nirmal's collaboration.

Top Co-Authors

Pramod H. Kachare
Veermata Jijabai Technological Institute

Alice N. Cheeran
Veermata Jijabai Technological Institute

Savitha S. Upadhya
Veermata Jijabai Technological Institute

Anil Thosar
K. J. Somaiya College of Engineering

Khan Suhail Ahmad
K. J. Somaiya College of Engineering

Vinay S. Pande
K. J. Somaiya College of Engineering