Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Dhananjaya N. Gowda is active.

Publication


Featured research published by Dhananjaya N. Gowda.


Speech Communication | 2013

Spectro-temporal analysis of speech signals using zero-time windowing and group delay function

Yegnanarayana Bayya; Dhananjaya N. Gowda

Traditional methods for estimating the vocal tract system characteristics typically compute the spectrum using a window size of 20–30 ms. The resulting spectrum represents the average characteristics of the vocal tract system within the window segment. Also, the effect of pitch harmonics needs to be countered in the process of spectrum estimation. In this paper, we propose a new approach for estimating the spectrum using a highly decaying window function. The impulse-like window function used is an approximation to the integration operation in the frequency domain, and the operation is referred to as zero-time windowing, analogous to the zero-frequency filtering operation. The apparent loss in spectral resolution due to the use of a highly decaying window function is restored by successive differencing in the frequency domain. The spectral resolution is further improved by the use of the group delay function, which is additive over the individual resonances, in contrast to the multiplicative nature of the magnitude spectrum. The effectiveness of the proposed approach in estimating the spectrum is evaluated in terms of its robustness to additive noise and in formant estimation.
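The group delay function mentioned above can be computed from the DFT without phase unwrapping, using the standard identity that the DFT of n·x[n] gives the spectral derivative. A minimal numpy sketch of this generic computation (not the authors' implementation):

```python
import numpy as np

def group_delay(x, n_fft=512):
    """Group delay tau(w) = -d(arg X(w))/dw, computed without phase
    unwrapping via the identity DFT{n * x[n]} = j * dX/dw."""
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)  # spectrum of the ramp-weighted signal
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)
```

Because group delay is additive over cascaded resonances, closely spaced formants remain separable where the magnitude spectrum, being multiplicative, smears them together.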


Circuits Systems and Signal Processing | 2013

Analysis of Acoustic Events in Speech Signals Using Bessel Series Expansion

Chetana Prakash; Dhananjaya N. Gowda; Suryakanth V. Gangashetty

In this paper, we propose an approach for the analysis and detection of acoustic events in speech signals using the Bessel series expansion. The acoustic events analyzed are the voice onset time (VOT) and the glottal closure instants (GCIs). The hypothesis is that Bessel functions, with their damped sinusoid-like bases, are better suited for representing speech signals than the sinusoidal basis functions used in the conventional Fourier representation. The speech signal is band-pass filtered by choosing an appropriate range of Bessel coefficients to obtain a narrow-band signal, which is further decomposed into amplitude-modulated (AM) and frequency-modulated (FM) components. The discrete energy separation algorithm (DESA) is used to compute the amplitude envelope (AE) of the narrow-band AM-FM signal. Events such as the consonant and vowel beginnings in an unvoiced stop consonant vowel (SCV) and the GCIs are derived by processing the AE of the signal. The proposed approach for VOT detection using the Bessel expansion is shown to perform better than the conventional Fourier representation. The performance of the proposed GCI detection method using the Bessel series expansion is compared against several existing methods for various noise environments and signal-to-noise ratios.
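The DESA step can be illustrated in isolation. Below is a minimal numpy sketch of the Teager energy operator and a DESA-1-style amplitude/frequency estimate for a narrow-band tone; the index alignment is simplified and this is not the paper's implementation:

```python
import numpy as np

def teager(x):
    """Teager energy operator: psi{x}[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa1(x):
    """DESA-1: instantaneous amplitude and frequency estimated from the
    Teager energies of the signal and of its first difference."""
    px = teager(x)                  # psi{x}
    py = teager(np.diff(x))         # psi{y}, with y[n] = x[n] - x[n-1]
    m = min(len(px), len(py) - 1)
    avg = 0.5 * (py[:m] + py[1 : m + 1])
    omega = np.arccos(1.0 - avg / (2.0 * px[:m]))  # inst. frequency (rad/sample)
    amp = np.sqrt(px[:m] / np.sin(omega) ** 2)     # inst. amplitude envelope
    return amp, omega
```

For a pure tone A·cos(Ωn) both estimates are exact, since psi{x} = A² sin²Ω and the difference signal carries the extra factor needed to solve for Ω.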


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Quasi closed phase analysis of speech signals using time varying weighted linear prediction for accurate formant tracking

Dhananjaya N. Gowda; Manu Airaksinen; Paavo Alku

Recent research on temporally weighted linear prediction shows that quasi-closed phase (QCP) analysis of speech signals provides better modeling of the vocal tract and the glottal source. Quasi-closed phase analysis gives more weight to the closed phase of the glottal cycle, while de-emphasizing the region around the instant of significant excitation, which is often poorly predicted. However, all the traditional analysis techniques, including QCP analysis, are performed over short intervals of time. They do not impose any continuity constraints on either the vocal tract system or the glottal source. Such constraints are often imposed at a later stage to either smooth or track the estimated features over time. Time-varying linear prediction (TVLP) provides a framework for modeling speech with a long-term continuity constraint imposed on the vocal tract shape. In this paper, we propose a new method for accurate modeling and tracking of the vocal tract resonances by integrating the advantages of QCP analysis with those of TVLP. Formant tracking experiments show consistent improvement in performance over traditional LP or TVLP methods under a variety of conditions, including different voice types and a wide range of fundamental frequencies.
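The temporal weighting at the heart of QCP can be sketched generically. Below is a minimal weighted-LP solver in numpy; the weight function used in the example is an arbitrary energy-based placeholder, whereas QCP's actual weight is derived from estimated glottal closure instants and is not reproduced here:

```python
import numpy as np

def wlp(x, order, w):
    """Weighted LP: find a minimizing sum_n w[n] * e[n]^2, where
    e[n] = x[n] - sum_{k=1..p} a[k] * x[n-k]."""
    N = len(x)
    X = np.column_stack([x[order - k : N - k] for k in range(1, order + 1)])
    t, wn = x[order:], w[order:]
    A = X.T @ (wn[:, None] * X)          # weighted normal equations
    return np.linalg.solve(A, X.T @ (wn * t))

# Synthetic two-pole resonator driven by a single impulse:
x = np.zeros(120)
x[0] = 1.0
for n in range(1, 120):
    x[n] = 1.8 * x[n - 1] - (0.9 * x[n - 2] if n >= 2 else 0.0)
a = wlp(x, 2, np.abs(x) + 0.1)  # any positive weight recovers this noiseless model
```

With real speech the residual is never zero, and the choice of weight then decides which regions of the glottal cycle dominate the fit; that is exactly the degree of freedom QCP exploits.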


Conference of the International Speech Communication Association (INTERSPEECH) | 2013

Robust formant detection using group delay function and stabilized weighted linear prediction

Dhananjaya N. Gowda; Jouni Pohjalainen; Mikko Kurimo; Paavo Alku

In this paper, we propose a robust spectral representation using the group delay (GD) function computed from stabilized weighted linear prediction (SWLP) coefficients. Temporal weighting of the cost function in linear prediction (LP) analysis with the short-term energy of the speech signal improves the robustness of the resultant spectrum. The additive property of the group delay function provides a better representation of weaker resonances in the spectrum, thereby improving the robustness of the representation. SWLP provides robustness in the temporal domain, whereas the GD function provides robustness in the frequency domain. The proposed SWLP-GD representation is shown to be robust against different types of additive noise degradations, compared to the widely used discrete Fourier transform (DFT) or LP based representations. In a small-scale closed-set speaker recognition experiment, cepstral features derived from the proposed SWLP-GD spectrum perform better than traditional mel-cepstral features computed from the DFT spectrum under mismatched degradation conditions.


Speech Communication | 2018

Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction

Ville Vestman; Dhananjaya N. Gowda; Sahidullah; Paavo Alku; Tomi Kinnunen

From the available biometric technologies, automatic speaker recognition is one of the most convenient and accessible due to the abundance of mobile devices equipped with a microphone, allowing users to be authenticated across multiple environments and devices. Speaker recognition also finds use in forensics and surveillance. Because the acoustic mismatch induced by the varied environments and devices of the same speaker leads to an increased number of identification errors, much of the research focuses on compensating for such technology-induced variation, especially using machine learning at the statistical back-end. Another much less studied but at least as detrimental source of acoustic variation, however, arises from mismatched speaking styles induced by the speaker, leading to a substantial drop in recognition accuracy. This is a major problem especially in forensics, where perpetrators may purposefully disguise their identity by varying their speaking style. We focus on one of the most common ways of disguising one's speaker identity, namely whispering. We approach the problem of normal-whisper acoustic mismatch compensation from the viewpoint of robust feature extraction. Since whispered speech is intelligible yet a low-intensity signal, and therefore prone to extrinsic distortions, we take advantage of robust, long-term speech analysis methods that exploit the slow articulatory movements in speech production. Specifically, we address the problem using a novel method, frequency-domain linear prediction with time-varying linear prediction (FDLP-TVLP), an extension of the 2-dimensional autoregressive (2DAR) model that allows vocal tract filter parameters to be time-varying, rather than piecewise constant as in classic short-term speech analysis.
Our speaker recognition experiments on the whisper subset of the CHAINS corpus indicate that when tested in normal-whisper mismatched conditions, the proposed FDLP-TVLP features improve speaker recognition performance by 7–10% over standard MFCC features in relative terms. We further observe that the proposed FDLP-TVLP features perform better than the FDLP and 2DAR methods for whispered speech.
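The FDLP half of FDLP-TVLP can be illustrated compactly: linear prediction applied to the cosine transform of a signal yields an all-pole model of its temporal envelope, the time/frequency dual of ordinary LP. Below is a simplified numpy sketch of this 2DAR idea; it is not the paper's FDLP-TVLP implementation, and the direct-form DCT is only practical for short frames:

```python
import numpy as np

def fdlp_envelope(x, order=8):
    """All-pole temporal envelope: LP on the DCT-II of the signal.
    Peaks of the returned curve track peaks of the temporal envelope."""
    N = len(x)
    n = np.arange(N)
    # DCT-II, direct form: C[k] = sum_n x[n] * cos(pi*k*(n+0.5)/N)
    C = np.cos(np.pi * np.outer(np.arange(N), n + 0.5) / N) @ x
    # Autocorrelation-method LP applied to the DCT coefficients
    r = np.correlate(C, C, "full")[N - 1 : N + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    A = np.fft.rfft(np.concatenate(([1.0], -a)), 2 * N)
    return 1.0 / (np.abs(A[:N]) ** 2 + 1e-12)  # envelope sampled at n = 0..N-1
```

Because time and frequency swap roles in the DCT domain, a burst of energy at time n0 appears as an oscillation in the DCT coefficients, and the LP poles lock onto it just as ordinary LP poles lock onto formants.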


7th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2013

Robust spectral representation using group delay function and stabilized weighted linear prediction for additive noise degradations

Dhananjaya N. Gowda; Jouni Pohjalainen; Paavo Alku; Mikko Kurimo

In this paper, we propose a robust spectral representation using the group delay (GD) function computed from stabilized weighted linear prediction (SWLP) coefficients. Temporal weighting of the cost function in linear prediction (LP) analysis with the short-term energy of the speech signal improves the robustness of the resultant spectrum. The additive property of the group delay function provides a better representation of weaker resonances in the spectrum, thereby improving the robustness of the representation. SWLP provides robustness in the temporal domain, whereas the GD function provides robustness in the frequency domain. The proposed SWLP-GD representation is shown to be robust against different types of additive noise degradations, compared to the widely used discrete Fourier transform (DFT) or LP based representations. In a small-scale closed-set speaker recognition experiment, cepstral features derived from the proposed SWLP-GD spectrum perform better than traditional mel-cepstral features computed from the DFT spectrum under mismatched degradation conditions.


Journal of the Acoustical Society of America | 2017

Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation

Dhananjaya N. Gowda; Manu Airaksinen; Paavo Alku

Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, QCP analysis, which belongs to the family of temporally weighted linear prediction (WLP) methods, uses the conventional forward type of sample prediction. This may not be the best choice, especially when computing WLP models with a hard-limiting weighting function, since a sample-selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted from both its past and future samples, thereby utilizing the available samples more effectively. Formant detection and estimation experiments on synthetic vowels generated using a physical modeling approach, as well as on natural speech utterances, show that the proposed QCP-FB method yields statistically significant improvements over conventional linear prediction and QCP methods.
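The forward-backward idea can be sketched without the QCP weighting: each sample is predicted both from its past and from its future with a shared coefficient set, doubling the number of prediction equations available in a frame. A minimal unweighted numpy sketch (hence not QCP-FB itself):

```python
import numpy as np

def fb_lp(x, order):
    """Least-squares LP pooling forward equations (x[n] from x[n-1..n-p])
    and backward equations (x[n] from x[n+1..n+p]) with shared coefficients."""
    N = len(x)
    Xf = np.column_stack([x[order - k : N - k] for k in range(1, order + 1)])
    Xb = np.column_stack([x[k : N - order + k] for k in range(1, order + 1)])
    X = np.vstack([Xf, Xb])                       # 2*(N - p) equations
    t = np.concatenate([x[order:], x[: N - order]])
    a, *_ = np.linalg.lstsq(X, t, rcond=None)
    return a
```

Pooling is legitimate because, for a real-valued resonant signal, the backward predictor shares the forward predictor's coefficients; a pure sinusoid, for instance, satisfies the same two-tap recursion in both directions.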


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Time-varying quasi-closed-phase weighted linear prediction analysis of speech for accurate formant detection and tracking

Dhananjaya N. Gowda; Paavo Alku

In this paper, we propose a new method for accurate detection, estimation and tracking of formants in speech signals using time-varying quasi-closed-phase analysis (TVQCP). The proposed method combines two different methods of analysis, namely time-varying linear prediction (TVLP) and quasi-closed-phase (QCP) analysis. TVLP helps in better tracking of formant frequencies by imposing a time-continuity constraint on the linear prediction (LP) coefficients. QCP analysis, a type of weighted LP (WLP), improves the estimation accuracy of the formant frequencies by using a carefully designed weight function on the error signal that is minimized. The QCP weight function emphasizes the closed-phase region of the glottal cycle and de-emphasizes the regions around the main excitations. This results in reduced coupling with the subglottal cavity and the excitation source. Experimental results on natural speech signals show that the proposed method performs considerably better than the detect-and-track approach used in popular tools such as Wavesurfer and Praat.
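The TVLP component can be sketched on its own: each predictor coefficient a_k(n) is expanded over a small set of smooth basis functions of time, and the expansion weights are found by least squares over the whole segment. A minimal numpy sketch with a polynomial basis (the basis choice is illustrative; the paper's exact basis and QCP weighting are not reproduced here):

```python
import numpy as np

def tvlp(x, order, n_basis):
    """Time-varying LP: a_k(n) = sum_i c[k, i] * (n / N)^i, with the
    c[k, i] found jointly by least squares over all prediction equations."""
    N = len(x)
    n_idx = np.arange(order, N)
    basis = [(n_idx / N) ** i for i in range(n_basis)]
    cols = [x[order - k : N - k] * b
            for k in range(1, order + 1) for b in basis]
    c, *_ = np.linalg.lstsq(np.column_stack(cols), x[order:], rcond=None)
    return c.reshape(order, n_basis)  # row k-1 holds the expansion of a_k(n)
```

On a stationary signal the fit collapses to constant coefficients (the higher-order basis weights go to zero), while on real speech the time-varying terms let the model follow smoothly moving formants instead of assuming a frozen vocal tract per frame.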


IEEE Signal Processing Letters | 2016

Whispered Speech Detection Using Fusion of Group-Delay-Based Subband Modulation Spectrum and Correntropy Features

Jinfang Wang; Yongqiang Shang; Shuangshuang Jiang; Dhananjaya N. Gowda; Ke Lv

In this letter, we propose a novel fusion feature for the detection of whispered speech in noisy environments using a group-delay-based instantaneous spectrum analysis. The fusion feature has two individual components, namely subband modulation spectrum (SMS) features and subband correntropy (SCE) features, both extracted from the instantaneous spectrum. The instantaneous spectrum estimation involves zero-time windowing for improved temporal resolution and group-delay computation for improved spectral resolution, compared to traditional discrete-Fourier-transform-based spectrum estimation. The SMS features capture the spectral representation of the subband energy time trajectories, while the SCE features model the fluctuations in those trajectories. The SMS captures both the short-term and long-term spectral characteristics of whispered speech and is known to provide good separation between speech and noise components. The correntropy features help capture the dynamics of the vocal tract system to discriminate noisy whisper from noise. Whispered speech detection experiments using support vector machine models and the proposed features indicate promising performance under low signal-to-noise-ratio conditions.
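Of the two feature streams, correntropy is the simpler to sketch: autocorrentropy replaces the lagged products of an autocorrelation with a Gaussian kernel of the sample differences, which saturates large outliers and so resists impulsive noise. A minimal numpy sketch with a fixed kernel width (the letter's subband framing and fusion are omitted):

```python
import numpy as np

def autocorrentropy(x, max_lag, sigma=1.0):
    """V[m] = mean_n exp(-(x[n] - x[n-m])^2 / (2 * sigma^2)),
    a kernelized, outlier-resistant analogue of autocorrelation."""
    v = np.empty(max_lag + 1)
    for m in range(max_lag + 1):
        d = x[m:] - x[: len(x) - m]
        v[m] = np.mean(np.exp(-d ** 2 / (2.0 * sigma ** 2)))
    return v
```

V[0] is always 1, and lags matching a periodicity of the signal return values near 1, so the shape of V[m] exposes vocal tract and excitation dynamics even when raw sample energies are unreliable.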


Conference of the International Speech Communication Association (INTERSPEECH) | 2015

AM-FM based filter bank analysis for estimation of spectro-temporal envelopes and its application for speaker recognition in noisy reverberant environments

Dhananjaya N. Gowda; Rahim Saeidi; Paavo Alku

Collaboration


Dive into Dhananjaya N. Gowda's collaborations.

Top Co-Authors

Tomi Kinnunen

University of Eastern Finland


Ville Vestman

University of Eastern Finland


Mircea Giurgiu

Technical University of Cluj-Napoca
