Publication


Featured research published by Suman Deb.


National Conference on Communications | 2015

A novel breathiness feature for analysis and classification of speech under stress

Suman Deb; S. Dandapat

This work explores the effect of the breathiness component on speech under stress. The breathiness component in a speech signal can be estimated using different features, such as period perturbation quotient (PPQ), amplitude perturbation quotient (APQ), harmonic to noise ratio (HNR), glottal to noise excitation ratio (GNER), harmonic energy (HE), harmonic energy of residue (HER) and harmonic to signal ratio (HSR). Statistical analysis of these features shows that they have different mean and variance values for speech under stress. The performance of the breathiness features is evaluated using a Hidden Markov Model (HMM) for classification of speech under stress. The results show that the breathiness features successfully characterize speech under stress. The performance of the breathiness features is compared with the MFCC feature. Finally, a speech under stress classification method is proposed using the combination of the breathiness and MFCC features. In terms of classification rates, the proposed combined feature outperforms the MFCC feature.
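
Several of the listed breathiness measures are standard voice-quality features. As a rough illustration of one of them, the harmonic to noise ratio can be estimated by averaging successive pitch periods: the averaged waveform approximates the harmonic part, and the per-period deviation approximates the noise. This is a generic, Yumoto-style sketch, not the estimator used in the paper:

```python
import math

def hnr_db(frame, period):
    """Estimate harmonic-to-noise ratio (dB) of a voiced frame by
    averaging successive pitch periods. `period` is the pitch period
    in samples; incomplete trailing periods are discarded."""
    n = len(frame) // period           # complete periods available
    chunks = [frame[i*period:(i+1)*period] for i in range(n)]
    # Harmonic part: the waveform averaged over all periods.
    avg = [sum(c[j] for c in chunks) / n for j in range(period)]
    harm = n * sum(a * a for a in avg)
    # Noise part: deviation of each period from the average.
    noise = sum((c[j] - avg[j]) ** 2 for c in chunks for j in range(period))
    return 10.0 * math.log10(harm / noise)
```

A breathier (noisier) frame yields a lower HNR, which is what makes the feature useful for separating stress conditions.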


IEEE Transactions on Affective Computing | 2017

Emotion Classification using Segmentation of Vowel-Like and Non-Vowel-Like Regions

Suman Deb; S. Dandapat

In this work, a novel region switching based classification method is proposed for speech emotion classification using vowel-like regions (VLRs) and non-vowel-like regions (non-VLRs). In the literature, normally the entire active speech region is processed for emotion classification. A few studies have been performed on segmented sound units, such as syllables, phones, vowels, consonants and voiced regions, for speech emotion classification. This work presents a detailed analysis of the emotion information contained independently in segmented VLRs and non-VLRs. The proposed region switching based method is implemented by choosing the features of either VLRs or non-VLRs for each emotion. The VLRs are detected by identifying hypothesized VLR onset and end points. Segmentation of non-VLRs is done using the knowledge of VLRs and active speech regions. The performance is evaluated using the EMODB, IEMOCAP and FAU AIBO databases. Experimental results show that both the VLRs and non-VLRs contain emotion-specific information. In terms of emotion classification rate, the proposed region switching based classification approach shows significant improvement over the approach that processes the entire active speech region, and it outperforms other state-of-the-art approaches for all three databases.
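
The paper detects VLRs from hypothesized onset and end points; the details are not reproduced here. As a much simpler stand-in that conveys the idea of splitting active speech into high-energy (vowel-like) and low-energy regions, a short-time energy threshold can be used — an illustrative assumption, not the authors' detection method:

```python
def segment_by_energy(signal, frame_len, threshold):
    """Label each non-overlapping frame as vowel-like (True) or not,
    by comparing its mean short-time energy against a threshold.
    A crude stand-in for onset/end-point based VLR detection."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        labels.append(energy >= threshold)
    return labels
```

Region switching then amounts to picking, per emotion class, features computed from either the True-labelled or the False-labelled regions.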


Computers & Electrical Engineering | 2016

Classification of speech under stress using harmonic peak to energy ratio

Suman Deb; S. Dandapat

A new feature, harmonic peak to energy ratio (HPER), is proposed for analysis and classification of speech under stress. The significance of the HPER feature is explored using statistical analysis. A binary-cascade multi-class classification strategy is used based on the valence-activation descriptors of different stress conditions. The performance is evaluated using a Support Vector Machine (SVM) classifier.

This paper explores the analysis and classification of speech under stress using a new feature, the harmonic peak to energy ratio (HPER). The HPER feature is computed from the Fourier spectra of the speech signal. The harmonic amplitudes are closely related to the breathiness levels of speech, and these breathiness levels may differ across stress conditions. The statistical analysis shows that the proposed HPER feature is useful in characterizing various stress classes. A Support Vector Machine (SVM) classifier with a binary cascade strategy is used to evaluate the performance of the HPER feature on the simulated stressed speech database (SSD). The performance results show that the HPER feature successfully characterizes different stress conditions. The performance of the HPER feature is compared with the mel frequency cepstral coefficients (MFCC), the linear prediction coefficients (LPC) and the Teager-Energy-Operator (TEO) based Critical Band TEO Autocorrelation Envelope (TEO-CB-Auto-Env) features. The proposed HPER feature outperforms the MFCC, LPC and TEO-CB-Auto-Env features. The combination of the HPER feature with the MFCC feature further increases the system performance.
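
The abstract says only that HPER is computed from the Fourier spectra; the exact formulation is in the paper. One plausible reading — offered here as an assumption — is the largest spectral peak in a harmonic search band divided by the total spectral energy of the frame:

```python
import cmath

def hper(frame, k_lo, k_hi):
    """Harmonic peak to energy ratio (sketch): the largest
    squared-magnitude DFT bin inside the search band [k_lo, k_hi],
    divided by the total spectral energy of the frame. The paper's
    exact definition may differ."""
    N = len(frame)
    spec = [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N))) ** 2 for k in range(N // 2)]
    peak = max(spec[k_lo:k_hi + 1])
    return peak / sum(spec)
```

A breathy frame spreads energy away from the harmonic peak, lowering the ratio, which is consistent with the abstract's link between harmonic amplitudes and breathiness.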


IEEE Transactions on Systems, Man, and Cybernetics | 2018

Multiscale Amplitude Feature and Significance of Enhanced Vocal Tract Information for Emotion Classification

Suman Deb; S. Dandapat

In this paper, a novel multiscale amplitude feature is proposed using multiresolution analysis (MRA), and the significance of the vocal tract is investigated for emotion classification from the speech signal. MRA decomposes the speech signal into a number of sub-band signals. The proposed feature is computed by applying a sinusoidal model to each sub-band signal. Different emotions have different impacts on the vocal tract; as a result, the vocal tract responds in a unique way for each emotion. The vocal tract information is enhanced using pre-emphasis, so that the emotion information manifested in the vocal tract can be well exploited. This may help in improving the performance of emotion classification. Emotion recognition is performed using the German emotional EMODB database, the interactive emotional dyadic motion capture database, the simulated stressed speech database, and the FAU AIBO database, with both the speech signal and speech with enhanced vocal tract information (SEVTI). The performance of the proposed multiscale amplitude feature is compared with three different types of features: 1) the mel frequency cepstral coefficients; 2) the Teager energy operator (TEO)-based feature (TEO-CB-Auto-Env); and 3) the breathiness feature. The proposed feature outperforms the other features. In terms of recognition rates, the features derived from the SEVTI signal give better performance than the features derived from the speech signal. Combination of the features with the SEVTI signal shows an average recognition rate of 86.7% using the EMODB database.
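
MRA is typically realized with a wavelet decomposition; the abstract does not specify which wavelet is used, so the sketch below assumes the simplest one (Haar) purely for illustration. Each returned sub-band would then be fed to the sinusoidal model described in the abstract:

```python
def haar_subbands(signal, levels):
    """Multiresolution analysis sketch: repeated one-level Haar
    decomposition. Returns the detail sub-bands (finest first) plus
    the final approximation, i.e. `levels` + 1 sub-band signals.
    An odd trailing sample at any level is dropped."""
    subbands = []
    approx = list(signal)
    for _ in range(levels):
        avg = [(approx[i] + approx[i+1]) / 2 for i in range(0, len(approx) - 1, 2)]
        det = [(approx[i] - approx[i+1]) / 2 for i in range(0, len(approx) - 1, 2)]
        subbands.append(det)   # high-frequency detail at this scale
        approx = avg           # low-frequency part goes to next level
    subbands.append(approx)
    return subbands
```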


Speech Communication | 2017

Fourier model based features for analysis and classification of out-of-breath speech

Suman Deb; S. Dandapat

This paper presents a new method of feature extraction using a Fourier model for analysis of out-of-breath speech. The proposed feature is evaluated using mutual information (MI) on the difference and ratio values of the Fourier parameters, amplitude and frequency. The difference and ratio are calculated between two contiguous values of the Fourier parameters. To analyze out-of-breath speech, a new stressed speech database, named the out-of-breath speech (OBS) database, is created. The database contains three classes of speech: out-of-breath speech, low out-of-breath speech and normal speech. The effectiveness of the proposed features is evaluated with statistical analysis. The proposed features not only differentiate normal speech from out-of-breath speech, but can also discriminate different breath emission levels of speech. A hidden Markov model (HMM) and a support vector machine (SVM) are used to evaluate the performance of the proposed features on the OBS database. For the multi-class classification problem, the SVM classifier is used with a binary cascade approach. The performance of the proposed features is compared with the breathiness feature, the mel frequency cepstral coefficient (MFCC) feature and the Teager energy operator (TEO) based critical band TEO autocorrelation envelope (TEO-CB-Auto-Env) feature. The proposed feature outperforms the breathiness feature, the MFCC feature and the TEO-CB-Auto-Env feature.
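
The core operation described — differences and ratios between contiguous Fourier parameter values — is straightforward; the sketch below applies it to a list of amplitude values (the same function would be reused for the frequency parameters):

```python
def fourier_param_features(params):
    """Differences and ratios between contiguous Fourier parameter
    values (amplitudes or frequencies), per the feature description.
    Assumes the parameter values are non-zero for the ratio."""
    diffs = [params[i+1] - params[i] for i in range(len(params) - 1)]
    ratios = [params[i+1] / params[i] for i in range(len(params) - 1)]
    return diffs, ratios
```

In the paper, mutual information is then used to judge which of these derived values carry breath-level information; that selection step is not reproduced here.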


Healthcare Technology Letters | 2017

Analysis of Physiological Signals using State Space Correlation Entropy

R. K. Tripathy; Suman Deb; S. Dandapat

In this letter, the authors propose a new entropy measure for the analysis of time series, termed the state space correlation entropy (SSCE). State space reconstruction is used to evaluate the embedding vectors of a time series, and the SSCE is computed from the probability of the correlations of the embedding vectors. The performance of the SSCE measure is evaluated using both synthetic and real-valued signals. The experimental results reveal that the proposed SSCE measure, along with an SVM classifier, achieves a sensitivity of 91.60%, which is higher than the performance of both the sample entropy and permutation entropy features for detection of shockable ventricular arrhythmia.
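
Following the description — delay embedding, then an entropy computed from the probability that embedding vectors are correlated — a minimal sketch can take the negative log of the fraction of vector pairs within a tolerance r (Chebyshev distance). The exact normalisation in the letter may differ:

```python
import math

def ssce(series, m, tau, r):
    """State space correlation entropy (sketch). Reconstructs the
    state space with embedding dimension m and delay tau, estimates
    the probability that two embedding vectors lie within tolerance r
    (Chebyshev distance), and returns its negative natural log."""
    vecs = [tuple(series[i + j*tau] for j in range(m))
            for i in range(len(series) - (m - 1) * tau)]
    n = len(vecs)
    close = sum(1 for a in range(n) for b in range(a + 1, n)
                if max(abs(x - y) for x, y in zip(vecs[a], vecs[b])) < r)
    prob = 2.0 * close / (n * (n - 1))
    return -math.log(prob)
```

Regular signals produce many near-identical embedding vectors (high probability, low entropy), while irregular signals spread the vectors out, which is the property the detector exploits.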


International Conference on Signal Processing | 2016

Emotion classification using residual sinusoidal peak amplitude

Suman Deb; S. Dandapat

In this work, a new feature, residual sinusoidal peak amplitude (RSPA), is proposed for emotion classification. The RSPA feature is evaluated from the LP residual of the speech signal using a sinusoidal model. The residual signal is a major source of the excitation, and it is expected that emotional information is well manifested in the residual signal. The effectiveness of the proposed feature is explored using statistical analysis. A Support Vector Machine (SVM) classifier is used for performance analysis of the RSPA feature on the EMO-DB database. The recognition results show that the proposed RSPA feature outperforms the linear prediction coefficients (LPC) and TEO-CB-Auto-Env features. Combining the RSPA and mel frequency cepstral coefficients (MFCC) further improves the recognition performance of emotion classification.
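
The first step of the feature — computing the LP residual — is standard: fit LP coefficients by the autocorrelation method (Levinson-Durbin) and inverse-filter the signal. The subsequent sinusoidal peak-amplitude step from the paper is omitted here:

```python
def lp_residual(signal, order):
    """LP residual via the autocorrelation method. Levinson-Durbin
    solves for the predictor coefficients; inverse filtering with
    A(z) = 1 + sum_k a[k] z^-k then yields the residual."""
    N = len(signal)
    r = [sum(signal[n] * signal[n-k] for n in range(k, N)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i-j] for j in range(1, i))
        k = -acc / e
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i-j]
        new_a[i] = k
        a = new_a
        e *= (1 - k * k)
    # Inverse filter: residual[n] = sum_k a[k] * s[n-k]
    return [sum(a[k] * signal[n-k] for k in range(order + 1) if n - k >= 0)
            for n in range(N)]
```

For a signal that is well modelled by an all-pole filter, the residual collapses to a sparse excitation, which is why excitation-domain features like RSPA are computed from it.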


Archive | 2019

Analysis of Breathy, Emergency and Pathological Stress Classes

Amit Abhishek; Suman Deb; S. Dandapat

Recently, man–machine interaction based on speech recognition has attracted increasing interest in the field of speech processing. The need for machines to understand human stress levels in a speaker-independent manner, in order to prioritize the situation, has grown rapidly. A number of databases have been used for stressed speech recognition. The majority of these databases contain styled emotions and Lombard speech; no studies have been reported on stressed speech considering other stress conditions such as emergency, breathy, workload, sleep deprivation and pathological conditions. In this work, a new stressed speech database is recorded considering emergency, breathy and pathological conditions. The database is validated with statistical analysis using two features, the mel-frequency cepstral coefficient (MFCC) and the Fourier parameter (FP). The results show that the recorded stress classes are effectively characterized by these features. A fivefold cross-validation is carried out to assess whether the statistical-analysis results are independent of the particular dataset partition. A support vector machine (SVM) is used to classify the different stress classes.
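
The fivefold cross-validation mentioned above is a standard partitioning scheme; as a generic sketch (not tied to this database), the index splits can be generated as follows:

```python
def kfold_indices(n, k):
    """Index splits for k-fold cross-validation: each of the k folds
    serves once as the test set while the rest form the training set.
    Fold sizes differ by at most one when n is not divisible by k."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds
```

Averaging the per-fold results shows whether the reported statistics hold regardless of which recordings land in the test partition.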


Archive | 2019

Experimental Analysis on Effect of Nasal Tract on Nasalised Vowels

Debasish Jyotishi; Suman Deb; Amit Abhishek; S. Dandapat

Nasalised speech is present in almost every language across the globe. Our work is motivated by the fact that nasalised speech detection can improve speech recognition systems. To analyse nasalised speech better, we have designed a device that separates the nasal murmur from the oral speech when nasalised speech is spoken. Speech data from different speakers are collected and analysed. Nasalised vowels are analysed first, and it is found that an additional formant is consistently introduced between 1000 and 1500 Hz. Using various signal processing techniques, we analysed different nasalised vowels and found that the nasal murmur produced is invariant irrespective of the nasalised vowel, and so is the nasal tract. Nasalisation is produced in speech by the coupling of the nasal tract with the oral tract; when the effect of this coupling is analysed experimentally, it turns out to be additive.


National Conference on Communications | 2017

Exploration of Phase Information for Speech Emotion Classification

Suman Deb; S. Dandapat

This paper explores the significance of phase information for speech emotion classification. The phase information is extracted from the discrete Fourier transform (DFT) spectrum, and the phase of the pitch harmonic is used as the proposed feature. The pitch frequency varies with emotion, and consequently the pitch harmonic also varies across emotions; it is therefore expected that the phase of the pitch harmonic contains emotion information. The significance of the harmonic phase is assessed by evaluating its mean and variance values for speech emotion classification. A Support Vector Machine (SVM) classifier is used to evaluate the performance of the proposed feature on the EMODB database. The performance of the proposed feature is compared with the linear prediction coefficients (LPC), mel frequency cepstral coefficients (MFCC) and Teager energy operator (TEO) based non-linear critical band TEO autocorrelation envelope (TEO-CB-Auto-Env) features. An average recognition rate of 73.9% is achieved with the combination of the MFCC and proposed features.
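
Extracting the phase of the pitch harmonic can be sketched by evaluating the DFT of a frame at the bin nearest the pitch frequency and taking the angle of that coefficient; how the paper interpolates or unwraps the phase is not specified here, so this is a simplified reading:

```python
import cmath, math

def pitch_harmonic_phase(frame, f0, fs):
    """Phase of the pitch harmonic (sketch): evaluate the DFT of the
    frame at the bin nearest the pitch frequency f0 (in Hz, with
    sampling rate fs) and return the phase angle of that coefficient."""
    N = len(frame)
    k = round(f0 * N / fs)             # DFT bin nearest the pitch
    coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
    return cmath.phase(coeff)
```

For a cosine at the pitch frequency with phase offset φ, the returned angle is φ, which is the quantity whose per-emotion statistics the paper examines.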

Collaboration


Dive into Suman Deb's collaboration.

Top Co-Authors

S. Dandapat
Indian Institute of Technology Guwahati

Amit Abhishek
Indian Institute of Technology Guwahati

Debasish Jyotishi
Indian Institute of Technology Guwahati

R. K. Tripathy
Indian Institute of Technology Guwahati