Hemanta Kumar Palo
Siksha O Anusandhan University
Publications
Featured research published by Hemanta Kumar Palo.
Archive | 2015
Hemanta Kumar Palo; Mihir Narayana Mohanty; Mahesh Chandra
Emotion recognition in humans is one of the major challenges in the modern, complicated world of political and criminal scenarios. In this paper, an attempt is made to recognize two classes of speech emotions: high arousal, such as anger and surprise, and low arousal, such as sadness and boredom. Linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) features are used for emotion recognition with a multilayer perceptron (MLP). Emotional speech features are extracted from the audio channel using the above-mentioned methods for training and testing. Two hundred utterances from ten subjects were collected across the four emotion categories; one hundred and seventy-five utterances were used for training and twenty-five for testing.
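As a rough illustration of the LPC feature used above, the coefficients can be computed with the autocorrelation method and the Levinson-Durbin recursion. This is a generic NumPy sketch, not the authors' implementation; the frame and model order are illustrative.

```python
import numpy as np

def lpc(frame, order):
    """Linear prediction coefficients via the autocorrelation method
    and the Levinson-Durbin recursion.

    Returns the prediction-error filter a = [1, a1, ..., a_order]
    and the final prediction error power."""
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([frame[:len(frame) - lag] @ frame[lag:]
                  for lag in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```

For a signal generated by a stable AR model, the recovered coefficients approach the model's own as the frame length grows.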
Archive | 2015
Hemanta Kumar Palo; Mihir Narayana Mohanty; Mahesh Chandra
Human–computer interaction (HCI) needs to be improved in the field of recognition and detection. In particular, emotion recognition has a major impact on social, engineering, and medical science applications. This paper presents a neural-network-based approach for recognizing emotion in emotional speech. Linear predictive coefficients and a radial basis function network are used as the features and the classifier, respectively. Results reveal that the approach is effective in recognizing human speech emotions. Speech utterances are extracted directly from the audio channel, including background noise. In total, 75 utterances from 5 speakers were collected across five emotion categories; fifteen utterances were used for training and the rest for testing. The proposed approach has been tested and verified on a newly developed dataset.
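A radial basis function network of the kind used here can be sketched as a single layer of Gaussian units whose output weights are fit by least squares. The centres, width, and two-cluster data below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def rbf_design(X, centers, sigma):
    """Gaussian radial basis activations for each sample/centre pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_rbf(X, y, centers, sigma):
    """Fit the output-layer weights of an RBF network by least squares."""
    H = rbf_design(X, centers, sigma)
    W, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W

def predict_rbf(X, centers, sigma, W):
    """Network output: weighted sum of basis activations."""
    return rbf_design(X, centers, sigma) @ W
```

In practice the centres are often chosen by clustering the training features; here they are simply assumed.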
International Journal of Computational Vision and Robotics | 2017
Hemanta Kumar Palo; Mahesh Chandra; Mihir Narayan Mohanty
Emotion recognition in human beings is one of the major challenges in the modern, complicated world of political and criminal scenarios. In this paper, an attempt is made to recognize two classes of speech emotions: high arousal, such as anger and surprise, and low arousal, such as sadness and boredom. Linear prediction coefficients (LPC), Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) features are used for emotion recognition with multilayer perceptron (MLP) and Gaussian mixture model (GMM) classifiers. Two databases of four emotions, one of five children and the other of a professional actor, have been used in this work. The emotion recognition performance of the LPC, PLP and MFCC features has been compared across the two classifiers. MFCC features with the MLP classifier and PLP features with the GMM classifier performed best in their respective categories.
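A GMM classifier of the kind compared here trains one model per emotion and assigns a test utterance to the class with the highest likelihood. The sketch below simplifies each class model to a single diagonal-covariance Gaussian, a one-component stand-in for a full GMM; all names and data are illustrative.

```python
import numpy as np

def fit_gaussian(X):
    """Single diagonal-covariance Gaussian per class (a one-component
    simplification of the per-emotion GMMs described in the paper)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    """Log density of a diagonal Gaussian at feature vector x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(x, models):
    """Pick the emotion whose class model gives the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))
```

A real GMM would mix several such components per class, with weights estimated by expectation-maximization.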
Archive | 2016
Rashmirekha Ram; Hemanta Kumar Palo; Mihir Narayan Mohanty; L. Padma Suresh
Human beings have emotions associated with their acts and speech. Emotional expression varies with mood and situation. Speech is an important medium through which people express their feelings, and its prosodic, spectral, and other parameters vary with emotion. The ability to represent emotional speech varies with the type of features chosen. In an attempt to recognize the emotional content of speech, a spectral feature (linear prediction coefficients, LPC) is first tested with a fuzzy inference system (FIS). Hybrid combinations of LPC features with different prosodic features are then compared against LPC features alone for recognition accuracy. Results show that the hybridized features classify emotions better with the FIS.
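A fuzzy inference system of the kind applied here maps feature memberships through IF-THEN rules to an output score. The sketch below uses a zero-order Sugeno-style scheme with two invented rules on a normalized feature; the sets, rule outputs, and the Sugeno form itself are illustrative assumptions, not the paper's FIS.

```python
import numpy as np

def mu_low(x):
    """Membership of a normalized feature (0..1) in the 'low' fuzzy set."""
    return float(np.clip((0.5 - x) / 0.5, 0.0, 1.0))

def mu_high(x):
    """Membership in the 'high' fuzzy set."""
    return float(np.clip(x / 0.5, 0.0, 1.0))

def infer(value):
    """Zero-order Sugeno inference with two illustrative rules:
    IF feature is low  THEN arousal score = 0.2
    IF feature is high THEN arousal score = 0.8
    The output is the firing-strength-weighted average of rule outputs."""
    low, high = mu_low(value), mu_high(value)
    if low + high == 0.0:
        return 0.5  # no rule fires; fall back to a neutral score
    return (low * 0.2 + high * 0.8) / (low + high)
```

A full system would define one membership function per linguistic term for each LPC and prosodic feature, with a rule base covering their combinations.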
international conference on signal processing | 2016
Rashmirekha Ram; Sarthak Panda; Hemanta Kumar Palo; Mihir Narayana Mohanty
Noise occurs naturally in almost all types of signals. Although noise takes many forms, impulsive noise degrades signal quality severely. In this work, a speech signal contaminated with impulsive noise is considered for enhancement. Such noise is typically created by hiccups due to tiredness or a myoclonic problem in human subjects. Removing this impulsive noise enhances the speech signal for use in recognition, security, and medical applications. The popular recursive least squares (RLS) algorithm has been used for this purpose. The state-space variant of RLS (SSRLS) further improves the result and can be used in real-time applications. Performance is reported in terms of signal-to-noise ratio (SNR) and visual inspection of the speech signal.
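The standard RLS recursion referred to above can be sketched as follows. The filter order, forgetting factor, and the test system in the usage note are illustrative choices, not the paper's settings.

```python
import numpy as np

def rls_filter(d, x, order=8, lam=0.99, delta=100.0):
    """Recursive least squares: adapt an FIR filter so its output
    tracks the desired signal d from the reference input x.

    lam   -- forgetting factor (close to 1)
    delta -- initial scale of the inverse-correlation matrix P"""
    w = np.zeros(order)
    P = np.eye(order) * delta
    y = np.zeros(len(d))
    for n in range(order, len(d)):
        u = x[n - order:n][::-1]           # most recent samples first
        k = P @ u / (lam + u @ P @ u)      # gain vector
        y[n] = w @ u                       # a-priori filter output
        e = d[n] - y[n]                    # a-priori error
        w = w + k * e                      # coefficient update
        P = (P - np.outer(k, u @ P)) / lam # inverse-correlation update
    return y, w
```

With a noiseless desired signal generated by a short FIR system, the weights converge to that system's coefficients within a few hundred samples.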
international conference on next generation computing technologies | 2016
Hemanta Kumar Palo; Mihir Narayan Mohanty; Mahesh Chandra
The objective of this paper is to analyse the sad emotional state of speech using voice quality features. This can help family members, relatives, well-wishers and medical practitioners take timely action before the onset of deep depression that may endanger a person's life. Fuzzy C-means and K-means clustering algorithms have been used to separate sad speech from neutral utterances using voice quality features such as jitter, shimmer, noise-to-harmonic ratio (NHR) and harmonic-to-noise ratio (HNR). The results suggest that shimmer shows the highest accuracy among these features for the sad state, followed by jitter. For neutral utterances, however, the HNR feature is the most accurate, followed by shimmer.
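The jitter and shimmer features used above have standard "local" definitions: cycle-to-cycle variation of the pitch period and of the peak amplitude, respectively, relative to their means. A generic sketch, assuming pitch periods and peak amplitudes have already been extracted:

```python
import numpy as np

def jitter_local(periods):
    """Local jitter: mean absolute difference between consecutive
    pitch periods, relative to the mean period (in percent)."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_local(amplitudes):
    """Local shimmer: the same ratio computed on per-cycle peak
    amplitudes (in percent)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)
```

The hard part in practice is the pitch-cycle segmentation that produces these sequences; the formulas themselves are simple once the cycles are known.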
international conference on circuit power and computing technologies | 2016
Rashmirekha Ram; Hemanta Kumar Palo; Mihir Narayan Mohanty
Clarity and intelligibility in a speech signal demand removal of the noise and interference associated with the signal at the source. This is more challenging when the speech signal is colored by human emotion. In this work, the authors take the novel step of adaptively enhancing the emotional speech signal before classification. Popular adaptive algorithms, namely least mean squares (LMS), normalized least mean squares (NLMS) and recursive least squares (RLS), have been tested to obtain enhanced speech emotions. A neural-network-based multilayer perceptron (MLP) classifier with linear prediction coefficients (LPCs) is used to distinguish fearful speech emotion from neutral voices. Accuracy improved to approximately 77% with the enhanced signal; the greatest improvement over the noisy signal was observed with the RLS algorithm.
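Of the adaptive algorithms compared above, NLMS is the simplest to sketch: it is LMS with the step size normalized by the input energy, so the update stays stable regardless of signal level. The order and step size below are illustrative, not the paper's settings.

```python
import numpy as np

def nlms_filter(d, x, order=8, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter: track desired signal d from
    reference input x.

    mu  -- step size, stable for 0 < mu < 2 thanks to normalization
    eps -- regularizer avoiding division by zero on silent input"""
    w = np.zeros(order)
    y = np.zeros(len(d))
    for n in range(order, len(d)):
        u = x[n - order:n][::-1]          # most recent samples first
        y[n] = w @ u                      # filter output
        e = d[n] - y[n]                   # error
        w += mu * e * u / (u @ u + eps)   # energy-normalized update
    return y, w
```

Compared with RLS, NLMS converges more slowly but costs only O(order) operations per sample, which is why both are common baselines in speech enhancement.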
Archive | 2019
Rashmirekha Ram; Hemanta Kumar Palo; Mihir Narayan Mohanty
The Fractional Fourier Transform (FrFT) can be interpreted as a rotation in the time-frequency plane through an angle α. It describes the characteristics of a speech signal as it changes from the time to the frequency domain. However, to locate fractional Fourier domain frequency content and to analyze multicomponent, nonlinear chirp-like signals such as speech, the Short-Time FrFT (SFrFT) provides improved time-frequency resolution. By representing time and fractional frequency information simultaneously, the SFrFT adequately filters out cross terms and distortion in a signal for better enhancement. In our results, the method achieved better Signal to Noise Ratio (SNR) and Perceptual Evaluation of Speech Quality (PESQ) scores than the conventional FrFT under different noisy conditions.
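For reference, the FrFT with rotation angle α mentioned above has the standard kernel definition (for α not a multiple of π):

$$
F_\alpha\{f\}(u) \;=\; \int_{-\infty}^{\infty} f(t)\, K_\alpha(t,u)\, dt,
$$

$$
K_\alpha(t,u) \;=\; \sqrt{\frac{1 - i\cot\alpha}{2\pi}}\;
\exp\!\left( i\,\frac{t^2 + u^2}{2}\cot\alpha \;-\; i\,t\,u\,\csc\alpha \right),
$$

so that α = π/2 recovers the ordinary Fourier transform and α = 0 the identity. The short-time variant applies this transform to windowed segments: with analysis window g,

$$
X_\alpha(\tau, u) \;=\; \int_{-\infty}^{\infty} f(t)\, g(t - \tau)\, K_\alpha(t,u)\, dt,
$$

which is the simultaneous time/fractional-frequency representation described in the abstract.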
Archive | 2018
Hemanta Kumar Palo; Mahesh Chandra; Mihir Narayan Mohanty
In this chapter, different variants of Mel-frequency cepstral coefficients (MFCCs) describing human speech emotions are investigated. These features are tested and compared for their robustness in terms of classification accuracy and mean square error. Although the MFCC is a reliable feature for speech emotion recognition, it does not capture the temporal dynamics between features, which is crucial for such analysis. To address this, the delta MFCC, its first derivative, is extracted for comparison. Because the MFCC performs poorly under noisy conditions, both MFCC and delta MFCC features are extracted in the wavelet domain in the second phase. Combining the time-frequency characterization of emotions from wavelet analysis with the energy and amplitude information of MFCC-based features enhances the available information. Wavelet-based MFCCs (WMFCCs) and wavelet-based delta MFCCs (WDMFCCs) outperformed standard MFCCs, delta MFCCs, and wavelets in recognizing Berlin emotional speech utterances. A probabilistic neural network (PNN) has been chosen to model the emotions, as it is simple to train, much faster, and allows more flexible selection of the smoothing parameter than other neural network (NN) models. The highest accuracy, 80.79%, was observed with WDMFCCs, compared with 60.97% and 62.76% for MFCCs and wavelets, respectively.
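The delta features mentioned above are conventionally computed as a regression-based first derivative over a window of neighboring frames. A generic sketch over a frames-by-coefficients matrix; the window size N is an assumption, not the chapter's setting:

```python
import numpy as np

def delta(features, N=2):
    """Delta (first-derivative) coefficients of a (frames x coeffs)
    feature matrix, via the standard regression formula

        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum n^2)

    with edge frames repeated at the boundaries."""
    T = len(features)
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:T + N + n] - padded[N - n:T + N - n])
               for n in range(1, N + 1)) / denom
```

On a feature trajectory that ramps linearly over time, the interior delta frames recover exactly the per-frame slope, which is the sanity check for this formula.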
International Journal of Information Retrieval Research (IJIRR) | 2018
Hemanta Kumar Palo; Mihir Narayan Mohanty; Mahesh Chandra
The shape, length, and size of the vocal tract and vocal folds vary with the age of the human being. The variation may be due to age, sickness, or other conditions. Arguably, the features extracted from utterances for the recognition task may differ across age groups, and this is further complicated by different emotions. The recognition system demands suitable feature extraction and clustering techniques that can separate the emotional utterances. Psychologists, criminal investigators, professional counselors, law enforcement agencies and a host of other such entities may find such analysis useful. In this article, emotion has been studied for three different age groups using the basic age-dependent features of pitch, speech rate, and log energy. The feature sets have been clustered for the different age groups using the K-means and fuzzy c-means (FCM) algorithms for the boredom, sadness, and anger states. The authors' results suggest that the K-means algorithm outperforms the FCM algorithm in terms of better clustering and lower computation time. Keywords: clustering technique, feature extraction, log energy, pitch, speech emotion analysis
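The fuzzy c-means algorithm compared above alternates two updates: cluster centres from membership-weighted means, and soft memberships from inverse distances. A minimal sketch with illustrative parameters (fuzzifier m = 2, fixed iteration count):

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=100, seed=0):
    """Fuzzy c-means clustering.

    Returns memberships U (n x c, rows sum to 1) and centres V (c x d).
    m > 1 is the fuzzifier; m -> 1 approaches hard K-means."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # random valid memberships
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted centres
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-9
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1))
        p = 2.0 / (m - 1.0)
        U = 1.0 / (d ** p * np.sum(d ** (-p), axis=1, keepdims=True))
    return U, V
```

Hardening the memberships with argmax gives labels directly comparable to a K-means assignment, which is how the two algorithms are typically compared.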