Anirban Bhowmick
Birla Institute of Technology, Mesra
Publication
Featured research published by Anirban Bhowmick.
International Journal of Speech Technology | 2014
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
In recent years, the wavelet transform has been found to be an effective tool for time–frequency analysis of non-stationary and quasi-stationary signals such as speech. It has also been used for feature extraction in speech recognition applications. Here we propose a wavelet-based feature extraction technique that captures both periodic and aperiodic information, along with the sub-band instantaneous frequency of the speech signal, for robust speech recognition in noisy environments. The technique is based on parallel distributed processing inspired by the human speech perception process. This front-end feature processing technique employs an equivalent rectangular bandwidth (ERB)-like wavelet speech feature extraction method called Wavelet ERB Sub-band based Periodicity and Aperiodicity Decomposition (WERB-SPADE), and its validity is examined on the TIMIT phone recognition task in noisy environments. The speech signal is filtered by a 24-band ERB-like wavelet filter bank, and the equal-loudness pre-emphasized output of each band is then processed through a comb filter. Each comb filter is designed individually for its frequency sub-band to decompose the signal into periodic and aperiodic features. The technique thus takes advantage of the robustness of periodic features without losing important information, such as formant transitions, carried by the aperiodic features. Speech recognition experiments with a standard HMM recognizer are conducted under both clean-training and multi-training conditions. The proposed technique is more robust than the other features tested, especially in noisy conditions.
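The comb-filter decomposition described in this abstract can be illustrated with a minimal sketch. It assumes a single band, a feed-forward comb filter, and a known integer pitch period `period` in samples. The function name and the half-sum/half-difference form are illustrative assumptions, not the per-band filter design used in the paper.

```python
import math

def comb_decompose(x, period):
    """Split x into periodic and aperiodic parts with a feed-forward comb filter.

    Assumption: `period` is a known integer pitch period in samples.
    Samples before the start of the signal are treated as zero.
    """
    periodic, aperiodic = [], []
    for n, value in enumerate(x):
        delayed = x[n - period] if n >= period else 0.0
        # Half-sum reinforces components that repeat every `period` samples.
        periodic.append(0.5 * (value + delayed))
        # Half-difference cancels them, leaving the non-repeating residual.
        aperiodic.append(0.5 * (value - delayed))
    return periodic, aperiodic

# Toy input: a 100 Hz tone sampled at 8 kHz repeats every 80 samples.
fs, f0 = 8000, 100
period = fs // f0
tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(4 * period)]
periodic, aperiodic = comb_decompose(tone, period)
```

By construction the two outputs sum back to the input, and for a perfectly periodic input the aperiodic part vanishes once the delay line has filled.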
International Conference on Communication Computing Security | 2011
Pawan Kumar; Nitika Jakhanwal; Anirban Bhowmick; Mahesh Chandra
A gender classification system is proposed based on pitch, formants, and a combination of both. A database of ten Hindi digits was prepared for fifty speakers, with each speaker uttering each digit ten times. Formants derived from the speech samples were used for gender classification. Gender classification was also performed using pitch extracted by different methods: the autocorrelation, cepstrum, and Average Magnitude Difference Function (AMDF) methods were used for pitch determination from the speech samples. Formants in combination with pitch were also used for gender classification, as was a feature vector consisting of the pitches derived from all three pitch determination methods. Experiments were performed for both open-set and closed-set gender classification. The autocorrelation method performed best in the open-set case, and the hybrid method (autocorrelation + AMDF + cepstrum) performed best in the closed-set case.
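Two of the pitch determination methods named in this abstract, autocorrelation and AMDF, can be sketched on a single frame as follows. This is a minimal illustration, not the authors' implementation: the search range (`fmin`, `fmax`), the AMDF `tolerance`, and the synthetic test frame are all assumptions chosen for the example.

```python
import math

def autocorrelation_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Pitch in Hz from the lag that maximises the autocorrelation."""
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag

def amdf_pitch(x, fs, fmin=60.0, fmax=400.0, tolerance=0.01):
    """Pitch in Hz from the first deep valley of the AMDF.

    Assumption: the smallest lag whose AMDF is within `tolerance` (relative
    to the AMDF maximum) of the global minimum is taken as the period, which
    avoids picking a multiple of the true period.
    """
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    amdf = []
    for lag in range(lag_min, lag_max + 1):
        count = len(x) - lag
        amdf.append(sum(abs(x[n] - x[n + lag]) for n in range(count)) / count)
    threshold = min(amdf) + tolerance * max(amdf)
    for offset, value in enumerate(amdf):
        if value <= threshold:
            return fs / (lag_min + offset)

# Synthetic 200 Hz frame, 0.1 s long, sampled at 8 kHz (illustrative only).
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
pitch_acf = autocorrelation_pitch(frame, fs)
pitch_amdf = amdf_pitch(frame, fs)
```

Both estimators search the same lag range, so their outputs can be stacked into a small feature vector in the spirit of the hybrid method described above.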
International Conference on Devices and Communications | 2011
V. K. Gupta; Anirban Bhowmick; Mahesh Chandra; S. N. Sharan
The efficiency of a speech recognition system in a noise-free environment is impressive, but it deteriorates drastically in the presence of environmental noise. Environmental noise also affects human-to-human and human-to-machine communication and degrades speech quality as well as intelligibility. Here a speech recognition system for noisy environments is proposed. A database of ten Hindi digits was prepared for fifty speakers. Speech and F16 noises were added to the clean database to create noisy databases at different signal-to-noise ratio (SNR) levels (-5 dB, 0 dB, 5 dB, 10 dB). Spectral estimation techniques such as Spectral Subtraction (SS) and Minimum Mean Square Error (MMSE) estimation were used to de-noise the speech before feature extraction. Mel Frequency Cepstral Coefficients (MFCC) and a Hidden Markov Model (HMM) were used as the feature extraction technique and the classifier, respectively. The multi-band SS de-noising approach gave the best recognition results of all the techniques for both types of noise.
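Building a noisy database at fixed SNR levels amounts to scaling the noise before adding it to the clean signal. The sketch below shows one common way to do this. The function names, the synthetic tone, and the Gaussian noise are assumptions for illustration and stand in for the Hindi digit recordings and the speech/F16 noise files.

```python
import math
import random

def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean/noise power ratio equals `snr_db`, then add."""
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [gain * v for v in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise

# Stand-in signals (assumptions): a 440 Hz tone and seeded Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

# Check the achieved SNR at the four levels quoted in the abstract.
achieved = {}
for snr_db in (-5, 0, 5, 10):
    _, scaled = mix_at_snr(clean, noise, snr_db)
    achieved[snr_db] = 10 * math.log10(mean_power(clean) / mean_power(scaled))
```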
IET Signal Processing | 2015
Astik Biswas; Prasanna Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
In recent years, the wavelet packet (WP) transform has been used as an important speech representation tool. WP-based acoustic features have been found to be more effective than short-time Fourier transform (STFT)-based features at capturing the information of unvoiced phonemes in continuous speech. However, wavelet features are less useful for representing voiced phonemes such as vowels and nasals. This paper proposes new WP sub-band-based features that take the harmonic information of the voiced speech signal into account. It has been noted that most of the voiced energy of the speech signal lies between 250 and 2000 Hz. Thus, the proposed technique emphasises the individual sub-band harmonic energy up to 2 kHz. The speech signal is decomposed into 16 wavelet sub-bands, and harmonic energy features are combined with WP cepstral coefficient (WPCC) features to enhance the performance of the voiced phoneme recogniser. A standard phonetically balanced Hindi database is used to analyse the performance of the proposed feature set. A noisy phoneme recognition task is also carried out to study robustness. The proposed feature set gives a significant improvement in voiced phoneme recognition over WPCC and conventional Mel frequency cepstral coefficients.
Computers & Electrical Engineering | 2015
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
24 sub-band WP decomposition according to the auditory ERB scale. Proposed wavelet sub-band specific periodic and aperiodic decomposition. Wiener filter is used at the front end for noise minimization. Hindi phoneme classification task has been carried out. Proposed technique outperforms others in classifying voiced phonemes. Wavelet packet (WP) acoustic features are found to be very promising in the unvoiced phoneme classification task, but they are less effective at capturing periodic information from voiced speech. This motivated us to develop a wavelet packet based feature extraction technique that captures both periodic and aperiodic information. The method is based on parallel distributed processing inspired by the human speech perception process. This front-end feature processing technique employs an Equivalent Rectangular Bandwidth (ERB)-like wavelet speech feature extraction method called Wavelet ERB Sub-band based Periodicity and Aperiodicity Decomposition (WERB-SPADE). A Wiener filter is used at the front end to minimize the noise before further processing. The speech signal is filtered by a 24-band ERB-like wavelet filter bank, and the output of each sub-band is then processed through a comb filter. Each comb filter is designed individually for its sub-band to decompose the signal into periodic and aperiodic features. The technique thus carries the periodic information without losing important information, such as formant transitions, carried by the aperiodic features. Hindi phoneme classification experiments with a standard HMM recognizer are conducted under both clean-training and multi-training conditions. The technique shows significant improvement in the voiced phoneme class without affecting the performance of the unvoiced phoneme class.
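For orientation, the auditory ERB scale that the 24-band structure follows can be computed with the widely used Glasberg and Moore approximation. The sketch below only evaluates that scale. The band edges `f_low` and `f_high` are assumptions, and the wavelet packet tree that approximates these bands in the paper is not reproduced here.

```python
import math

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10 ** (erb_rate / 21.4) - 1.0) / 0.00437

def erb_center_frequencies(f_low, f_high, bands=24):
    """Centre frequencies spaced uniformly on the ERB-rate scale."""
    low, high = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (high - low) / (bands - 1)
    return [erb_rate_to_hz(low + i * step) for i in range(bands)]

# Assumed analysis range of 100 Hz to 4 kHz (not taken from the paper).
centers = erb_center_frequencies(100.0, 4000.0)
widths = [erb_bandwidth(fc) for fc in centers]
```

Uniform spacing on the ERB-rate scale gives narrow bands at low frequencies and progressively wider bands higher up, which is the property an ERB-like filter bank is meant to mimic.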
Computers & Electrical Engineering | 2017
Anirban Bhowmick; Mahesh Chandra
Wavelet-based speech enhancement techniques have gained popularity for their inherent noise minimization. Exploiting this advantage, this work takes a new approach to designing a speech enhancement system, using Voiced Speech Probability (VSP)-based Wavelet Decomposition (WD). The VSP-based improved Voice Activity Detector (VAD) is designed to separate voiced and unvoiced frames on the basis of a probability calculated from the likelihood ratio of two Gaussian Mixture Models (GMMs). A gain estimator is integrated with the WD stage to minimize the Mean Square Error (MSE). The proposed method has been evaluated on a subset of the TIMIT database corrupted with five noise conditions (machine gun, white, pink, F16, and operations room noise) at four Signal to Noise Ratio (SNR) levels. The method is also compared with traditional algorithms, and a significant performance improvement is observed.
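The idea of turning a likelihood ratio between two GMMs into a voiced-speech probability can be sketched for a one-dimensional feature. Everything specific here is an assumption for illustration: the mixture parameters are made up, equal class priors are assumed, and the logistic mapping from log-likelihood ratio to probability is one plausible choice rather than the paper's exact formulation.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of scalar x under a 1-D Gaussian mixture (log-sum-exp)."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def voiced_probability(x, voiced_gmm, unvoiced_gmm):
    """Posterior probability of 'voiced' from the GMM log-likelihood ratio.

    Assumption: equal prior probability for the two classes.
    """
    llr = gmm_log_likelihood(x, *voiced_gmm) - gmm_log_likelihood(x, *unvoiced_gmm)
    return 1.0 / (1.0 + math.exp(-llr))

# Hypothetical mixtures as (weights, means, variances); not trained on data.
voiced_gmm = ([0.6, 0.4], [5.0, 7.0], [1.0, 1.5])
unvoiced_gmm = ([0.5, 0.5], [0.0, 1.5], [1.0, 1.0])

p_high = voiced_probability(6.0, voiced_gmm, unvoiced_gmm)
p_low = voiced_probability(0.5, voiced_gmm, unvoiced_gmm)
```

A frame would then be labelled voiced when the probability exceeds a chosen threshold such as 0.5.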
IEEE International Conference on Recent Trends in Information Systems | 2015
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
This paper presents an audio-visual phoneme recognition system that uses shape and appearance information extracted from the jaw and lip region to enhance robustness in noisy environments. Considering visual features along with traditional acoustic features has been found to be promising in complex auditory environments. The visual modality can provide complementary information to the speech recognizer when the audio modality is badly affected by background noise. The acoustic modality is represented by auditory-based equivalent rectangular bandwidth (ERB)-like wavelet (WERBC) features, whereas the visual modality is represented by statistically powerful active appearance model (AAM)-based features. The audio and visual modalities are fused using a proportional weighting factor to form a two-stream audio-visual synchronous Hidden Markov Model (SHMM) recognizer. The VidTIMIT database is chosen to study the performance of the multi-modal phoneme recognition system. Artificial noise is added to the audio files at different SNR levels (0 dB to 20 dB) to study the performance of the system in noisy environments. The combination of WERBC and AAM features outperforms the well-known traditional combination of Mel frequency cepstral coefficient (MFCC) acoustic features and discrete cosine transform (DCT) visual features.
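Weighted fusion of two synchronous streams is commonly implemented as a convex combination of per-stream log-likelihoods. The sketch below illustrates that idea at the level of a single decision. The function names, the phoneme labels, and the score values are hypothetical, and the paper's exact weighting scheme may differ.

```python
def fuse_streams(audio_loglik, visual_loglik, audio_weight):
    """Combine per-class log-likelihoods from two streams.

    Assumption: the visual stream receives weight (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        label: audio_weight * audio_loglik[label]
        + (1.0 - audio_weight) * visual_loglik[label]
        for label in audio_loglik
    }

def recognise(audio_loglik, visual_loglik, audio_weight):
    """Return the label with the highest fused score."""
    fused = fuse_streams(audio_loglik, visual_loglik, audio_weight)
    return max(fused, key=fused.get)

# Hypothetical scores: audio favours /b/, video favours /a/.
audio_scores = {"a": -2.0, "b": -1.0}
visual_scores = {"a": -1.0, "b": -3.0}

clean_decision = recognise(audio_scores, visual_scores, audio_weight=0.7)
noisy_decision = recognise(audio_scores, visual_scores, audio_weight=0.2)
```

Lowering the audio weight as the SNR drops lets the visual stream dominate the decision when the audio is unreliable.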
IEEE India Conference | 2014
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
In recent years, the wavelet packet (WP) transform has been used as an important speech representation tool. WP-based acoustic features have been found to be more effective than short-time Fourier transform (STFT)-based features at capturing the information of unvoiced phonemes in continuous speech. In this paper, a new 24 sub-band Equivalent Rectangular Bandwidth (ERB)-like wavelet filter is proposed that applies a perceptual Wiener filter to each sub-band of the decomposed noisy speech. The Wiener-filtered output is then processed according to the Johnston model to calculate the auditory masking threshold for each wavelet-decomposed sub-band. This threshold is used to design the perceptual sub-band weighting (PSW) filter. The output of each perceptually weighted sub-band is processed further to calculate acoustic front-end features. The technique aims to enhance the noisy speech signal by applying a standard Wiener filter to psychoacoustically motivated wavelet sub-bands while controlling the sub-band weighting factor. A Hindi continuous digit database is used to evaluate the performance of the proposed feature. The results show that the proposed feature is effective for noisy speech recognition compared with some recently proposed feature extraction techniques.
IETE Journal of Research | 2016
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
In recent years, the wavelet packet (WP) transform has been used as an important speech representation tool. WP-based acoustic features have been found to be more efficient than short-time Fourier transform (STFT)-based features at capturing the information of unvoiced phonemes from continuous speech. In this paper, a new 24 sub-band equivalent rectangular bandwidth (ERB)-like wavelet filter is proposed that applies a perceptual Wiener filter to each sub-band of the decomposed noisy speech. The Wiener-filtered output is then processed according to the Johnston model to calculate the auditory masking threshold for each wavelet-decomposed sub-band. This threshold is used to design the perceptual sub-band weighting (PSW) filter. The output of each perceptually weighted sub-band is processed further to calculate acoustic front-end features. The technique aims to enhance the noisy speech signal by applying a standard Wiener filter to psychoacoustically motivated wavelet sub-bands while controlling the sub-band weighting factor. A Hindi continuous digit database and the TIMIT database are used to evaluate the performance of the proposed feature. The results show that the proposed feature is effective for noisy speech recognition compared with some recently proposed feature extraction techniques.
International Journal of Speech Technology | 2017
Anirban Bhowmick; Mahesh Chandra; Astik Biswas
In the recent past, wavelet packet (WP)-based speech enhancement techniques have been gaining popularity due to their inherent noise minimization. WP-based techniques have appeared more robust and efficient than short-time Fourier transform-based methods. In the present work, a speech enhancement method using Teager energy operated equivalent rectangular bandwidth (ERB)-like WP decomposition is proposed. A twenty-four sub-band perceptual wavelet packet decomposition (PWPD) structure is implemented according to the auditory ERB scale. An ERB scale-based decomposition structure is used because the central frequency of the ERB scale distribution is similar to the frequency response of the human cochlea. The Teager energy operator is applied to estimate the threshold value for the PWPD coefficients. Lastly, Wiener filtering is applied to remove low-frequency noise before the final reconstruction stage. The proposed method has been evaluated on a Hindi sentence database corrupted with six noise conditions. Its performance is analysed with respect to several speech quality parameters and output signal-to-noise ratio levels. The results indicate that the proposed technique outperforms some traditional speech enhancement algorithms at all SNR levels.
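The discrete Teager energy operator mentioned in this abstract has a compact definition, ψ[x(n)] = x(n)² − x(n−1)·x(n+1), shown in the sketch below. Only the operator itself is illustrated. How the paper derives thresholds for the PWPD coefficients from it is not reproduced, and the test tone is an assumption chosen because its Teager energy has a known closed form.

```python
import math

def teager_energy(x):
    """Discrete Teager energy for the interior samples of x.

    The output is two samples shorter than the input because the operator
    needs one neighbour on each side.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For A*cos(omega*n + phi) the operator returns the constant A^2 * sin(omega)^2.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n + 0.5) for n in range(200)]
energy = teager_energy(tone)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Because the result depends on both amplitude and frequency, the operator responds more strongly to high-frequency activity than a plain squared-magnitude energy does.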