Astik Biswas
National Institute of Technology, Rourkela
Publication
Featured research published by Astik Biswas.
Computers & Electrical Engineering | 2014
Astik Biswas; Prakash Kumar Sahu; Mahesh Chandra
Highlights: 24-subband WP decomposition according to the auditory ERB scale; a variance feature added to make the features more robust; a Hindi consonant recognition task carried out; special attention given to Hindi unvoiced phonemes, especially stops; improved performance with the proposed feature.
The wavelet transform has been found to be an effective tool for the time-frequency analysis of non-stationary and quasi-stationary signals. In recent years it has been used for feature extraction in speech recognition applications. Here a new filter structure using admissible wavelet packet analysis is proposed for Hindi phoneme recognition. These filters have the benefit of a frequency-band spacing similar to the auditory Equivalent Rectangular Bandwidth (ERB) scale, whose central frequencies are equally distributed along the frequency response of the human cochlea. The phoneme recognition performance of the proposed feature is compared with standard baseline features and 24-band admissible wavelet packet-based features using a Hidden Markov Model (HMM) based classifier. The proposed feature shows better performance than conventional features for Hindi consonant recognition. To evaluate the robustness of the proposed feature in noisy environments, the NOISEX-92 database has been used.
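The ERB spacing referred to above can be made concrete with a short sketch (not the authors' code): it places 24 band centres uniformly on the Glasberg-Moore ERB-rate scale, which is the spacing the proposed wavelet packet filter bank is said to approximate. The 100 Hz to 8 kHz range is an assumption.

import numpy as np

def hz_to_erb_rate(f_hz):
    # Glasberg & Moore (1990) ERB-rate (Cam) value for a frequency in Hz.
    return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)

def erb_rate_to_hz(erb):
    # Inverse mapping from ERB-rate back to Hz.
    return (10.0 ** (erb / 21.4) - 1.0) / 4.37e-3

def erb_center_frequencies(n_bands=24, f_lo=100.0, f_hi=8000.0):
    # n_bands centre frequencies equally spaced on the ERB-rate scale
    # (the 100 Hz to 8 kHz span is an assumption, not taken from the paper).
    erb_lo, erb_hi = hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi)
    return erb_rate_to_hz(np.linspace(erb_lo, erb_hi, n_bands))

print(np.round(erb_center_frequencies(), 1))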
International Journal of Speech Technology | 2014
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
In recent years, the wavelet transform has been found to be an effective tool for the time–frequency analysis of non-stationary and quasi-stationary signals such as speech. More recently, it has been used for feature extraction in speech recognition applications. Here we propose a wavelet-based feature extraction technique that captures both the periodic and aperiodic information, along with the sub-band instantaneous frequency, of the speech signal for robust speech recognition in noisy environments. The technique is based on a parallel distributed processing scheme inspired by the human speech perception process. This front-end employs an equivalent rectangular bandwidth (ERB) filter-like wavelet speech feature extraction method, called Wavelet ERB Sub-band based Periodicity and Aperiodicity Decomposition (WERB-SPADE), and its validity is examined on the TIMIT phone recognition task in noisy environments. The speech signal is filtered by a 24-band ERB-like wavelet filter bank, and the equal-loudness pre-emphasized output of each band is then processed through a comb filter. Each comb filter is designed individually for its frequency sub-band to decompose the signal into periodic and aperiodic features. The technique thus takes advantage of the robustness of periodic features without losing important information, such as formant transitions, carried by the aperiodic features. Speech recognition experiments with a standard HMM recognizer are conducted under both clean-training and multi-training conditions. The proposed technique shows more robustness than other features, especially in noisy conditions.
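As an illustration of the per-sub-band periodic/aperiodic split described above, the sketch below applies simple feed-forward comb filters, (x[n] + x[n-T])/2 for the periodic part and (x[n] - x[n-T])/2 for the aperiodic part, to one sub-band signal. It is not the authors' implementation: the pitch period is assumed to be supplied externally (a fixed value here), and the comb design is a generic stand-in.

import numpy as np
from scipy.signal import lfilter

def periodic_aperiodic_split(subband, pitch_period):
    # Split one sub-band into periodic and aperiodic components
    # using simple feed-forward comb filters.
    T = int(pitch_period)
    b_per = np.zeros(T + 1)
    b_per[0], b_per[T] = 0.5, 0.5        # (x[n] + x[n-T]) / 2
    b_aper = np.zeros(T + 1)
    b_aper[0], b_aper[T] = 0.5, -0.5     # (x[n] - x[n-T]) / 2
    periodic = lfilter(b_per, [1.0], subband)
    aperiodic = lfilter(b_aper, [1.0], subband)
    return periodic, aperiodic

# Example: a noisy 200 Hz tone at 16 kHz, so the pitch period is 80 samples.
fs, f0 = 16000, 200
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t) + 0.3 * np.random.randn(fs)
per, aper = periodic_aperiodic_split(x, fs // f0)
print(per.var(), aper.var())             # the periodic part keeps most of the tone energy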
IET Signal Processing | 2015
Astik Biswas; Prasanna Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
In recent years, the wavelet packet (WP) transform has been used as an important speech representation tool. WP-based acoustic features have been found to be more effective than short-time Fourier transform (STFT)-based features at capturing the information of unvoiced phonemes in continuous speech. However, wavelet features do not represent voiced phonemes, such as vowels and nasals, equally well. This paper proposes new WP sub-band-based features that take the harmonic information of the voiced speech signal into account. It has been noted that most of the voiced energy of the speech signal lies between 250 and 2000 Hz. The proposed technique therefore emphasises the individual sub-band harmonic energy up to 2 kHz. The speech signal is decomposed into 16 wavelet sub-bands, and harmonic energy features are combined with WP cepstral (WPCC) features to enhance the performance of the voiced phoneme recogniser. A standard phonetically balanced Hindi database is used to analyse the performance of the proposed feature set. A noisy phoneme recognition task is also carried out to study robustness. Significant improvement in voiced phoneme recognition is obtained with the proposed feature set over WPCC and conventional Mel frequency cepstral coefficient features.
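A minimal sketch of the idea, not the paper's exact pipeline: a level-4 wavelet packet decomposition gives 16 uniform sub-bands, cepstral features are formed from the log sub-band energies, and the energies of the sub-bands lying below 2 kHz are appended as extra harmonic features. The 'db4' wavelet and the DCT-based cepstrum are assumptions.

import numpy as np
import pywt
from scipy.fft import dct

def wp_subband_energies(frame, wavelet="db4", level=4):
    # Level-4 full wavelet packet decomposition: 16 uniform sub-bands.
    wp = pywt.WaveletPacket(frame, wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")          # ordered low to high frequency
    return np.array([np.sum(n.data ** 2) + 1e-12 for n in nodes])

def harmonic_wpcc(frame, fs=16000, n_ceps=13):
    energies = wp_subband_energies(frame)
    wpcc = dct(np.log(energies), norm="ortho")[:n_ceps]   # WPCC-like cepstrum
    band_width = (fs / 2) / len(energies)                  # Hz covered per sub-band
    low_bands = energies[: int(2000 // band_width)]        # sub-bands below 2 kHz
    return np.concatenate([wpcc, np.log(low_bands)])       # cepstrum + harmonic energies

frame = np.random.randn(400)   # stand-in for one 25 ms frame at 16 kHz
print(harmonic_wpcc(frame).shape)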
Computers & Electrical Engineering | 2015
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
Highlights: 24-subband WP decomposition according to the auditory ERB scale; wavelet sub-band-specific periodic and aperiodic decomposition; a Wiener filter used at the front end for noise minimization; a Hindi phoneme classification task carried out; the proposed technique outperforms others in classifying voiced phonemes.
Wavelet packet (WP) acoustic features have been found to be very promising for unvoiced phoneme classification, but they are less effective at capturing periodic information from voiced speech. This motivated us to develop a wavelet packet based feature extraction technique that captures both the periodic and aperiodic information. The method is based on a parallel distributed processing scheme inspired by the human speech perception process. This front-end feature processing technique employs an Equivalent Rectangular Bandwidth (ERB) filter-like wavelet speech feature extraction method called Wavelet ERB Sub-band based Periodicity and Aperiodicity Decomposition (WERB-SPADE). A Wiener filter is used at the front end to minimize noise before further processing. The speech signal is filtered by a 24-band ERB-like wavelet filter bank, and the output of each sub-band is then processed through a comb filter. Each comb filter is designed individually for its sub-band to decompose the signal into periodic and aperiodic features. The representation thus carries the periodic information without losing important information, such as formant transitions, incorporated in the aperiodic features. Hindi phoneme classification experiments with a standard HMM recognizer are conducted under both clean-training and multi-training conditions. The technique shows significant improvement for the voiced phoneme class without affecting the performance of the unvoiced phoneme class.
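The Wiener front end mentioned above can be sketched as follows, assuming the noise spectrum is estimated from the first few speech-free frames; the actual system may use a perceptual variant, so this is only an illustrative baseline, not the authors' code.

import numpy as np
from scipy.signal import stft, istft

def wiener_denoise(x, fs=16000, noise_frames=10):
    # STFT-domain Wiener-style gain; the noise PSD is taken from the first
    # few frames, assumed to contain no speech.
    f, t, X = stft(x, fs=fs, nperseg=400, noverlap=240)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    gain = np.maximum(1.0 - noise_psd / (np.abs(X) ** 2 + 1e-12), 0.1)
    _, x_hat = istft(X * gain, fs=fs, nperseg=400, noverlap=240)
    return x_hat

noisy = np.random.randn(16000)              # stand-in for a noisy utterance
enhanced = wiener_denoise(noisy)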
International Journal of Speech Technology | 2016
Astik Biswas; Prakash Kumar Sahu; Mahesh Chandra
Consideration of visual speech features along with traditional acoustic features has shown decent performance in uncontrolled auditory environments. However, most existing audio-visual speech recognition (AVSR) systems have been developed under laboratory conditions and rarely address visual-domain problems. This paper presents an active appearance model (AAM) based multiple-camera AVSR experiment. Shape and appearance information is extracted from the jaw and lip region to enhance performance in vehicle environments. First, a series of visual speech recognition (VSR) experiments is carried out to study the impact of each camera on multi-stream VSR. A four-camera in-car audio-visual corpus is used to perform the experiments. The individual camera streams are fused to form a four-stream synchronous hidden Markov model visual speech recognizer. Finally, the optimum four-stream VSR is combined with a single-stream acoustic HMM to build a five-stream AVSR system. The dual-modality AVSR system shows more robustness than the acoustic-only speech recognizer across all driving conditions.
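The multi-stream synchronous-HMM fusion can be illustrated with a short sketch: per-state log-likelihoods of each stream (four cameras plus audio) are combined with stream weights. The observation models are omitted and the weights shown are illustrative, not values from the paper.

import numpy as np

def combined_state_loglik(stream_logliks, stream_weights):
    # Weighted sum of per-stream state log-likelihoods, as in a synchronous
    # multi-stream HMM where all streams share the same state sequence.
    stream_logliks = np.asarray(stream_logliks)   # shape: (n_streams, n_states)
    w = np.asarray(stream_weights)[:, None]
    return np.sum(w * stream_logliks, axis=0)     # shape: (n_states,)

# Example: 4 camera streams plus 1 audio stream, 3 HMM states.
logliks = np.random.randn(5, 3)                   # dummy per-stream, per-state scores
weights = [0.1, 0.1, 0.1, 0.1, 0.6]               # audio weighted highest (assumed)
print(combined_state_loglik(logliks, weights))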
International Journal of Signal and Imaging Systems Engineering | 2013
A. N. Mishra; Mahesh Chandra; Astik Biswas; S. N. Sharan
Automatic Speech Recognition (ASR) systems perform well under restricted conditions, but performance degrades in noisy environments. Audio–visual features play an important role in ASR systems in the presence of noise. In this paper, a Hindi phoneme recognition system is designed using audio–visual features. Discrete Cosine Transform (DCT) features of the lip region, integrated with Mel Frequency Cepstral Coefficient (MFCC) audio features, are used to improve recognition performance in noisy environments. Colour-intensity, hybrid and Pseudo-Hue methods have been used for lip localisation, with a Linear Discriminant Analyser (LDA) as the classifier. Recognition performance using the Pseudo-Hue method proved the best among all the methods.
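The DCT visual-feature step can be sketched as a 2-D DCT of the grey-scale lip region, keeping the low-frequency top-left block of coefficients; the ROI size and the number of retained coefficients below are assumptions, not the paper's settings.

import numpy as np
from scipy.fft import dctn

def lip_dct_features(lip_roi, keep=6):
    # 2-D DCT of the grey-scale lip region; keep the low-frequency
    # top-left keep-by-keep block as the visual feature vector.
    coeffs = dctn(lip_roi.astype(float), norm="ortho")
    return coeffs[:keep, :keep].ravel()

roi = np.random.rand(32, 64)          # stand-in for a localised 32x64 lip region
visual_feat = lip_dct_features(roi)
print(visual_feat.shape)              # (36,)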
IET Signal Processing | 2016
Astik Biswas; Prakash Kumar Sahu; Mahesh Chandra
Nowadays, wavelet packet (WP) based features are used extensively to maximise the performance of automatic speech recognition in complex auditory environments. However, wavelet features are less adequate for representing voiced speech. Recent research on WP techniques has sought complementary voicing information to overcome this problem, but adding voicing features lengthens the feature vector and can degrade performance for unvoiced speech. This study presents a new analysis-of-variance technique to incorporate voicing information into WP sub-band based features without affecting their performance or dimension. It has been noted that most of the voiced energy lies below 2 kHz; the proposed technique therefore emphasises the lower sub-bands for additional voicing information. Harmonic energy features are combined with the recently introduced auditory-motivated, equivalent rectangular bandwidth-like 24-band WP cepstral features to enhance the performance of the voiced phoneme recogniser. A standard phonetically balanced Hindi database is used to analyse the performance of the proposed technique across a wide range of signal-to-noise ratios. The proposed features show promising results in the phoneme recognition experiments without increasing the feature dimension or harming performance.
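The abstract does not spell out the analysis-of-variance step, so the sketch below shows only one plausible reading: score each sub-band energy by its one-way ANOVA F-ratio for voiced versus unvoiced frames and use the scores as per-band weights, which leaves the feature dimension unchanged. All data and labels here are dummies.

import numpy as np

def f_ratio(values, labels):
    # One-way ANOVA F-ratio of one feature across classes (e.g. voiced/unvoiced):
    # between-class variance divided by within-class variance.
    classes = np.unique(labels)
    overall = values.mean()
    between = sum(np.sum(labels == c) * (values[labels == c].mean() - overall) ** 2
                  for c in classes) / (len(classes) - 1)
    within = sum(np.sum((values[labels == c] - values[labels == c].mean()) ** 2)
                 for c in classes) / (len(values) - len(classes))
    return between / (within + 1e-12)

# Example: weight 24 sub-band log-energies by their voiced/unvoiced F-ratios,
# leaving the feature dimension unchanged.
energies = np.random.rand(1000, 24)                 # frames x sub-bands (dummy data)
voiced = np.random.randint(0, 2, 1000)              # dummy voiced/unvoiced labels
weights = np.array([f_ratio(energies[:, b], voiced) for b in range(24)])
weighted = energies * weights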
IEEE International Conference on Recent Trends in Information Systems | 2015
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
This paper presents an audio-visual phoneme recognition system that uses shape and appearance information extracted from the jaw and lip region to enhance robustness in noisy environments. Consideration of visual features along with traditional acoustic features has been found to be promising in complex auditory environments, since the visual modality can provide complementary information to the speech recognizer when the audio modality is badly affected by background noise. The acoustic modality is represented by auditory-based equivalent rectangular bandwidth (ERB)-like wavelet (WERBC) features, whereas the visual modality is represented by statistically powerful active appearance model (AAM) based features. The audio and visual modalities are fused using a proportional weighting factor to form a two-stream audio-visual synchronous Hidden Markov Model (SHMM) recognizer. The VidTIMIT database is chosen to study the performance of the multi-modal phoneme recognition system. Artificial noise is injected into the audio files at different SNR levels (0 dB to 20 dB) to study the performance of the system in noisy environments. The combination of WERBC and AAM features outperforms the well-known traditional combination of Mel-scale cepstral coefficient (MFCC) acoustic features and discrete cosine transform (DCT) visual features.
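The noise-injection protocol used in the evaluation can be sketched as scaling a noise signal so that, when added to the clean utterance, it yields a chosen SNR; the white-noise source below is a placeholder for the actual noise recordings.

import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    # Scale the noise so that the clean-to-noise power ratio equals snr_db.
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

clean = np.random.randn(16000)                          # stand-in for a 1 s utterance
noisy_0db = add_noise_at_snr(clean, np.random.randn(16000), 0)
noisy_20db = add_noise_at_snr(clean, np.random.randn(16000), 20)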
International Journal of Computational Vision and Robotics | 2015
Astik Biswas; Prakash Kumar Sahu; Mahesh Chandra
Automatic speech recognition (ASR) systems perform well under restricted conditions, but performance degrades in noisy environments. Audio-visual features play an important role in ASR systems in the presence of noise. In this paper, a Hindi isolated-digit recognition system is designed using audio-visual features. Visual features of the lip region are integrated with audio features to improve recognition performance in noisy environments. Colour-intensity and pseudo-hue methods have been used for lip localisation, with a hidden Markov model (HMM) as the classifier. Recognition performance using the HMM is better than that of the LDA recogniser. For image compression, the principal component analysis technique has been utilised.
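The PCA compression step can be sketched as projecting vectorised lip-region frames onto their leading principal components and reconstructing from them; the number of components and the image size below are illustrative assumptions.

import numpy as np

def pca_compress(images, n_components=20):
    # images: (n_samples, n_pixels) matrix of vectorised lip-region frames.
    mean = images.mean(axis=0)
    centred = images - mean
    # Right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]                 # (n_components, n_pixels)
    codes = centred @ basis.T                 # compressed representation
    reconstructed = codes @ basis + mean      # approximation rebuilt from the codes
    return codes, reconstructed

frames = np.random.rand(200, 32 * 64)         # 200 dummy 32x64 lip images
codes, recon = pca_compress(frames)
print(codes.shape, recon.shape)               # (200, 20) (200, 2048)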
IEEE India Conference | 2014
Astik Biswas; Prakash Kumar Sahu; Anirban Bhowmick; Mahesh Chandra
In recent years, the wavelet packet (WP) transform has been used as an important speech representation tool. WP-based acoustic features have been found to be more effective than short-time Fourier transform (STFT) based features at capturing the information of unvoiced phonemes in continuous speech. In this paper, a new 24-sub-band Equivalent Rectangular Bandwidth (ERB)-like wavelet filter structure is proposed, employing a perceptual Wiener filter on each sub-band of the decomposed noisy speech. The Wiener-filtered output is then processed according to the Johnston model to calculate the auditory masking threshold for each wavelet-decomposed sub-band. This threshold is used to design the perceptual sub-band weighting (PSW) filter. The output from each perceptually weighted sub-band is processed further to calculate acoustic front-end features. The technique aims to enhance the noisy speech signal by applying a standard Wiener filter to each psychoacoustically motivated wavelet sub-band while controlling the sub-band weighting factor. A Hindi continuous-digit database is used to evaluate the performance of the proposed feature. The obtained results show that the proposed feature is effective for noisy speech recognition compared with some recently proposed feature extraction techniques.
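The Johnston masking model itself is beyond a short sketch, but the perceptual sub-band weighting idea can be illustrated under a strong simplification: each wavelet sub-band is scaled by a factor derived from the ratio of its (assumed known) masking threshold to its estimated residual noise, so bands whose noise is already masked are left untouched. The threshold and noise values below are placeholders, not outputs of the Johnston model.

import numpy as np

def perceptual_subband_weights(noise_energy, masking_threshold, floor=0.1):
    # One weight per sub-band: noise already below the masking threshold
    # leaves the band untouched (weight 1); noisier bands are attenuated.
    ratio = masking_threshold / (noise_energy + 1e-12)
    return np.clip(ratio, floor, 1.0)

# Example for a 24-band decomposition with placeholder per-band estimates.
noise = np.random.rand(24) * 1e-3
threshold = np.random.rand(24) * 1e-3
print(perceptual_subband_weights(noise, threshold))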