
Publication


Featured research published by Manoj Banik.


Computer and Information Technology | 2010

Bangla phoneme recognition for ASR using multilayer neural network

Mohammed Rokibul Alam Kotwal; Manoj Banik; Qamrun Nahar Eity; Mohammad Nurul Huda; Ghulam Muhammad; Yousef Ajami Alotaibi

This paper presents a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN), which converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities, and ii) the phoneme probabilities obtained from the first stage, together with the corresponding Δ and ΔΔ parameters calculated by linear regression (LR), are inserted into a hidden Markov model (HMM) based classifier to obtain more accurate phoneme strings. From experiments on a Bangla speech corpus prepared by us, it is observed that the proposed method provides higher phoneme recognition performance than the existing method. Moreover, it requires fewer mixture components in the HMMs.
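The Δ and ΔΔ parameters in stage ii) are obtained by linear regression over a window of frames, the standard formula for dynamic features. A minimal sketch, where the window size K and the posterior dimensionality are illustrative rather than taken from the paper:

```python
import numpy as np

def delta(feats, K=2):
    """Delta parameters by linear regression over a +/-K frame window:
    d[t] = sum_k k*(x[t+k] - x[t-k]) / (2 * sum_k k^2)."""
    T, D = feats.shape
    # pad with edge frames so the window is defined at the boundaries
    padded = np.pad(feats, ((K, K), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = np.zeros((T, D))
    for t in range(T):
        for k in range(1, K + 1):
            out[t] += k * (padded[t + K + k] - padded[t + K - k])
    return out / denom

# hypothetical MLN output: 100 frames of phoneme posteriors (random stand-in)
probs = np.random.rand(100, 51)
d = delta(probs)                         # Δ parameters
dd = delta(d)                            # ΔΔ parameters
hmm_input = np.hstack([probs, d, dd])    # features passed to the HMM stage
```

The regression gives a smoothed slope estimate, so a constant trajectory yields zero deltas.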


Asia Pacific Conference on Circuits and Systems | 2010

Bangla triphone HMM based word recognition

Mohammad Mahedi Hasan; Foyzul Hassan; Gazi Md. Moshfiqul Islam; Manoj Banik; Mohammed Rokibul Alam Kotwal; Sharif Mohammad Musfiqur Rahman; Ghulam Muhammad; Mohammad Nurul Huda

In this paper, we have prepared a medium-size Bangla speech corpus and compared the performance of different acoustic features for Bangla word recognition. Most Bangla automatic speech recognition (ASR) systems use a small number of speakers, but 40 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved here. In the experiments, mel-frequency cepstral coefficients (MFCCs) are input to triphone hidden Markov model (HMM) based classifiers to obtain word recognition performance. From the experiments, it is shown that the MFCC-based method of 39 dimensions provides a higher word correct rate (WCR) and word accuracy (WA) than the other methods investigated. Moreover, a higher WCR and WA are obtained by the MFCC39-based method with fewer mixture components in the HMM.


IEEE International Conference on Signal and Image Processing | 2010

Japanese phonetic feature extraction for automatic speech recognition

Manoj Banik; Qamrun Nahar Eity; Nusrat Jahan Lisa; Foyzul Hassan; Aloke Kumar Saha; Mohammad Nurul Huda

This paper presents a method for extracting distinctive phonetic features (DPFs) for automatic speech recognition (ASR). The method comprises three stages: i) an acoustic feature extractor, ii) a multilayer neural network (MLN) and iii) a hidden Markov model (HMM) based classifier. In the first stage, acoustic features, local features (LFs), are extracted from the input speech. In the second stage, the MLN generates a 45-dimensional DPF vector from the 75-dimensional LFs. Finally, this 45-dimensional DPF vector is inserted into an HMM-based classifier to obtain phoneme strings. From experiments on the Japanese Newspaper Article Sentences (JNAS) corpus, it is observed that the proposed DPF extractor provides a higher phoneme correct rate and accuracy with fewer mixture components in the HMMs compared to the method based on mel frequency cepstral coefficients (MFCCs). Moreover, a higher correct rate for each phonetic feature is obtained using the proposed method.
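The 75-to-45-dimensional MLN mapping in stage ii) can be sketched as a small feedforward network. The hidden-layer size and the random weights below are placeholders; a trained network would be used in practice:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLN:
    """Minimal multilayer network: 75-dim local features -> 45-dim DPFs.
    Hidden size and initialization are illustrative, not from the paper."""
    def __init__(self, d_in=75, d_hidden=128, d_out=45):
        self.W1 = rng.normal(0, 0.1, (d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0, 0.1, (d_hidden, d_out))
        self.b2 = np.zeros(d_out)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)
        return sigmoid(h @ self.W2 + self.b2)  # DPF values in [0, 1]

mln = MLN()
lf = rng.normal(size=(1, 75))   # one frame of local features
dpf = mln.forward(lf)           # 45-dimensional DPF vector for the HMM stage
```

The sigmoid output keeps each DPF component in [0, 1], which suits the binary-valued nature of phonetic features.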


Computer and Information Technology | 2010

Multi-layer neural network classification of tongue movement ear pressure signal for human machine interface

Khondaker A. Mamun; Manoj Banik; Michael Mace; Mark E. Lutman; Ravi Vaidyanathan; Shouyan Wang

Tongue movement ear pressure (TMEP) signals have been used to generate control commands in assistive human-machine interfaces aimed at people with disabilities. The objective of this study is to classify the movement-related signals of an intended action against internally occurring physiological signals that can interfere with inter-movement classification. TMEP signals were collected corresponding to six types of controlled movements and to activity in the potentially interfering environment, including when a subject spoke, coughed or drank. The signal processing algorithm involved TMEP signal detection, segmentation, feature extraction and selection, and classification. Features of the segmented TMEP signals were extracted using the wavelet packet transform (WPT). A multi-layer neural network was then designed and tested based on statistical properties of the WPT coefficients. The average classification performance in discriminating interference from controlled-movement TMEP signals reached 97.05%. The WPT-based classification of TMEP signals is robust, and interference with the control commands of TMEP signals in an assistive human-machine interface can be significantly reduced using the multi-layer neural network in this challenging environment.
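The WPT feature-extraction step can be illustrated with a minimal Haar wavelet packet decomposition. Unlike the plain DWT, the packet transform splits every node at each level; the statistics computed below (mean, standard deviation, peak, energy) are an illustrative feature set, not necessarily the paper's exact choice:

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def wpt(x, levels):
    """Full wavelet packet decomposition: split every node, not only
    the approximations, giving 2**levels terminal nodes."""
    nodes = [x]
    for _ in range(levels):
        nxt = []
        for n in nodes:
            a, d = haar_step(n)
            nxt += [a, d]
        nodes = nxt
    return nodes

def wpt_features(x, levels=3):
    """Statistical features of the WPT coefficients per terminal node."""
    feats = []
    for n in wpt(x, levels):
        feats += [n.mean(), n.std(), np.abs(n).max(), (n ** 2).sum()]
    return np.array(feats)

sig = np.sin(np.linspace(0, 8 * np.pi, 256))  # stand-in for a TMEP segment
f = wpt_features(sig)                         # 2**3 nodes * 4 stats = 32 features
```

Because the Haar transform is orthonormal, the decomposition preserves signal energy, which makes per-node energy a meaningful feature.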


International Conference on Information Technology: New Generations | 2011

Development of Analysis Rules for Bangla Part of Speech for Universal Networking Language

Manoj Banik; Md. Rashiduzzaman Rasel; Aloke Kumar Saha; Foyzul Hassan; Mohammed Firoz Mridha; Mohammad Nurul Huda

The Universal Networking Language (UNL) is a worldwide, generalized form of human interaction on a machine-independent digital platform for defining, recapitulating, amending, storing and disseminating knowledge or information among people of different affiliations. The theoretical and practical research associated with this interdisciplinary endeavor facilitates a number of practical applications in most domains of human activity, such as creating globalization trends in markets or geopolitical interdependence among nations. In our research work we have tried to develop analysis rules for Bangla parts of speech, which will help to create a doorway for converting the Bangla language to UNL and vice versa, and to overcome the barrier between Bangla and other languages.
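A suffix-driven analysis rule of the kind developed in this line of work might be sketched as follows. The rule table is illustrative only (common Bangla nominal suffixes), not the paper's actual rule set:

```python
# Hypothetical suffix rules: (suffix, analysis). Longest suffix wins.
SUFFIX_RULES = [
    ("দের", "noun+plural+genitive"),
    ("রা", "noun+plural+nominative"),
    ("কে", "noun+objective"),
    ("টি", "noun+classifier"),
]

def analyze(word):
    """Strip the longest matching suffix and return (stem, analysis)."""
    for suffix, tag in sorted(SUFFIX_RULES, key=lambda r: -len(r[0])):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], tag
    return word, "unanalyzed"

print(analyze("ছেলেরা"))  # -> ('ছেলে', 'noun+plural+nominative')
```

A real UNL converter would also need morphophonemic alternation rules and a dictionary lookup; this sketch only shows the rule-matching skeleton.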


International Conference on Information Technology: New Generations | 2011

Phone Segmentation for Japanese Triphthong Using Neural Networks

Manoj Banik; Md. Modasser Hossain; Aloke Kumar Saha; Foyzul Hassan; Mohammed Rokibul Alam Kotwal; Mohammad Nurul Huda

Context information influences the performance of Automatic Speech Recognition (ASR). Current Hidden Markov Model (HMM) based ASR systems address this problem by using context-sensitive triphone models. However, these models need a large number of speech parameters and a large speech corpus. In this paper, we propose a technique to model the dynamic process of co-articulation and embed it in ASR systems. A Recurrent Neural Network (RNN) could in principle realize this dynamic process, but the main problem is the slowness of training an RNN of large size. We instead introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage system consisting of a Multi-Layer Neural Network (MLN) in the first stage and another MLN in the second stage, where the first MLN is expected to reduce the dynamics of the acoustic feature pattern and the second MLN to suppress the fluctuation caused by DPF context. The experiments are carried out on Japanese triphthong data. The proposed DPF-based feature extractor provides better segmentation performance with a reduced mixture set of HMMs, and a better context effect is achieved with less computation using MLNs instead of an RNN.


International Conference on Computer Applications and Industrial Electronics | 2010

Articulatory Δ and ΔΔ parameters effect on HMM-based classifier for ASR

Foyzul Hassan; Qamrun Nahar Eity; Mohammed Rokibul Alam Kotwal; Manoj Banik; Mohammad Mahedi Hasan; Sharif Mohammad Musfiqur Rahman; Ghulam Muhammad; Mohammad Nurul Huda

This paper describes the effect of articulatory Δ and ΔΔ parameters on automatic speech recognition (ASR). Articulatory feature (AF) or distinctive phonetic feature (DPF) based systems show superior performance over acoustic-feature-based ASR, and this performance can be further improved by incorporating articulatory dynamic parameters. In this paper, we propose a phoneme recognition system that comprises two stages: (i) DPF extraction from acoustic features, local features (LFs), using a multilayer neural network (MLN), and (ii) incorporation of dynamic parameters (Δ and ΔΔ) into a hidden Markov model (HMM) based classifier for more accurate performance. From experiments on the Japanese Newspaper Article Sentences (JNAS) corpus, it is observed that the proposed method provides a higher phoneme correct rate and phoneme accuracy than the method that does not incorporate dynamic articulatory parameters. Moreover, it requires fewer mixture components in the HMMs to obtain higher performance.


International Conference on Computer Applications and Industrial Electronics | 2010

Bangla phoneme recognition for different acoustic features

Mohammed Rokibul Alam Kotwal; Foyzul Hassan; Manoj Banik; Gazi Md. Moshfiqul Islam; Md. Rakibuzzaman; Mohammad Mahedi Hasan; Ghulam Muhammad; Mohammad Nurul Huda

In this paper, we compare the performance of different acoustic features for Bangla Automatic Speech Recognition (ASR). Most Bangla ASR systems use a small number of speakers, but 40 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved here. In the experiments, mel-frequency cepstral coefficients (MFCCs) and local features (LFs) are input to hidden Markov model (HMM) based classifiers to obtain phoneme recognition performance. The experimental results show that the MFCC-based method of 39 dimensions provides a higher phoneme correct rate and accuracy than the other methods investigated.


International Conference on Hybrid Intelligent Systems | 2010

DPF-based japanese phoneme recognition using tandem MLNs

Mohammed Rokibul Alam Kotwal; Gazi Md. Moshfiqul Islam; Foyzul Hassan; Ghulam Muhammad; Manoj Banik; Md. Shahadat Hossain; Mohammad Mahedi Hasan; Mohammad Nurul Huda

This paper presents a method for automatic phoneme recognition for the Japanese language using tandem MLNs. The method comprises three stages: (i) a multilayer neural network (MLN) that converts acoustic features into distinctive phonetic features (DPFs), (ii) a second MLN that takes the DPFs and acoustic features as input and generates a 45-dimensional DPF vector with less context effect, and (iii) insertion of the 45-dimensional feature vector generated by the second MLN into a hidden Markov model (HMM) based classifier to obtain more accurate phoneme strings from the input speech. From experiments on the Japanese Newspaper Article Sentences (JNAS) corpus, it is observed that the proposed method provides a higher phoneme correct rate and substantially improves phoneme accuracy over the method based on a single MLN. Moreover, it requires fewer mixture components in the HMMs.
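The tandem arrangement, where the first MLN's DPF output is concatenated with the acoustic features and fed to a second MLN, can be sketched as follows. Layer sizes and the untrained random weights are placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mln(x, d_out):
    """Stand-in one-hidden-layer network with random weights; in the paper
    both networks are trained, and the layer sizes are not given here."""
    d_hidden = 96
    W1 = rng.normal(0, 0.1, (x.shape[-1], d_hidden))
    W2 = rng.normal(0, 0.1, (d_hidden, d_out))
    return sigmoid(sigmoid(x @ W1) @ W2)

acoustic = rng.normal(size=(1, 75))       # e.g. local features for one frame
dpf = mln(acoustic, 45)                   # stage (i): acoustic -> DPFs
tandem_in = np.hstack([dpf, acoustic])    # stage (ii) input: DPFs + acoustic
dpf_refined = mln(tandem_in, 45)          # 45-dim vector passed to the HMMs
```

Feeding the acoustic features to the second MLN alongside the first MLN's output is what lets it smooth DPF context effects rather than merely re-mapping the same vector.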


IEEE International Conference on Signal and Image Processing | 2010

Bangla speech recognition using two stage multilayer neural networks

Qamrun Nahar Eity; Manoj Banik; Nusrat Jahan Lisa; Foyzul Hassan; Md. Shahadat Hossain; Mohammad Nurul Huda

This paper describes a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN), which converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities, and ii) the phoneme probabilities obtained from the first stage and the corresponding Δ and ΔΔ parameters are inserted into another MLN to improve the phoneme probabilities for the hidden Markov models (HMMs) by reducing the context effect. From experiments on a Bangla speech corpus prepared by us, it is observed that the proposed method provides higher phoneme recognition performance than the existing method. Moreover, it requires fewer mixture components in the HMMs.

Collaboration


Manoj Banik's top collaborators.

Top Co-Authors

Mohammad Nurul Huda (United International University)
Foyzul Hassan (United International University)
Gazi Md. Moshfiqul Islam (United International University)
Qamrun Nahar Eity (Ahsanullah University of Science and Technology)
Md. Shahadat Hossain (United International University)
Aloke Kumar Saha (University of Asia and the Pacific)
Nusrat Jahan Lisa (Ahsanullah University of Science and Technology)