Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Serajul Haque is active.

Publication


Featured research published by Serajul Haque.


Speech Communication | 2009

Perceptual features for automatic speech recognition in noisy environments

Serajul Haque; Roberto Togneri; Anthony Zaknich

The performances of two perceptual properties of the peripheral auditory system, synaptic adaptation and two-tone suppression, are compared for automatic speech recognition (ASR) in additive noise environments. A simple method of synaptic adaptation, as determined by psychoacoustic observations, was implemented with temporal processing of speech using a zero-crossing auditory model as a pre-processing front end. The concept is similar to RASTA processing, but a high-pass infinite impulse response (IIR) filter is used instead of bandpass filters. It is shown that rapid synaptic adaptation may be implemented by temporal processing using the zero-crossing algorithm, which is not otherwise possible in a spectral-domain implementation. Two-tone suppression was implemented in the zero-crossing auditory model using a companding strategy. Recognition performance with the two perceptual features was evaluated on the isolated-digits (TIDIGITS) corpus using a continuous-density HMM recognizer in white, factory, babble, and Volvo noise. It is observed that synaptic adaptation performs better in stationary white Gaussian noise. In the presence of non-stationary, non-Gaussian noise, however, no improvement, or even degradation, is observed. A reciprocal effect is observed with two-tone suppression: better performance in non-Gaussian real-world noise and degradation in stationary white Gaussian noise.
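
The high-pass IIR alternative to RASTA's bandpass filtering can be illustrated with a short sketch. This is a minimal illustration of temporal high-pass filtering of feature trajectories, not the paper's zero-crossing auditory model; the filter order, cutoff, and frame rate below are assumptions.

```python
import numpy as np
from scipy.signal import butter, lfilter

def highpass_adaptation(features, cutoff_hz=1.0, frame_rate_hz=100.0, order=1):
    """Run a first-order high-pass IIR filter along the time axis of each
    feature dimension, emphasizing spectral change (rapid adaptation) and
    suppressing slowly varying components such as channel effects."""
    b, a = butter(order, cutoff_hz / (frame_rate_hz / 2.0), btype="highpass")
    return lfilter(b, a, features, axis=0)

# Toy input: 200 frames x 23 mel-band log energies (random stand-in).
log_energies = np.log(np.random.rand(200, 23) + 1e-6)
adapted = highpass_adaptation(log_energies)
```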


International Conference on Acoustics, Speech, and Signal Processing | 2007

A Temporal Auditory Model with Adaptation for Automatic Speech Recognition

Serajul Haque; Roberto Togneri; Anthony Zaknich

Rapid and short-term adaptation are dynamic mechanisms of the human auditory system. An auditory model based on zero-crossings with peak amplitudes (ZCPA) was used as a front end for automatic speech recognition (ASR), augmented with the perceptual property of adaptation as determined by psychoacoustic observations. The model's performance was evaluated on the isolated-digits (TIDIGITS) database using a continuous-density HMM recognizer in additive noise environments. Experimental results indicate that the ASR performance of the ZCPA may be improved with adaptation over the static baseline in white Gaussian and factory noise. The perceptual front end was also evaluated with dynamic (delta and delta-delta) features added to the adaptation. It was observed that adaptation with dynamic features performed better in factory, babble, and car noise over a wide range of SNR values.
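
The dynamic features mentioned here are commonly computed with the standard regression (HTK-style) delta formula; applying it twice gives the delta-delta features. A minimal sketch, assuming a (frames x dims) feature matrix:

```python
import numpy as np

def delta(features, N=2):
    """Regression-based delta features over a +/-N frame window."""
    T = len(features)
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    return np.stack([
        sum(n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)) / denom
        for t in range(T)
    ])

# Static + dynamic feature vector, as used with the adaptation front end.
static = np.random.rand(100, 13)                  # toy static features
full = np.hstack([static, delta(static), delta(delta(static))])
```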


Workshop on Applications of Computer Vision | 2013

A lip extraction algorithm using region-based ACM with automatic contour initialization

Chao Sui; Mohammed Bennamoun; Roberto Togneri; Serajul Haque

In a lipreading system, lip extraction is a fundamental step that directly affects the final speech recognition results. However, most existing systems need to detect facial features as prior knowledge to construct the initial contour, and any erroneous feature detection leads to an incorrect lip extraction. To solve this problem, this paper presents a new framework which integrates both a global region-based Active Contour Model (ACM) and a localized region-based ACM. With the proposed framework, the initial contour does not need to be specified according to the speaker's facial features before extracting the lip, so any erroneous extraction introduced by an incorrect initial contour is effectively eliminated. Experimental results show the effectiveness of the proposed method in comparison with existing methods.
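
The global region-based ACM component can be illustrated with a stripped-down Chan-Vese-style level-set update. This sketch omits the curvature (length) term and the localized ACM stage of the paper's framework; the toy image and parameters are assumptions:

```python
import numpy as np

def chan_vese_step(img, phi, dt=0.5, eps=1.0):
    """One gradient-descent step of a simplified global region-based
    energy: move the contour so each region matches its mean intensity."""
    inside, outside = phi > 0, phi <= 0
    c1 = img[inside].mean() if inside.any() else 0.0   # mean inside contour
    c2 = img[outside].mean() if outside.any() else 0.0 # mean outside
    dirac = eps / (np.pi * (eps ** 2 + phi ** 2))      # smoothed Dirac delta
    force = -(img - c1) ** 2 + (img - c2) ** 2
    return phi + dt * dirac * force

# Toy "lip" region: bright blob on a dark background, circular init.
img = np.zeros((64, 64)); img[24:40, 16:48] = 1.0
y, x = np.mgrid[0:64, 0:64]
phi = 20.0 - np.sqrt((x - 32.0) ** 2 + (y - 32.0) ** 2)
for _ in range(200):
    phi = chan_vese_step(img, phi)
mask = phi > 0  # extracted region
```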


Asilomar Conference on Signals, Systems and Computers | 2012

Discrimination comparison between audio and visual features

Chao Sui; Roberto Togneri; Serajul Haque; Mohammed Bennamoun

This paper compares the discriminative power of audio, 2D-based visual, and 3D-based visual features for speech recognition. The audio and visual feature extraction schemes and several feature selection techniques are described first. Using the described feature extraction and selection methods, several experiments are conducted to compare the discrimination of the audio, 2D visual, and 3D visual features on an hVd-word classification task. It is found that the 3D visual features offer greater separability than the 2D visual features, so 3D-based audio-visual speech recognition may achieve better results than its traditional 2D-based counterpart.
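
One common way to quantify the separability being compared here is a Fisher criterion (between-class versus within-class scatter). Whether the paper uses exactly this measure is not stated, so the trace-ratio form below is an assumption:

```python
import numpy as np

def fisher_ratio(X, y):
    """Trace-ratio Fisher criterion for a labeled feature set: higher
    values mean classes sit further apart relative to their spread."""
    mu = X.mean(axis=0)
    Sb = np.zeros((X.shape[1], X.shape[1]))
    Sw = np.zeros_like(Sb)
    for c in np.unique(y):
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)            # between-class scatter
        Sw += (len(Xc) - 1) * np.cov(Xc.T)   # within-class scatter
    return np.trace(Sb) / np.trace(Sw)

# Usage: score feature sets on the same labels, e.g.
# fisher_ratio(audio_feats, labels) vs. fisher_ratio(visual3d_feats, labels).
```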


EURASIP Journal on Advances in Signal Processing | 2013

Evaluations on underdetermined blind source separation in adverse environments using time-frequency masking

Ingrid Jafari; Serajul Haque; Roberto Togneri; Sven Nordholm

The successful implementation of speech processing systems in the real world depends on their ability to handle adverse acoustic conditions with undesirable factors such as room reverberation and background noise. In this study, an extension to the established multiple sensors degenerate unmixing estimation technique (MENUET) algorithm for blind source separation is proposed, based on fuzzy c-means clustering, to improve separation in underdetermined situations using a nonlinear microphone array. Rather than testing blind source separation solely under reverberant conditions, this paper extends the evaluation to a variety of simulated and real-world noisy environments. The results show encouraging separation ability and improved perceptual quality of the separated sources under such adverse conditions. This not only establishes the proposed methodology as a credible improvement to the system, but also implies further applicability in areas such as noise suppression in adverse acoustic environments.
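
The fuzzy c-means step at the heart of the proposed MENUET extension can be sketched in a few lines. This is the textbook algorithm applied to generic feature vectors; the paper's level-ratio/phase-difference features and mask construction are omitted, and the fuzzifier m, iteration count, and initialization are assumptions:

```python
import numpy as np

def fuzzy_cmeans(X, K, m=2.0, iters=100, seed=0):
    """Textbook fuzzy c-means: returns (centers, U) where U[i, k] is the
    soft membership of sample i in cluster k (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(K), size=len(X))       # random soft start
    for _ in range(iters):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]       # weighted centroids
        d = np.linalg.norm(X[:, None, :] - C[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)     # membership update
    return C, U
```

In underdetermined separation, U would then be turned into soft time-frequency masks, one per estimated source.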


IEEE Transactions on Audio, Speech, and Language Processing | 2011

An Auditory Motivated Asymmetric Compression Technique for Speech Recognition

Serajul Haque; Roberto Togneri; Anthony Zaknich

The Mel-frequency cepstral coefficient (MFCC) parameterization for automatic speech recognition (ASR) utilizes several perceptual features of the human auditory system, one of which is static compression. Motivated by the human auditory system, the conventional static logarithmic compression applied in the MFCC is analyzed using psychophysical loudness perception curves. Following the property of the auditory system that dynamic range compression is higher in the basal regions than in the apical regions of the basilar membrane, we propose a method of unequal (asymmetric) compression, i.e., higher compression in the higher frequency regions than in the lower frequency regions. The method is applied and tested in the MFCC and PLP parameterizations in the spectral domain, and in the ZCPA auditory model used as an ASR front end in the temporal domain. The extent of the asymmetric compression is applied as a multiplicative gain to the existing static compression, and is determined from the gradient of the piecewise-linear segment of the perceptual compression curve. The proposed method has the advantage of adjusting compression parametrically for improved ASR performance and audibility in noisy conditions through low-frequency spectral enhancement, particularly of vowels with lower F1 and F2 formants. Continuous-density HMM recognition using the Aurora 2 and TIDIGITS corpora shows performance improvements in additive noise conditions.
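
A minimal sketch of the asymmetric compression idea, applied to mel filterbank energies in the spectral domain: a per-channel multiplicative gain on the static log compression that grows toward higher frequencies. The linear ramp and gain values are illustrative assumptions, not the paper's loudness-curve-derived values:

```python
import numpy as np

def asymmetric_compress(fbank_energies, g_low=1.0, g_high=1.5):
    """Apply a multiplicative gain to the static log compression that
    increases from the lowest to the highest filterbank channel, i.e.
    stronger compression in higher-frequency regions."""
    n_channels = fbank_energies.shape[1]
    gain = np.linspace(g_low, g_high, n_channels)    # per-channel gain
    return gain * np.log(fbank_energies + 1e-10)     # gained log compression

# Toy usage: 100 frames x 23 mel channels of (positive) energies.
compressed = asymmetric_compress(np.random.rand(100, 23) + 1e-3)
```

The DCT to cepstral coefficients would follow as in standard MFCC extraction.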


International Conference on Audio, Language and Image Processing | 2010

Utilizing auditory masking in automatic speech recognition

Serajul Haque

A speech recognition system based on the psychoacoustics of the masking property of the human auditory system is proposed. The method utilizes several psychoacoustic properties of human perception to define a perceptual speech excitation function (masking threshold) and a perceptual noise estimate. Based on the auditory masking threshold (AMT), a time-frequency noise spectral subtraction is implemented. For a human listener, noise below the masking threshold is inaudible, and the objective is to minimize only the noise spectrum above the masking threshold. Additionally, we show that, for ASR applications, further improvements in recognition performance may be obtained by augmenting the masking of the noise with spectral subtraction in the masked region as well. The strategy is to remove the masked noise from the ASR system, analogous to the masking effect in the human auditory system. Based on the AMT and the estimated perceptual noise, we implemented two spectral subtraction algorithms: a straightforward scheme that subtracts the total estimated perceptual noise from the noisy speech spectrum, and a spectral subtraction of only the noise which lies below the masking threshold. It was observed that both methods give significant improvements over the baseline PLP performance, with the latter method giving better recognition results.
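
The two subtraction schemes can be written down directly, assuming per-bin power spectra and a precomputed masking threshold (the AMT estimation itself, with spreading across critical bands, is omitted here):

```python
import numpy as np

def subtract_total(noisy_power, noise_power):
    """Scheme 1: subtract the whole estimated perceptual noise."""
    return np.maximum(noisy_power - noise_power, 0.0)

def subtract_masked(noisy_power, noise_power, mask_threshold):
    """Scheme 2: subtract only the noise lying below the masking
    threshold, i.e. the part a listener would not hear but which still
    degrades the ASR front end."""
    masked_noise = np.minimum(noise_power, mask_threshold)
    return np.maximum(noisy_power - masked_noise, 0.0)
```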


International Conference on Acoustics, Speech, and Signal Processing | 2010

A psychoacoustic spectral subtraction method for noise suppression in automatic speech recognition

Serajul Haque; Roberto Togneri

A time-frequency spectral subtraction method based on several psychoacoustic properties of human perception is presented. These effects are critical-band filtering, synaptic adaptation (which also introduces temporal forward masking), equal-loudness preemphasis, the power law of hearing, and simultaneous masking. The perceptual speech and noise are estimated separately via a detailed psychoacoustic nonlinear transformation modeled on the human auditory system. The spectral subtraction, using an over-subtraction factor and a spectral floor, is evaluated with a speech recognition front end using a continuous-density HMM recognizer. The method shows reduced residual noise and improved word recognition performance in broadband Gaussian noise conditions compared to the conventional spectral subtraction method.
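
The over-subtraction with a spectral floor follows the familiar Berouti-style rule; a minimal sketch on per-bin power spectra, with the alpha and beta values below as assumptions:

```python
import numpy as np

def spectral_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Power spectral subtraction with over-subtraction factor alpha and
    spectral floor beta: over-subtract the noise estimate, then clamp
    residuals to a small fraction of the noisy spectrum."""
    clean = noisy_power - alpha * noise_power
    return np.maximum(clean, beta * noisy_power)
```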


Asilomar Conference on Signals, Systems and Computers | 2012

On the integration of time-frequency masking speech separation and recognition in underdetermined environments

Ingrid Jafari; Serajul Haque; Roberto Togneri; Sven Nordholm

The successful application of automatic speech recognition systems in the real world is conditional on their ability to handle realistic environments with unfavorable conditions such as reverberation and multiple sources of interference. Previous research has identified time-frequency masking based approaches to blind source separation as viable for multisource reverberant source separation. It is proposed that using such separation techniques as a front end to speech recognition will yield greater recognition accuracy. Experimental evaluations confirmed the hypothesis, with an improvement in recognition accuracy of over 20% at a reverberation time of RT60 = 300 ms; this is indicative of the potential for future research in this field.
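
The front-end integration is conceptually simple: estimated time-frequency masks are applied to the mixture STFT, and each masked spectrogram is resynthesized before being passed to the recognizer. A minimal sketch with SciPy, where the window length and sampling rate are assumptions:

```python
import numpy as np
from scipy.signal import stft, istft

def separate_with_masks(mixture, masks, fs=16000, nperseg=512):
    """Apply per-source time-frequency masks (arrays in [0, 1], shaped
    like the STFT) to one mixture channel and resynthesize each source."""
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)
    sources = []
    for M in masks:
        _, s = istft(M * X, fs=fs, nperseg=nperseg)
        sources.append(s)          # time-domain estimate for this source
    return sources
```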


Conference of the International Speech Communication Association | 2011

Underdetermined Blind Source Separation with Fuzzy Clustering for Arbitrarily Arranged Sensors

Ingrid Jafari; Serajul Haque; Roberto Togneri; Sven Nordholm

Collaboration


Dive into Serajul Haque's collaborations.

Top Co-Authors

Roberto Togneri
University of Western Australia

Anthony Zaknich
University of Western Australia

Chao Sui
University of Western Australia

Ingrid Jafari
University of Western Australia

Mohammed Bennamoun
University of Western Australia