
Publication


Featured research published by Hesham Tolba.


International Conference on Acoustics, Speech, and Signal Processing | 2002

Auditory-based acoustic distinctive features and spectral cues for automatic speech recognition using a multi-stream paradigm

Hesham Tolba; Sid-Ahmed Selouani; Douglas D. O'Shaughnessy

In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems. Our goal is to improve HMM-based ASR by exploiting features that characterize speech sounds, some derived from the auditory system and one from the Fourier power spectrum. We found that combining the classical MFCCs with auditory-based acoustic distinctive cues and the main peaks of the speech spectrum in a multi-stream paradigm improves recognition performance. The Hidden Markov Model Toolkit (HTK) was used throughout our experiments to test the new multi-stream feature vector. A series of speaker-independent continuous-speech recognition experiments was carried out on a subset of the large read-speech corpus TIMIT. Using this multi-stream paradigm, N-mixture mono-/tri-phone models and a bigram language model, we found that the word error rate decreased by about 4.01%.
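At its core, the multi-stream paradigm scores each HMM state as a weighted product of per-stream likelihoods. A minimal sketch of that combination (the log-likelihood values and stream exponents below are hypothetical, not the paper's):

```python
def combined_log_likelihood(stream_loglikes, stream_weights):
    """Multi-stream HMM state score: log b(o) = sum_s w_s * log b_s(o_s),
    i.e. a weighted product of per-stream likelihoods in the log domain."""
    if len(stream_loglikes) != len(stream_weights):
        raise ValueError("one weight per stream")
    return sum(w * ll for ll, w in zip(stream_loglikes, stream_weights))

# Hypothetical per-frame log-likelihoods for three streams:
# MFCCs, auditory-based distinctive cues, and spectral peaks.
score = combined_log_likelihood([-12.0, -8.5, -10.2], [1.0, 0.5, 0.5])
```

Raising the weaker streams to exponents below 1 lets them refine, rather than dominate, the MFCC stream's score.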


International Conference on Acoustics, Speech, and Signal Processing | 1999

Towards a robust/fast continuous speech recognition system using a voiced-unvoiced decision

Douglas D. O'Shaughnessy; Hesham Tolba

We show that the concept of voiced-unvoiced (V-U) classification of speech sounds can be incorporated not only into speech analysis and speech enhancement, but also into recognition. Incorporating such a classification into a continuous speech recognition (CSR) system both improves its performance in low-SNR environments and reduces the time and memory needed to carry out recognition. The proposed V-U classification of speech sounds has two principal functions: (1) it allows the voiced and unvoiced parts of speech to be enhanced separately; (2) it limits the Viterbi (1967) search space, so that recognition can be carried out in real time without degrading system performance. We show experimentally that such a system outperforms the baseline HTK recognizer when a V-U decision is included in both the front end and back end of the HTK-based recognizer.
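A V-U decision of this kind is classically approximated from short-time energy and zero-crossing rate: voiced frames are high-energy with few sign changes, unvoiced frames the opposite. A toy sketch (thresholds are hypothetical, not the paper's classifier):

```python
import math

def voiced_unvoiced(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Crude V-U decision: voiced frames tend to have high energy and a
    low zero-crossing rate; unvoiced frames tend to be the opposite."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (n - 1)
    return "V" if energy > energy_thresh and zcr < zcr_thresh else "U"

voiced_like = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]  # slow sinusoid
unvoiced_like = [0.05 if i % 2 == 0 else -0.05 for i in range(100)]      # rapid sign flips
```

Real systems make this decision per frame and use it to gate both the enhancement stage and the active portion of the search.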


International Conference on Acoustics, Speech, and Signal Processing | 2004

Automatic recognition of Bluetooth speech in 802.11 interference and the effectiveness of insertion-based compensation techniques

Amr H. Nour-Eldin; Hesham Tolba; Douglas D. O'Shaughnessy

We investigate the ASR performance of speech transmitted over a noisy Bluetooth RF channel. Bluetooth shares its transmission channel with IEEE 802.11-based devices. Despite Bluetooth's frequency-hopping scheme, our investigation shows that Bluetooth packet loss rates may reach 38% in unfavorable 802.11 interference conditions. Because Bluetooth uses a CVSD (continuously variable slope delta modulation) codec with syllabic companding, these packet losses manifest themselves not only as segments of missing speech upon CVSD decoding, but also as incorrect scaling of subsequent successfully received voice packets, since CVSD step-size information is also lost. We investigate the effects of these degradations on the ASR performance of Bluetooth speech, and accordingly propose alternative CVSD decoder schemes employing insertion-based techniques to compensate for them. Results show that our proposed techniques improve ASR performance considerably while requiring only minor modifications to the current Bluetooth receiver.
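The mis-scaling arises because CVSD's adaptive step size is part of the decoder state. A toy CVSD-style decoder (parameters are illustrative, not the Bluetooth specification's values) makes the mechanism concrete: if a packet is lost, the decoder's step size and accumulator drift away from the encoder's, so even correctly received later bits decode at the wrong scale.

```python
def cvsd_decode(bits, step_min=10.0, step_max=1280.0, delta=10.0,
                beta=0.999, run_len=4):
    """Toy CVSD-style decoder sketch. The step size grows when the last
    `run_len` bits agree (slope overload) and decays otherwise; the
    output integrates +/- step per bit. Losing bits loses this step-size
    state, which is why packet loss mis-scales subsequent speech."""
    out, step, acc, history = [], step_min, 0.0, []
    for b in bits:
        history = (history + [b])[-run_len:]
        if len(history) == run_len and len(set(history)) == 1:
            step = min(step + delta, step_max)   # consecutive equal bits: grow
        else:
            step = max(step * beta, step_min)    # otherwise: decay toward minimum
        acc += step if b else -step
        out.append(acc)
    return out
```

Insertion-based compensation amounts to feeding the decoder substitute bits for the lost packet so its state stays closer to the encoder's.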


North American Chapter of the Association for Computational Linguistics | 2003

Auditory-based acoustic distinctive features and spectral cues for robust automatic speech recognition in Low-SNR car environments

Sid-Ahmed Selouani; Hesham Tolba; Douglas D. O'Shaughnessy

In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems in the presence of highly interfering car noise. It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main formant frequencies of a speech signal using a multi-stream paradigm leads to an improvement in the recognition performance in noisy car environments.


International Conference on Acoustics, Speech, and Signal Processing | 1998

Automatic speech recognition based on cepstral coefficients and a mel-based discrete energy operator

Hesham Tolba; Douglas D. O'Shaughnessy

In this paper, a novel feature vector based on both mel-frequency cepstral coefficients (MFCCs) and a mel-based nonlinear discrete-time energy operator (MDEO) is proposed as the input to an HMM-based automatic continuous speech recognition (ACSR) system. Our goal is to improve the performance of such a recognizer using the new feature vector. Experiments show that the new feature vector increases the recognition rate of the ACSR system. The HTK hidden Markov model toolkit was used throughout, with experiments on both the TIMIT and NTIMIT databases. For TIMIT, when the MDEO was included in the feature vector to test a multi-speaker ACSR system, we found that the error rate decreased by about 9.51%. For NTIMIT, however, the MDEO degraded the recognizer's performance. That is, the new feature vector is useful for clean speech but not for telephone speech.
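The nonlinear discrete-time energy operator at the heart of such features is the Teager-style operator psi[n] = x[n]^2 - x[n-1]*x[n+1]. A minimal sketch (the per-mel-band filtering that makes it the MDEO is omitted here):

```python
import math

def discrete_energy_operator(x):
    """Teager-style discrete-time energy operator:
    psi[n] = x[n]^2 - x[n-1] * x[n+1].
    For a sampled sinusoid A*cos(W*n), psi is the constant A^2 * sin(W)^2,
    so it tracks both amplitude and frequency of the component."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# A pure sinusoid yields a constant "energy" of sin(0.3)^2.
psi = discrete_energy_operator([math.cos(0.3 * n) for n in range(50)])
```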


Canadian Conference on Electrical and Computer Engineering | 2008

Incorporating phonetic knowledge into a multi-stream HMM framework

Atta Norouzian; Sid-Ahmed Selouani; Hesham Tolba; Douglas D. O'Shaughnessy

This paper presents a technique for improving the performance of multi-stream HMMs in ASR systems. In this technique, the stream exponents of the multi-stream model are chosen with respect to the phonological content of the underlying states. Two distinctive feature sets, namely MFCCs and formant-like features, are used to investigate the potential of this technique. The experiments are performed on the AURORA database under the distributed speech recognition (DSR) framework. The proposed front end constitutes an alternative to the DSR-XAFE (XAFE: eXtended Audio Front-End) provided by the European Telecommunications Standards Institute. The proposed method yields up to 10% relative improvement in word accuracy over the multi-stream model with tied exponents, and up to 35% relative improvement over the state-of-the-art MFCC-based system.


Canadian Conference on Electrical and Computer Engineering | 2005

Robust recognition of noisy speech over H.323 networks

Gang Chen; Hesham Tolba; Douglas D. O'Shaughnessy

In this paper, we investigate the performance of a speech recognizer on noisy speech transmitted over an H.323 channel, where the minimum mean-square error log-spectral amplitude (MMSE-LSA) method is used to reduce the mismatch between training and deployment conditions in order to achieve robust speech recognition. In the IP communication environment, packet loss is one source of distortion to the speech, and when ASR systems are used in adverse conditions, their performance degrades. In our work, we evaluate not only the impact of packet losses on speech recognition performance, but also the effects of uncorrelated additive noise. To measure the influence of missing speech packets on ASR performance, we use a Soekris net4501 IP simulator, made by Soekris Engineering, to control the packet loss rate. To explore how additive acoustic noise affects recognition performance, six types of noise sources are used in our experiments. The experimental results indicate that the MMSE-LSA enhancement method noticeably increased robustness for some types of additive noise under certain packet loss rates over the IP network.
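Enhancement methods of this family apply a per-frequency gain computed from SNR estimates. As a simplified stand-in (a Wiener-type gain, not the full MMSE-LSA estimator, which additionally involves an exponential-integral term and decision-directed SNR smoothing):

```python
def spectral_gain(noisy_power, noise_power):
    """Simplified Wiener-type gain per frequency bin:
    xi = max(P_noisy / P_noise - 1, 0) estimates the a priori SNR,
    and G = xi / (1 + xi) attenuates bins dominated by noise."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        xi = max(p_y / p_n - 1.0, 0.0)
        gains.append(xi / (1.0 + xi))
    return gains

# Bins at 0 dB SNR get gain 0.5; noise-only bins are fully attenuated.
g = spectral_gain([2.0, 1.0, 10.0], [1.0, 1.0, 1.0])
```

The enhanced spectrum is the noisy spectrum scaled by these gains before resynthesis or feature extraction.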


International Symposium on Communications, Control and Signal Processing | 2008

Incorporating formant cues into distributed speech recognition systems

Atta Norouzian; Sid-Ahmed Selouani; Hesham Tolba; Douglas D. O'Shaughnessy

The current front end for distributed speech recognition (DSR) systems provided by the European Telecommunications Standards Institute (ETSI) is mainly based on state-of-the-art MFCC features. The method proposed in this paper aims to improve the performance of the present ETSI DSR-XAFE (XAFE: eXtended Audio Front-End). For this purpose, two sets of acoustic features, namely formant-like features and MFCC features, are integrated under the multi-stream framework to form a feature vector that is more robust against additive noise. It is shown that for noisy speech, combining cepstral coefficients with the main spectral peaks, also known as formant-like features, using the multi-stream framework leads to a significant improvement in word recognition accuracy relative to MFCCs alone.
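A crude way to realize such a "main spectral peaks" stream is local-maximum picking on a per-frame power spectrum; this sketch returns bin indices only (real formant extraction involves smoothing and continuity constraints the sketch omits):

```python
def spectral_peaks(power_spectrum, n_peaks=3):
    """Return the bin indices of the n_peaks largest local maxima of a
    power spectrum: a crude stand-in for formant-like features."""
    maxima = [i for i in range(1, len(power_spectrum) - 1)
              if power_spectrum[i - 1] < power_spectrum[i] > power_spectrum[i + 1]]
    maxima.sort(key=lambda i: power_spectrum[i], reverse=True)
    return sorted(maxima[:n_peaks])

# Toy spectrum with bumps at bins 1, 3 and 5; the two strongest are 3 and 5.
peaks = spectral_peaks([0.0, 1.0, 0.0, 3.0, 0.0, 2.0, 0.0], n_peaks=2)
```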


International Conference on Acoustics, Speech, and Signal Processing | 2000

Towards a large-vocabulary French vocal dictation based on a size-independent language-model search using the INRS recognizer

Hesham Tolba; Douglas D. O'Shaughnessy

Reports the progress of the large-vocabulary French-speech vocal dictation studies at INRS-Télécom. To evaluate this progress, the hidden Markov model (HMM) based recognizer of INRS is used. This recognizer, which represents each phone using HMMs, uses context-dependent phone modeling and n-gram statistics to cope with coarticulation and phonological phenomena, respectively. A series of speaker-independent continuous-speech recognition experiments was carried out on a subset of the large read-speech French-language corpus BREF, containing recordings of texts selected from the French newspaper Le Monde. We show through experiments that using a lexical graph that ignores language-model states and homophone distinctions, and postponing the application of such knowledge to a post-processor, simplifies the recognition process while keeping its high accuracy. The word recognition rate, using gender-dependent vector quantization (VQ) models, a 20,000-word pronunciation-variants-based lexicon and a bigram model estimated from Le Monde text data, was found to be 91.62% for males and 90.98% for females.


Archive | 2001

Speech Recognition by Intelligent Machines

Hesham Tolba; Douglas D. O'Shaughnessy

Collaboration


Top co-authors of Hesham Tolba:

Douglas D. O'Shaughnessy

Institut national de la recherche scientifique

Gang Chen

Université du Québec


Zili Li

Université du Québec


Habib Hamam

Université de Moncton
