Awais Mahmood
King Saud University
Publication
Featured research published by Awais Mahmood.
international conference on multimedia and expo | 2011
Ghulam Muhammad; Mansour Alsulaiman; Awais Mahmood; Zulfiqar Ali
In this paper, we propose an automatic voice disorder classification system using the first two formants of vowels. Five types of voice disorder, namely cyst, GERD, paralysis, polyp, and sulcus, are used in the experiments. Spoken Arabic digits from voice-disordered people are recorded as input. The first and second formants are extracted from the vowels [Fatha] and [Kasra], which are present in the Arabic digits. These four features are then used to classify the voice disorder with two classification methods: vector quantization (VQ) and neural networks. In the experiments, the neural network performs better than VQ: for female and male speakers, the classification rates are 67.86% and 52.5%, respectively. The best classification rate, 78.72%, is obtained for the female sulcus disorder.
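As a rough illustration of the classification step described above (not the authors' implementation), the sketch below classifies a four-dimensional formant vector [F1, F2 of [Fatha]; F1, F2 of [Kasra]] with a minimal vector-quantization scheme: one centroid per disorder class and nearest-centroid assignment. All formant values and the two class names are toy placeholders.

```python
import numpy as np

def train_vq(features, labels):
    """One centroid per disorder class: the mean feature vector."""
    classes = sorted(set(labels))
    return {c: np.mean([f for f, l in zip(features, labels) if l == c], axis=0)
            for c in classes}

def classify_vq(codebook, feature):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(codebook, key=lambda c: np.linalg.norm(codebook[c] - feature))

# Toy formant values (Hz); real data would come from recorded vowels.
train_x = [np.array([700, 1200, 300, 2200]),
           np.array([650, 1150, 320, 2100]),
           np.array([500, 1800, 400, 1900]),
           np.array([520, 1750, 410, 1950])]
train_y = ["polyp", "polyp", "cyst", "cyst"]

codebook = train_vq(train_x, train_y)
print(classify_vq(codebook, np.array([680, 1180, 310, 2150])))  # polyp
```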
european modelling symposium | 2013
Mansour Alsulaiman; Zulfiqar Ali; Ghulam Muhammed; Mohamed A. Bencherif; Awais Mahmood
The King Saud University speech database (KSU-DB) is a very rich speech database of the Arabic language. Its richness spans many dimensions. It has more than three hundred speakers of both genders; the speakers are Arabs and non-Arabs belonging to twenty-nine different nationalities. The database contains different types of text, such as isolated words, digits, phonetically rich words and sentences, phonetically balanced sentences, paragraphs, and answers to questions. The KSU-DB was recorded in three locations: an office, representing a normal environment with low noise; a cafeteria, representing a noisy environment; and a soundproof room, representing a quiet environment. The database includes different recording channels: mobile phones and medium- and high-quality microphones connected to recording devices of different qualities. To track inter-session variations of the speakers, the database was recorded in three sessions separated by gaps of about six weeks. Though the database's main goal is speaker recognition research, we made it rich enough to be useful for many areas of speech-processing research. A team of native Arabic speakers verified the database both manually and automatically.
international conference on intelligent systems, modelling and simulation | 2012
Awais Mahmood; Mansour Alsulaiman; Ghulam Muhammad
This paper proposes a new feature extraction method, called the multi-directional local feature (MDLF), for automatic speaker recognition. To obtain the MDLF, linear regression is applied to the FFT spectrum in four directions: horizontal (time axis), vertical (frequency axis), diagonal at 45 degrees (time-frequency), and diagonal at 135 degrees (time-frequency). In the experiments, a Gaussian mixture model with different numbers of mixtures is used as the classifier. Experiments were conducted on all letters of the Arabic alphabet. The results show that the proposed MDLF achieves better recognition accuracy than traditional MFCC and local features for speaker recognition.
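A minimal sketch of the multi-directional regression idea, assuming a three-point regression over points at x = -1, 0, 1 (where the least-squares slope reduces to (y[+1] - y[-1]) / 2) applied to a toy time-frequency matrix; the paper's exact window length and normalization may differ.

```python
import numpy as np

def mdlf(S):
    """Four directional slopes at each interior point of a (frames x bins) matrix."""
    T, F = S.shape
    feats = np.zeros((T - 2, F - 2, 4))
    for t in range(1, T - 1):
        for f in range(1, F - 1):
            feats[t-1, f-1, 0] = (S[t+1, f] - S[t-1, f]) / 2      # horizontal (time)
            feats[t-1, f-1, 1] = (S[t, f+1] - S[t, f-1]) / 2      # vertical (frequency)
            feats[t-1, f-1, 2] = (S[t+1, f+1] - S[t-1, f-1]) / 2  # diagonal, 45 degrees
            feats[t-1, f-1, 3] = (S[t-1, f+1] - S[t+1, f-1]) / 2  # diagonal, 135 degrees
    return feats

S = np.arange(25, dtype=float).reshape(5, 5)  # toy "spectrogram"
print(mdlf(S)[0, 0])  # slopes at the first interior point
```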
asia modelling symposium | 2011
Mansour Alsulaiman; Ghulam Muhammad; Mohamed A. Bencherif; Awais Mahmood; Zulfiqar Ali; Mohammad Aljabri
Availability of databases is a necessity in the speech processing field, yet publicly available databases for the Arabic language are few. In this paper, we describe a rich database for the Arabic language. The database is rich in many dimensions: text, environments, microphone types, number of recording sessions, recording systems, transmission channels, countries of origin, and mother languages. This richness makes the database an important resource for research in Arabic language processing and very useful for many speech processing tasks, such as speaker recognition, speech recognition, and accent identification. The speakers spoke Modern Standard Arabic (MSA).
grid and cooperative computing | 2013
Awais Mahmood; Mansour Alsulaiman; Ghulam Muhammad; Sid M. Selouani
A new feature extraction method is presented in this paper. It is a modification of our previous work, in which we extracted multi-directional local features (MDLF). We name the new feature the multi-directional local feature with moving average (MDLF-Mavg). MDLF-Mavg is based on three-point linear regression and a three-point moving average. Linear regression is applied along the horizontal (time) axis, capturing phoneme onsets and offsets, and the vertical (frequency) axis, capturing formant contours, whereas a modified moving average is applied along the 45° and 135° time-frequency directions, capturing the voiceprint of the speaker. The performance of MDLF-Mavg is compared with that of other speech features in a speaker recognition system, where it outperforms them; on the female part of the database it achieves a 100% recognition rate. We also show that MDLF-Mavg produces what can be regarded as a voiceprint for each speaker vocalizing a given text.
international conference on digital information management | 2010
Mansour Alsulaiman; Awais Mahmood; Muhammad Ghulam; Mohamed A. Bencherif; Yousef Ajami Alotaibi
Modeling a system by statistical methods requires a large amount of data to train it. In real life, such data are sometimes unavailable or hard to collect, and modeling the system with a small database produces poor performance. In this paper, we propose a method for increasing the size of the database by generating new samples from the original samples using combinations of speech lengthening, noise adding, and word reversal. As a proof of concept, we used a severe test condition in which the original database consists of one sample per speaker for a speaker recognition system. We tested the system using the original samples. The best results were 90% and 90.41% recognition rates for two subsets of the database with 25 and 50 speakers, respectively.
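The three sample-generation operations can be sketched as below; the stretch factor and noise level are illustrative assumptions, not the paper's settings, and real augmentation would operate on recorded waveforms.

```python
import numpy as np

def lengthen(x, factor=2):
    """Stretch the signal by repeating each sample `factor` times."""
    return np.repeat(x, factor)

def add_noise(x, scale=0.01, rng=None):
    """Add low-level Gaussian noise relative to the signal's spread."""
    rng = rng or np.random.default_rng(0)
    return x + scale * np.std(x) * rng.standard_normal(len(x))

def reverse(x):
    """Word reversal: play the utterance backwards."""
    return x[::-1]

x = np.sin(np.linspace(0, 2 * np.pi, 100))           # toy utterance
augmented = [lengthen(x), add_noise(x), reverse(x),  # combinations multiply
             add_noise(reverse(x)), lengthen(reverse(x))]
print(len(augmented), len(augmented[0]))  # 5 200
```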
Speech Communication | 2017
Mansour Alsulaiman; Awais Mahmood; Ghulam Muhammad
In this paper, we investigate the effect of Arabic phonemes on the performance of speaker recognition systems. The investigation reveals that some Arabic phonemes have a strong effect on the recognition rate of such systems, and this finding can be used to improve the performance of speaker recognition systems and reduce their execution time. It can be applied by segmenting the phonemes most effective for speaker recognition from the utterance and using only the segmented part of the speech, or by designing the text used in high-performance speaker recognition systems. From our investigation, we find that the recognition rates of Arabic vowels were all above 80%, whereas the recognition rates for the consonants varied from very low (14%) to very high (94%), the latter achieved by a pharyngeal consonant, followed by the two nasal phonemes, which achieved recognition rates above 80%. Four more consonants had recognition rates between 70% and 80%. We show that by utilizing these findings and designing the text carefully, we can build a high-performance speaker recognition system.
european modelling symposium | 2014
Ghulam Muhammad; Mansour Alsulaiman; Awais Mahmood; Malak Almojali; Bencherif Mohamed Abdelkader
This paper presents an automatic voice pathology detection method based on a multiresolution technique, specifically Gabor wavelets. Gabor wavelets can extract information at various scales and orientations, and can thereby effectively encode distinguishable patterns of normal and pathological voice signals. First, the input voice is transformed to the frequency domain using a frame-based Fourier transform. 2D Gabor filters with different scales and orientations are applied to the mel-filtered frequency representation. To reduce the dimensionality of the Gabor features, principal component analysis is applied. The features are then fed into a support vector machine for classification. In this investigation, we use two well-known databases, MEEI and SVD. The results show that the proposed method outperforms some state-of-the-art techniques for voice pathology detection.
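A hedged sketch of the Gabor-filtering step: build a real-valued 2D Gabor kernel at a given scale and orientation and correlate it with a toy mel-like time-frequency matrix. The kernel size, sigma, and wavelength are illustrative assumptions; the paper's exact filter bank is not specified here.

```python
import numpy as np

def gabor_kernel(size=7, sigma=2.0, theta=0.0, wavelength=4.0):
    """Real part of a 2D Gabor filter: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates by theta
    return (np.exp(-(x**2 + y**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / wavelength))

def apply_filter(S, kernel):
    """'Valid' 2D correlation of spectrogram S with the kernel."""
    k = kernel.shape[0]
    out = np.empty((S.shape[0] - k + 1, S.shape[1] - k + 1))
    for t in range(out.shape[0]):
        for f in range(out.shape[1]):
            out[t, f] = np.sum(S[t:t + k, f:f + k] * kernel)
    return out

S = np.random.default_rng(0).random((20, 20))        # toy mel spectrogram
responses = [apply_filter(S, gabor_kernel(theta=th))
             for th in (0, np.pi / 4, np.pi / 2)]    # three orientations
print(responses[0].shape)  # (14, 14)
```

In the paper's pipeline, responses like these would then be flattened and reduced by PCA before the SVM stage.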
Cluster Computing | 2018
Awais Mahmood; Ghulam Muhammad; Mansour Alsulaiman; Habib Dhahri; Esam Othman; Mohammed Faisal
A new speech feature extraction technique, called the moving average multi-directional local feature (MA-MDLF), is presented in this paper. The method is based on linear regression (LR) and a moving average (MA) in the time–frequency plane. A three-point LR is taken along the time axis and the frequency axis, and a three-point MA is taken along the 45° and 135° directions of the time–frequency plane. The LR captures voice onset/offset and formant contours, while the moving average captures dynamics on the time–frequency axes that can be seen as voiceprints. The performance of MA-MDLF is compared with commonly used speech features in a speaker recognition system (SRS) under three conditions: clean speech, mobile speech, and cross-channel. MA-MDLF performs better than the baseline MFCC, RASTA-PLP, and LPCC features in all three conditions. We also evaluated MA-MDLF on three speech databases, namely KSU, LDC Babylon, and TIMIT, and found that it outperformed the other commonly used features on all three. The first two databases contain Arabic speech, while the third contains English speech.
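A minimal sketch of the MA-MDLF feature at a single time-frequency point, assuming the three-point regression slope reduces to (y[+1] - y[-1]) / 2 on the two axes and that the diagonal moving average is a plain unweighted three-point mean; the paper's "modified" moving average may differ.

```python
import numpy as np

def ma_mdlf_point(S, t, f):
    """Four-dimensional MA-MDLF feature at interior point (t, f) of matrix S."""
    slope_t = (S[t+1, f] - S[t-1, f]) / 2               # regression slope, time axis
    slope_f = (S[t, f+1] - S[t, f-1]) / 2               # regression slope, frequency axis
    ma_45  = (S[t-1, f-1] + S[t, f] + S[t+1, f+1]) / 3  # moving average, 45 degrees
    ma_135 = (S[t-1, f+1] + S[t, f] + S[t+1, f-1]) / 3  # moving average, 135 degrees
    return np.array([slope_t, slope_f, ma_45, ma_135])

S = np.arange(16, dtype=float).reshape(4, 4)  # toy spectrogram
print(ma_mdlf_point(S, 1, 1))
```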
2017 International Conference on Informatics, Health & Technology (ICIHT) | 2017
Zulfiqar Ali; Mansour Alsulaiman; Ghulam Muhammad; Ahmed Al-nasheri; Awais Mahmood
Data mining has great potential in many areas of health informatics: mining health data can minimize health costs and reduce risk to life by informing a person at an early stage of disease. An automatic classification system capable of mining pathological data can therefore contribute significantly to health informatics. In this paper, an automatic system to differentiate between pathological and normal data is developed. The system mines pathological data on the basis of an acoustic analysis, whose purpose is to estimate the auditory spectrum of a voice sample using a principle of the human auditory system called critical bandwidths. The estimated auditory spectrum simulates the behavior of the human ear and acts like an expert clinician who can identify a pathological voice by listening. The pathological data used in this study were recorded from people suffering from more than 100 different types of voice disorder; the voice of a disordered patient sounds noisy, harsh, strained, breathy, and unpleasant to the ear. During the training phase, the proposed system takes labeled normal and pathological data and generates acoustic models using the Gaussian mixture model. In the deployment phase, an unknown, unlabeled voice sample is given to the system to determine its type, i.e., normal or pathological. The best accuracy obtained by the system is 99.50%.
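As a rough sketch of the two-class decision step, the code below fits one Gaussian per class (a one-component GMM with diagonal covariance, a deliberate simplification of the paper's multi-component GMM) to toy two-dimensional feature vectors and labels an unseen sample by the higher log-likelihood.

```python
import numpy as np

def fit_gaussian(X):
    """Per-dimension mean and variance (with a small floor) of labeled data."""
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    """Diagonal-covariance Gaussian log-density of sample x."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var)
                                + (x - mean) ** 2 / var)))

# Toy acoustic feature vectors; real ones would come from the auditory spectrum.
normal = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1]]
pathological = [[4.0, 5.0], [4.2, 5.1], [3.9, 4.8]]
models = {"normal": fit_gaussian(normal),
          "pathological": fit_gaussian(pathological)}

sample = np.array([4.1, 5.0])  # unlabeled sample presented at deployment
label = max(models, key=lambda c: log_likelihood(sample, *models[c]))
print(label)  # pathological
```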