Publication


Featured research published by Noraini Seman.


International Symposium on Information Technology | 2010

An evaluation of endpoint detection measures for Malay speech recognition of an isolated words

Noraini Seman; Zainab Abu Bakar; Nordin Abu Bakar

This paper presents endpoint detection approaches for isolated words, using Malay speech recorded from Malaysian Parliamentary sessions. The database collection currently holds 34,466 vocabulary utterances; for the purpose of this study, the vocabulary is limited to the 25 most frequently spoken words, selected from ten speakers. Endpoint detection, which aims to distinguish the speech and non-speech segments of a digital speech signal, is considered one of the key preprocessing steps in a speech recognition system. Proper estimation of the start and end of the speech (versus silence or background noise) avoids wasting speech recognition evaluations on preceding or ensuing silence. In this study, endpoint detection and speech segmentation are performed with three different algorithms: a combination of Short-time Energy (STE) and Zero Crossing Rate (ZCR) measures, frame-based Teager's Energy (FTE), and the Energy-Entropy Feature (EEF). Three experiments were conducted separately to investigate the overall recognition rate obtained with a Discrete Hidden Markov Model (DHMM) classifier on a test set of 1250 utterances. The results show that the EEF algorithm performs quite satisfactorily, with an average recognition rate of 80.76%, compared with the other two algorithms. Each algorithm has advantages and disadvantages: word boundaries are still misdetected for words with weak fricative, plosive and nasal sounds, and the algorithms are not yet robust enough for Malaysian Parliamentary speech data. However, the performance of these algorithms can still be improved.
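As a rough illustration of the frame-based Teager energy idea named in the abstract above, the sketch below applies the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and averages its magnitude over frames. The function names, the non-overlapping framing, and the use of the mean absolute value are assumptions made for illustration; the paper's exact FTE formulation is not given here.

```python
def teager_energy(samples):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbour on each side.
    """
    return [
        samples[n] ** 2 - samples[n - 1] * samples[n + 1]
        for n in range(1, len(samples) - 1)
    ]


def frame_teager_energy(samples, frame_len):
    """Mean absolute Teager energy per frame.

    Assumption: non-overlapping frames of `frame_len` operator outputs;
    a trailing partial frame is dropped.
    """
    psi = teager_energy(samples)
    return [
        sum(abs(v) for v in psi[i:i + frame_len]) / frame_len
        for i in range(0, len(psi) - frame_len + 1, frame_len)
    ]
```

Frames whose value stays near zero would be treated as non-speech by a threshold chosen on held-out data; that threshold is not specified in the abstract.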


2010 International Conference on Information Retrieval & Knowledge Management (CAMP) | 2010

Evaluating endpoint detection algorithms for isolated word from Malay parliamentary speech

Noraini Seman; Zainab Abu Bakar; Nordin Abu Bakar; Haslizatul Fairuz Mohamed; Nur Atiqah Sia Abdullah; Prasanna Ramakrisnan; Sharifah Mumtazah Syed Ahmad

This paper presents endpoint detection approaches for isolated words, using Malay speech recorded from Malaysian Parliamentary sessions. The database collection currently holds 7,995 vocabulary utterances; for the purpose of this study, the vocabulary is limited to the ten most frequently spoken words, selected from ten speakers. Endpoint detection, which aims to distinguish the speech and non-speech segments of a digital speech signal, is considered one of the key preprocessing steps in a speech recognition system. Proper estimation of the start and end of the speech (versus silence or background noise) avoids wasting speech recognition evaluations on preceding or ensuing silence. In this study, endpoint detection and speech segmentation are performed with the short-time energy (STE) measure, the short-time zero crossing (STZC) measure, and a combination of both. The Hidden Markov Model (HMM) recognizer achieved a recognition accuracy of 91.4% with the combination of both algorithms, compared with only 86.3% for STE alone and 82.1% for STZC alone. The experiments show that problems remain: word boundaries are still misdetected for words with weak fricative and nasal sounds. Other obstacles, such as speaking style or the speaker's mood, can also affect recognition performance.
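The combination of short-time energy and zero crossing measures described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, all three thresholds, and the rule for combining the two measures (high energy marks speech, while moderate energy with a high crossing rate is also accepted to catch weak fricatives) are assumptions.

```python
def split_frames(samples, frame_len):
    """Split a signal into non-overlapping frames, dropping a partial tail."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(samples, frame_len=80, high_energy=0.01,
                     low_energy=0.001, zcr_thresh=0.3):
    """Return (start_sample, end_sample) of the detected speech, or None.

    All thresholds are hypothetical values for illustration only.
    """
    active = []
    for i, frame in enumerate(split_frames(samples, frame_len)):
        energy = short_time_energy(frame)
        loud = energy > high_energy
        # Weak fricatives: little energy but many zero crossings.
        noisy_consonant = energy > low_energy and zero_crossing_rate(frame) > zcr_thresh
        if loud or noisy_consonant:
            active.append(i)
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

In practice the thresholds would be estimated from the background noise at the start of each recording rather than fixed in advance.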


International Conference on Future Generation Communication and Networking | 2012

Acoustical Analysis of Filled Pause in Malay Spontaneous Speech

Raseeda Hamzah; Nursuriati Jamil; Noraini Seman

Filled pause is one type of disfluency, identified as a frequently occurring disfluency in spontaneous speech and known to affect Automatic Speech Recognition accuracy. The purpose of this study is to analyze the acoustical features of filled pauses in Malay spontaneous speech as a preliminary step towards filled pause detection. The acoustic features extracted from the filled pauses are formant frequencies and pitch. Two automated segmentation methods, Zero Crossing Rates and Gaussian Probability Density Function, are compared in our study to acquire an exact representation of the filled pause. The results reveal that the pitch and formant frequencies have a lower standard deviation when segmented using the Gaussian Probability Density Function than when segmented using Zero Crossing Rates. The analysis of Malay filled pauses presented in this paper is important as it proved that filled pauses in any language have standard acoustic features such as flat pitch and stable formant frequencies.


International Conference on Science and Social Research | 2010

Measuring the performance of isolated spoken Malay speech recognition using Multi-layer Neural Networks

Noraini Seman; Zainab Abu Bakar; Nordin Abu Bakar

This paper describes speech signal modeling techniques that are suited to high-performance and robust isolated word recognition. In this study, a speech recognition system is presented, specifically an isolated spoken Malay word recognizer that uses spontaneous and formal speech collected from the Parliament of Malaysia. Currently the vocabulary is limited to 25 words that can be pronounced exactly as they are written and that control the distribution of the vocalic segments. Speech segmentation is performed with an energy-based parameter and a zero crossing rate measure, modified to better locate the beginning and ending points of speech in the spoken words. The training and recognition processes are realized with Multi-layer Perceptron (MLP) Neural Networks in two-layer configurations, trained with stochastic error back-propagation so that weights and biases are adjusted after the presentation of every training sample. Mel-frequency Cepstral Coefficients (MFCCs) were chosen as the features extracted from each segmented utterance for the word recognizer. Recognition results showed that the performance of the two-layer networks increased as the number of hidden neurons increased. The best network structure, the (150-25) configuration, achieved an average classification rate of 84.731%. Implementation results also showed that the conjugate gradient (CG) algorithm was more accurate and reliable than the Levenberg-Marquardt (LM) algorithm for the network complexities and data size considered in this study.


8th International Conference on Robotic, Vision, Signal Processing and Power Applications, RoViSP 2013 | 2014

Filled Pause Classification Using Energy-Boosted Mel-Frequency Cepstrum Coefficients

Raseeda Hamzah; Nursuriati Jamil; Noraini Seman

Filled pause is one type of disfluency, identified as a frequently occurring disfluency in spontaneous speech and known to affect Automatic Speech Recognition accuracy. The purpose of this study is to analyze the impact of boosting Mel-Frequency Cepstral Coefficients with an energy feature when classifying filled pauses. A total of 828 filled pauses from a mixture of 62 male and female speakers are classified into /mhm/, /aaa/ and /eer/. A back-propagation neural network that fuses gradient descent with momentum and an adaptive learning rate is used as the classifier. The results revealed that energy-boosted Mel-Frequency Cepstral Coefficients produced a higher accuracy rate of 77% in classifying filled pauses.


Asia Information Retrieval Symposium | 2013

Improving Speech Recognizer Using Neuro-genetic Weights Connection Strategy for Spoken Query Information Retrieval

Noraini Seman; Zainab Abu Bakar; Nursuriati Jamil

This paper describes the integration of a speech recognizer into an information retrieval (IR) system to retrieve text documents relevant to given spoken queries. Our aim is to improve the speech recognizer, since it has been proven crucial for the front end of a Spoken Query IR system. When speech is used as the source material for indexing and retrieval, the effect of transcriber error on retrieval performance must be considered. Thus, we proposed a dynamic weights connection strategy of artificial intelligence (AI) learning algorithms that combines genetic algorithm (GA) and neural network (NN) methods to improve the speech recognizer. The two algorithms are separate modules and were used to find the optimum weights for the hidden and output layers of a feed-forward artificial neural network (ANN) model. A mutated GA technique was proposed and compared with the standard GA technique. One hundred experiments using 50 selected words from spontaneous speech were conducted. To evaluate speech recognition performance, we used the standard word error rate (WER); to evaluate retrieval performance, we used precision and recall with respect to manual transcriptions. The proposed method yielded 95.39% recognition performance on spoken query input, reducing the error rate to 4.61%. As for retrieval performance, our mutated GA+ANN model achieved a commendable 91% precision rate and 83% recall rate. It is interesting to note that the degradation in precision-recall is the same as the degradation in the recognition performance of the speech recognition engine. Owing to this fact, GA combined with ANN proved to attain certain advantages with sufficient accuracy.
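The two evaluation measures named in the abstract above can be sketched in a few lines. This is a generic illustration under stated assumptions: WER is computed as the word-level edit distance (substitutions, insertions and deletions) divided by the reference length, and precision and recall are computed over sets of document identifiers. The function names and the whitespace tokenization are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def precision_recall(retrieved, relevant):
    """Precision and recall for sets of retrieved and relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With these definitions, a recognition performance of 95.39% corresponds to an error rate of 100% - 95.39% = 4.61%, which matches the pair of figures reported in the abstract.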


International Conference on Computer and Information Application | 2010

The optimization of Artificial Neural Networks connection weights using genetic algorithms for isolated spoken Malay parliamentary speeches

Noraini Seman; Zainab Abu Bakar; Nordin Abu Bakar

This paper presents the structure of neural network models for validating the recognition performance of isolated spoken Malay utterances. The Artificial Neural Network (ANN) is well recognized for its approximation capability, provided that input-output data are available. Nevertheless, the conventional training algorithm, the Levenberg-Marquardt (LM) algorithm used as the gradient search method in model development, has always had difficulty converging to a global solution. To improve the accuracy and robustness of the ANN model, a Genetic Algorithm (GA) was introduced in ANN modelling to evolve the connection weights. The results show that GA-ANN models perform better than ANN-LM models. Integrating the GA with a feedforward network can improve mean square error (MSE) performance, and with this two-stage training scheme the recognition rate can be increased up to 85%.


International Conference on Hybrid Intelligent Systems | 2016

Bimodality Streams Integration for Audio-Visual Speech Recognition Systems

Noraini Seman; Rosniza Roslan; Nursuriati Jamil; Norizah Ardi

This paper demonstrates the state of the art of the ‘whole-word-state Dynamic Bayesian Network (DBN)’ model of audio and visual integration. Many DBN models have been proposed in recent years for speech recognition because of their strong descriptive ability and flexible structure. A DBN is a statistical model that can represent multiple collections of random variables as they evolve over time. However, a DBN model with a whole-word-state structure does not allow speech to be segmented into subunits. In this study, a single-stream DBN (SDBN) model is proposed, and speech recognition and segmentation experiments are carried out on audio and visual speech respectively. To evaluate the performance of the proposed model, the timing boundaries of the segmented word syllables are compared with those obtained from well-trained tri-phone Hidden Markov Models (HMM). Besides the word recognition results, the word syllable recognition rate and segmentation outputs are also obtained from the audio and visual speech feature streams. Experimental results show that integrating the SDBN model with the perceptual linear prediction (PLP) feature stream produces a higher word recognition rate, 98.50%, than the tri-phone HMM model in a clean environment. Meanwhile, as noise in the audio stream increases, the SDBN model shows more robust and promising results.


International Conference on Speech and Computer | 2016

Prosody Analysis of Malay Language Storytelling Corpus

Izzad Ramli; Noraini Seman; Norizah Ardi; Nursuriati Jamil

In this paper, the prosody of a storytelling speech corpus is analyzed. The main objective of the analysis is to develop prosody rules for converting neutral speech to storytelling speech. The speech corpus (neutral and storytelling speech) contains 464 speech sentences, 4,656 words, and 10,928 syllables. It was recorded by three female storytellers, one male professional speaker, two female speakers and two male speakers. The prosodic features considered for analysis are tempo, pause (sentence and phrase-level), duration, intensity, and pitch. Further analysis is also conducted on the word categories that occur in storytelling speech, such as verb, adverb, adjective, noun, conjunction and amplifier. The global prosody analysis showed that the mean prosodic values of storytelling speech are higher than those of neutral speech, especially for intensity and pitch. Investigation of the word categories showed that words categorized as adverbs, adjectives, amplifiers and conjunctions have a significant number of prominent syllables. Meanwhile, nouns and verbs show no significant difference between neutral and storytelling speech. The position of a word in a phrase (i.e. initial, middle, last) also proved to produce different increase factors in duration, pitch and intensity for different word categories.


2016 IEEE Industrial Electronics and Applications Conference (IEACon) | 2016

An improved pitch contour formulation for Malay language storytelling Text-to-Speech (TTS)

Izzad Ramli; Nursuriati Jamil; Noraini Seman; Norizah Ardi

In this paper, an improved pitch contour formulation is introduced by modifying the existing pitch contour sinusoidal function. The aim is to convert neutral speech into storytelling speech in the Malay language. Our speech datasets (neutral and storytelling speech) were recorded by a male and a female professional speaker. They contain 116 speech sentences, 1,164 words, and 2,732 syllables. For storytelling speech, 124 prominent syllables are detected using the Prosogram tool. These prominent syllables are further categorized into six clusters of pitch contour. A distance measure, one minus the Pearson correlation, is used to assess the similarity of the proposed pitch contour formulae to the original storytelling pitch contour. The proposed pitch contour sinusoidal function is also compared with the existing pitch contour function used in previous work. The results showed that the proposed pitch contour formulation performed better than the existing pitch contour formulae.
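The distance measure mentioned in the abstract above, one minus the Pearson correlation, can be written out directly. The sketch below assumes both pitch contours have already been sampled at the same number of points; the function name and that resampling assumption are illustrative and are not taken from the paper.

```python
import math


def pearson_distance(contour_a, contour_b):
    """Return 1 - r, where r is the Pearson correlation of two contours.

    The result is 0 for perfectly correlated contours, 1 for uncorrelated
    ones, and 2 for perfectly anti-correlated ones.
    """
    if len(contour_a) != len(contour_b) or len(contour_a) < 2:
        raise ValueError("contours must have the same length (at least 2)")
    mean_a = sum(contour_a) / len(contour_a)
    mean_b = sum(contour_b) / len(contour_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(contour_a, contour_b))
    norm_a = math.sqrt(sum((a - mean_a) ** 2 for a in contour_a))
    norm_b = math.sqrt(sum((b - mean_b) ** 2 for b in contour_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return 1 - cov / (norm_a * norm_b)
```

Because the correlation ignores offset and scale, this distance compares only the shape of two contours, not their absolute pitch level or range.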

Collaboration


Dive into Noraini Seman's collaborations.

Top Co-Authors

Norizah Ardi

Universiti Teknologi MARA

Raseeda Hamzah

Universiti Teknologi MARA

Izzad Ramli

Universiti Teknologi MARA

Rosniza Roslan

Universiti Teknologi MARA
