Mohammed Albared
National University of Malaysia
Publication
Featured research published by Mohammed Albared.
Rough Sets and Knowledge Technology | 2010
Mohammed Albared; Nazlia Omar; Mohd Juzaiddin Ab Aziz; Mohd Zakree Ahmad Nazri
Part-of-speech (POS) tagging is the task of computationally determining which POS of a word is activated by its use in a particular context. A POS tagger is a useful preprocessing tool in many natural language processing (NLP) applications, such as information extraction and information retrieval. In this paper, we present preliminary results of a bigram Hidden Markov Model (HMM) for POS tagging of the Arabic language. In addition, we use different smoothing algorithms with the HMM to overcome the data sparseness problem. The Viterbi algorithm is used to assign the most probable tag to each word in the text. Furthermore, several lexical models have been defined and implemented to handle unknown-word POS guessing based on word substrings, i.e. prefix probability, suffix probability, or the linear interpolation of both. The average overall accuracy of this tagger is 95.8%.
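As an illustration of the decoding step described in this abstract, the following is a minimal sketch of Viterbi decoding for a bigram HMM. It is not the authors' implementation: the function name, the toy English model, and the probability floor (a crude placeholder for the smoothing algorithms the paper studies) are all assumptions made for the example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for words under a bigram HMM."""
    if not words:
        return []

    def lp(p):
        # Log-probability with a small floor so unseen events do not give log(0).
        # The floor is an assumed placeholder, not one of the paper's smoothing methods.
        return math.log(max(p, floor))

    # best[i][t]: best log-probability of any tag path ending in tag t at position i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the previous tag that maximises path score plus transition score.
            prev = max(tags, key=lambda p: best[i - 1][p] + lp(trans_p[p].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + lp(trans_p[prev].get(t, 0.0))
                          + lp(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy model (illustrative numbers only, not from the paper).
TAGS = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.7}}

print(viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT))  # ['N', 'V']
```

Working in log space avoids numerical underflow on long sentences, which is why the sketch sums log-probabilities instead of multiplying raw probabilities.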
Asian Conference on Intelligent Information and Database Systems | 2011
Mohammed Albared; Nazlia Omar; Mohd Juzaiddin Ab Aziz
Part-of-speech (POS) tagging is the task of computationally determining which POS of a word is activated by its use in a particular context. POS tagging is an important processing step for many natural language systems, such as information extraction and question answering. This paper presents a study aiming to find the appropriate strategy for developing a fast and accurate Arabic statistical POS tagger when only a limited amount of training material is available. This is an essential factor when dealing with languages like Arabic, for which annotated resources are scarce and not easily available. Different configurations of an HMM tagger are studied: bigram and trigram models are tested, as well as different smoothing techniques. In addition, a new lexical model has been defined to handle unknown-word POS guessing based on the linear interpolation of word suffix probability and word prefix probability. Several experiments are carried out to determine the performance of the different HMM configurations with two small training corpora. The first corpus includes about 29,300 words from both Modern Standard Arabic and Classical Arabic. The second corpus is the Quranic Arabic Corpus, which consists of 77,430 words of Quranic Arabic.
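The unknown-word model mentioned in this abstract interpolates suffix-based and prefix-based tag probabilities. A minimal sketch of that idea follows, assuming fixed-length affixes, relative-frequency estimates, and a single weight lam; these choices, the function names, and the toy English training words are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict

def affix_tables(tagged_words, n=3):
    """Count how often each tag occurs with each n-character prefix and suffix."""
    prefix_counts = defaultdict(Counter)
    suffix_counts = defaultdict(Counter)
    for word, tag in tagged_words:
        prefix_counts[word[:n]][tag] += 1
        suffix_counts[word[-n:]][tag] += 1
    return prefix_counts, suffix_counts

def relative_freq(counter):
    """Turn tag counts into relative frequencies (empty dict if nothing was seen)."""
    total = sum(counter.values())
    return {tag: c / total for tag, c in counter.items()} if total else {}

def guess_tag_scores(word, prefix_counts, suffix_counts, lam=0.5, n=3):
    """Score tags for an unknown word: lam * P(tag|suffix) + (1 - lam) * P(tag|prefix)."""
    p_pre = relative_freq(prefix_counts.get(word[:n], Counter()))
    p_suf = relative_freq(suffix_counts.get(word[-n:], Counter()))
    tags = set(p_pre) | set(p_suf)
    return {t: lam * p_suf.get(t, 0.0) + (1 - lam) * p_pre.get(t, 0.0) for t in tags}

# Hypothetical toy training data (English, for readability only).
TRAIN = [("running", "VERB"), ("jumping", "VERB"), ("runner", "NOUN")]
PREFIXES, SUFFIXES = affix_tables(TRAIN)

scores = guess_tag_scores("runting", PREFIXES, SUFFIXES)
print(max(scores, key=scores.get))  # VERB
```

When only one of the two affixes was seen in training, the scores no longer sum to one; they can still be used to rank candidate tags, or renormalised if a proper distribution is needed.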
Asia Information Retrieval Symposium | 2014
Nazlia Omar; Mohammed Albared; Tareq Al-Moslmi; Adel Al-Shabi
Sentiment analysis is a very challenging and important task that involves natural language processing, web mining, and machine learning. Sentiment analysis in Arabic is more challenging than in other languages due to the morphological complexity of Arabic and the large variation of its dialects. This paper presents an empirical comparison of seven feature selection methods (Information Gain, Principal Components Analysis, Relief-F, Gini Index, Uncertainty, Chi-squared, and Support Vector Machines (SVMs)) and three machine learning classifiers (SVM, Naive Bayes, and K-nearest neighbor) for Arabic sentiment classification. A wide range of comparative experiments is conducted on an opinion corpus for Arabic (OCA). This paper demonstrates that feature selection does improve the performance of Arabic sentiment-based classification, but the result depends on the method used and the number of features selected. The experimental results demonstrate that feature reduction methods improve classifier performance. Moreover, the experimental results indicate that SVM-based feature selection yields the best performance among the feature selection methods and that the SVM classifier outperforms the other techniques for Arabic sentiment-based classification. Finally, the experiments indicate that the SVM classifier with the SVM-based feature selection method yields the best classification performance, with an accuracy of 92.4%.
Journal of Information Science | 2018
Tareq Al-Moslmi; Mohammed Albared; Adel Al-Shabi; Nazlia Omar; Salwani Abdullah
Sentiment analysis is held to be one of the most dynamic recent research fields in Natural Language Processing, facilitated by the quickly growing volume of Web opinion data. Most of the approaches in this field focus on English due to the lack of sentiment resources in other languages, such as Arabic with its large variety of dialects. In most sentiment analysis applications, good sentiment resources play a critical role. Based on that, this article introduces several publicly available sentiment analysis resources for Arabic. This article introduces the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialect synsets and inflected forms. This article also presents a Multi-domain Arabic Sentiment Corpus (MASC) of 8860 positive and negative reviews from different domains. In this article, an in-depth study has been conducted on five types of feature sets to exploit effective features and investigate their effect on the performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to integrate different feature sets and classification algorithms to synthesise a more accurate sentiment analysis method. The Arabic senti-lexicon is used for generating feature vectors. Five well-known machine learning algorithms (naïve Bayes, k-nearest neighbours, support vector machines (SVMs), logistic linear regression and neural network) are employed as base classifiers for each of the feature sets. A wide range of comparative experiments on standard Arabic data sets was conducted; discussion is presented and conclusions are drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis. Moreover, the results show that classifiers trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained using the raw corpus.
Journal of Computer Science | 2014
Bashar Aubaidan; Masnizah Mohd; Mohammed Albared
This study presents the results of an experimental study of two document clustering techniques, k-means and k-means++. In particular, we compare the two main approaches in crime document clustering. The drawback of k-means is that the user needs to define the initial centroid points. This becomes more critical when dealing with document clustering because each center point is represented by a word, and calculating the distance between words is not a trivial task. To overcome this problem, k-means++ was introduced in order to find good initial center points. Since k-means++ has not been applied before in crime document clustering, this study presents a comparative study of k-means and k-means++ to investigate whether the initialization process in k-means++ helps to obtain better results than k-means. We propose the k-means++ clustering algorithm to identify the best seeds for the initial cluster centers in clustering crime documents. The aim of this study is to conduct a comparative study of two main clustering algorithms, namely k-means and k-means++. The method of this study includes a pre-processing phase, which in turn involves tokenization, stop-word removal and stemming. In addition, we evaluate the impact of two similarity/distance measures (cosine similarity and the Jaccard coefficient) on the results of the two clustering algorithms. Experimental results on several settings of the crime data set show that, by identifying the best seeds for the initial cluster centers, k-means++ can work significantly better than k-means (at the 95% confidence level). These results demonstrate the accuracy of the k-means++ clustering algorithm in clustering crime documents.
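The initialization step that distinguishes k-means++ from k-means can be sketched as follows. This is a generic version of the standard k-means++ seeding rule (each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen), paired here with cosine distance as an assumption; the function names, the fixed random seed, and the toy term-count vectors are hypothetical and do not come from the study.

```python
import math
import random

def cosine_distance(a, b):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (na * nb))

def kmeanspp_seeds(points, k, dist=cosine_distance, rng=None):
    """Choose k initial centers from points using k-means++ style seeding."""
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    seeds = [rng.choice(points)]   # first seed: uniform at random
    while len(seeds) < k:
        # Weight each point by its squared distance to the nearest chosen seed.
        weights = [min(dist(p, s) for s in seeds) ** 2 for p in points]
        if sum(weights) == 0:
            # All points coincide with a seed; fall back to a uniform choice.
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds

# Hypothetical document vectors (term counts over a two-word vocabulary).
DOCS = [(2, 0), (3, 1), (0, 2), (1, 3)]
print(kmeanspp_seeds(DOCS, k=2))
```

The chosen seeds would then be passed to an ordinary k-means loop in place of user-defined or uniformly random starting centers.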
International Conference on Electrical Engineering and Informatics | 2009
Mohammed Albared; Nazlia Omar; Mohd Juzaiddin Ab Aziz
Part-of-speech tagging is an important pre-processing step in many natural language processing applications, such as text summarization, question answering and information retrieval systems. Morphosyntactic disambiguation (part-of-speech tagging) is the process of assigning every word in a given context to its appropriate part of speech. In this paper, we first review the supervised machine learning approaches that have been used in part-of-speech tagging. We then review the existing Arabic work to compare approaches and to confirm the need to develop an accurate and efficient Arabic morphosyntactic disambiguation system. Finally, we propose an experimental classifier-combination framework for Arabic part-of-speech tagging in which three diverse probabilistic classifiers (Hidden Markov Model, Maximum Entropy and Transformation-Based Learning) are combined using many different combination strategies to exploit their advantages.
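The abstract does not say which combination strategies are used. As one simple possibility, the sketch below combines the outputs of several taggers by per-token majority vote, breaking ties in favour of the tagger listed first. The function name and the example tag sequences are hypothetical.

```python
from collections import Counter

def majority_vote(tag_sequences):
    """Combine equal-length tag sequences from several taggers, token by token.

    Ties go to the tag proposed by the earliest tagger in tag_sequences,
    because Counter keeps insertion order and max() returns the first maximum.
    """
    if not tag_sequences:
        return []
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all taggers must label the same number of tokens")
    combined = []
    for votes in zip(*tag_sequences):
        counts = Counter(votes)
        combined.append(max(counts, key=counts.get))
    return combined

# Hypothetical outputs of three taggers for one three-token sentence.
hmm_tags = ["N", "V", "N"]
maxent_tags = ["N", "N", "ADJ"]
tbl_tags = ["V", "V", "P"]

print(majority_vote([hmm_tags, maxent_tags, tbl_tags]))  # ['N', 'V', 'N']
```

Listing the most reliable tagger first makes the tie-break meaningful; weighted voting or stacking would be alternative strategies under the same interface.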
Asian Conference on Intelligent Information and Database Systems | 2011
Mohammed Albared; Nazlia Omar; Mohd Juzaiddin Ab Aziz
This paper describes our newly developed second-order hidden Markov model part-of-speech tagging system, specially designed to tag Arabic texts using small training data. The tagger achieves encouraging results. In addition, the paper presents a hybrid tagging architecture for Arabic, in which our tagger is augmented with a weighted morphological analyzer. Finally, we compare the results of the tagger both standalone and when using a high-coverage morphological analyzer. Experimental results are presented and discussed using a small training corpus. The experiments show that the best proposed hybrid architecture significantly improves POS tagging accuracy for unknown words. A precision of 96.6% is obtained when unknown words occur in the test set.
PLOS ONE | 2018
Ahmed Alsaffar; Suryanti Awang; Hai Tao; Nazlia Omar; Wafaa Al-Saiagh; Mohammed Albared
Sentiment analysis techniques are increasingly exploited to categorize opinion text into one or more predefined sentiment classes for the creation and automated maintenance of review-aggregation websites. In this paper, a Malay sentiment analysis classification model is proposed to improve classification performance based on semantic orientation and machine learning approaches. First, a total of 2,478 Malay sentiment-lexicon phrases and words are each assigned a synonym and stored with the help of more than one native Malay speaker, and each is manually assigned a polarity score. In addition, supervised machine learning approaches and a lexicon knowledge method are combined for Malay sentiment classification, evaluating thirteen features. Finally, three individual classifiers and a combined classifier are used to evaluate the classification accuracy. A wide range of comparative experiments is conducted on a Malay Reviews Corpus (MRC), and the results demonstrate that feature extraction improves the performance of Malay sentiment analysis based on the combined classification. However, the results depend on three factors: the features, the number of features and the classification approach.
Journal of Computer Science | 2010
Omar Shirko; Nazlia Omar; Haslina Arshad; Mohammed Albared
Journal of Theoretical and Applied Information Technology | 2013
Ali Mashaan Abed; Sabrina Tiun; Mohammed Albared