Mohd Shamrie Sainin
Universiti Utara Malaysia
Publications
Featured research published by Mohd Shamrie Sainin.
data mining and optimization | 2011
Mohd Shamrie Sainin; Rayner Alfred
Feature selection for data mining optimization is in high demand, especially for data with high-dimensional feature vectors. Feature selection is a method for selecting the best feature (or combination of features) of the data in order to achieve a similar or better classification rate. Currently, there are three types of feature selection methods: filter, wrapper and embedded. This paper describes a genetic-algorithm-based wrapper approach that optimizes the feature selection process embedded in a classification technique called the supervised Nearest Neighbour Distance Matrix (NNDM). The method is implemented and tested on several datasets obtained from the UCI Machine Learning Repository and other sources. The results demonstrate a significant impact on predictive accuracy when feature selection is combined with the supervised NNDM in classifying new instances. The method can therefore be applied in other domains that require feature-dimension reduction, such as image and bioinformatics classification.
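The abstract does not include the wrapper procedure itself. A minimal sketch of a GA wrapper of this general kind, scoring boolean feature masks by leave-one-out 1-NN accuracy (all function names, operators and parameters here are illustrative, not taken from the paper):

```python
import random
import numpy as np

def knn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features selected by mask."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    correct = 0
    for i in range(len(Xs)):
        d = np.linalg.norm(Xs - Xs[i], axis=1)
        d[i] = np.inf                      # exclude the instance itself
        correct += y[int(np.argmin(d))] == y[i]
    return correct / len(Xs)

def ga_feature_selection(X, y, pop=20, gens=30, seed=0):
    """Wrapper GA sketch: individuals are boolean feature masks, fitness is
    the wrapped classifier's accuracy, evolution uses elitism, one-point
    crossover and point mutation."""
    rng = random.Random(seed)
    n = X.shape[1]
    population = [np.array([rng.random() < 0.5 for _ in range(n)]) for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population, key=lambda m: knn_accuracy(X, y, m), reverse=True)
        elite = scored[: pop // 2]
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(range(len(elite)), 2)
            cut = rng.randrange(1, n)
            child = np.concatenate([elite[a][:cut], elite[b][cut:]])
            j = rng.randrange(n)           # point mutation
            child[j] = not child[j]
            children.append(child)
        population = elite + children
    return max(population, key=lambda m: knn_accuracy(X, y, m))
```

Because the fitness function re-runs the classifier for every candidate mask, wrapper selection of this sort is far more expensive than a filter method, which is the usual trade-off the paper's approach accepts.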
data mining and optimization | 2012
Mohd Shamrie Sainin; Rayner Alfred
Researchers have shown that although a traditional direct classifier algorithm can easily be applied to multiclass classification, the performance of a single classifier decreases when imbalanced data are present in multiclass classification tasks. Ensembles of classifiers have therefore emerged as one of the hot topics for the imbalance problem in multiclass classification within the data mining and machine learning domains. Ensemble learning is an effective, increasingly adopted technique that combines multiple learning algorithms to improve overall prediction accuracy and may outperform any single sophisticated classifier. In this paper, an ensemble learner called the Direct Ensemble Classifier for Imbalanced Multiclass Learning (DECIML), which combines simple nearest neighbour and Naive Bayes algorithms, is proposed. A combiner method called OR-tree is used to combine the decisions obtained from the ensemble classifiers. The DECIML framework has been tested on several benchmark datasets and shows promising results.
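The OR-tree combiner is not specified in the abstract. As a purely illustrative stand-in, a decision-level combiner for two base classifiers might accept agreed labels outright and fall back to the more confident classifier on disagreement (this is an assumption, not the paper's actual OR-tree procedure):

```python
def or_combine(pred_knn, pred_nb, conf_knn, conf_nb):
    """Hypothetical OR-style decision combiner for two base classifiers:
    agreement wins outright; on disagreement, defer to the base classifier
    that reports the higher confidence for its prediction."""
    combined = []
    for pk, pn, ck, cn in zip(pred_knn, pred_nb, conf_knn, conf_nb):
        if pk == pn:
            combined.append(pk)            # both classifiers agree
        else:
            combined.append(pk if ck >= cn else pn)
    return combined
```

Combining a lazy local learner (nearest neighbour) with a global probabilistic one (Naive Bayes) is a common diversity strategy, since the two tend to make errors on different instances.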
advanced data mining and applications | 2010
Mohd Shamrie Sainin; Rayner Alfred
Distance-based classification is one of the popular methods for classifying instances using point-to-point distances, based on the nearest neighbour or k-nearest neighbour (k-NN) rule. The distance measure can be any of various available measures (e.g. Euclidean distance, Manhattan distance, Mahalanobis distance or other task-specific measures). In this paper, we propose a modified nearest neighbour method called the Nearest Neighbour Distance Matrix (NNDM) for classification based on an unsupervised or supervised distance matrix. In the proposed NNDM method, Euclidean distance coupled with a distance loss function is used to create a distance matrix. In our approach, the distances of each instance to the rest of the training instances are used to create the training distance matrix (TADM). The TADM is then used to classify a new instance. In supervised NNDM, two instances that belong to different classes are pushed apart from each other, to ensure that instances located next to each other belong to the same class. Based on the experimental results, we found that the trained distance matrix yields reasonable classification performance.
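The abstract leaves the distance loss function and the classification rule unspecified. One plausible sketch, assuming a fixed additive margin as the cross-class "push apart" and matching a new instance's distance profile against TADM rows (both assumptions, not the paper's exact formulation):

```python
import numpy as np

def supervised_nndm(X, y, margin=1.0):
    """Training distance matrix (TADM) sketch: pairwise Euclidean distances,
    with cross-class pairs pushed apart by a fixed margin as a simple
    stand-in for the paper's distance loss function."""
    diff = X[:, None, :] - X[None, :, :]
    tadm = np.sqrt((diff ** 2).sum(-1))
    tadm[y[:, None] != y[None, :]] += margin   # push different classes apart
    return tadm

def nndm_classify(tadm, X_train, y_train, x_new):
    """Build the new instance's distance vector to the training set, then
    label it with the class of the training instance whose TADM row profile
    is most similar (one plausible reading of 'TADM is used to classify')."""
    d_new = np.linalg.norm(X_train - x_new, axis=1)
    row = int(np.argmin(np.linalg.norm(tadm - d_new, axis=1)))
    return y_train[row]
```

Note that adding a margin only to cross-class entries makes the matrix class-aware without changing within-class geometry, which matches the stated goal of keeping same-class instances adjacent.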
INNOVATION AND ANALYTICS CONFERENCE AND EXHIBITION (IACE 2015): Proceedings of the 2nd Innovation and Analytics Conference & Exhibition | 2015
Md. Rajib Hasan; Fadzilah Siraj; Mohd Shamrie Sainin
Ensemble classifier systems are considered among the most promising approaches for medical data classification, and the performance of a decision tree classifier can be increased by ensembling, which has been shown to outperform single classifiers. In an ensemble setting, however, performance depends on the selection of a suitable base classifier. This research employed two prominent ensembles, namely AdaBoost and Bagging, with independently selected base classifiers: Random Forest, Random Tree, J48, J48graft and Logistic Model Trees (LMT). The empirical study shows that performance varies when different base classifiers are selected, and overfitting issues were also noted in some cases. The evidence shows that ensemble decision tree classifiers using AdaBoost and Bagging improve performance on the selected medical data sets.
international conference on computational science | 2014
Mohd Shamrie Sainin; Rayner Alfred
Malaysian medicinal plants are an abundant natural resource, but little research has been done on preserving knowledge of these plants in a form that enables the general public to identify a leaf using computing capabilities. In this preliminary study, a novel framework for identifying and classifying tropical medicinal plants in Malaysia based on patterns extracted from the leaf is therefore presented. The patterns extracted from a medicinal plant leaf are obtained from several angle features. However, the extraction creates a rather large number of attributes (features), which degrades the performance of most classifiers. A feature selection step is therefore applied to the leaf data to investigate whether classifier performance can be improved. A wrapper-based genetic algorithm (GA) feature selection is used to select the features, and the ensemble classifier called Direct Ensemble Classifier for Imbalanced Multiclass Learning (DECIML) is used as the classifier. The performance of this feature selection is compared with two feature selection methods from Weka. In the experiment, five species of Malaysian medicinal plants, represented by 65 images, are identified and classified. This study is important for helping the local community pass on the knowledge and application of Malaysian medicinal plants to future generations.
international conference on computational science | 2017
Mohd Shamrie Sainin; Rayner Alfred; Fairuz Adnan; Faudziah Ahmad
The aim of this paper is to investigate the effects of combining various sampling methods and ensemble classifiers on prediction performance in multiclass imbalanced data learning. This research uses shape data obtained from Malaysian medicinal leaf images and three other large benchmark datasets, on which seven ensemble methods from the Weka machine learning tool were selected to perform the classification task: AdaBoostM1, Bagging, Decorate, END, MultiBoostAB, RotationForest and Stacking. In addition, five base classifiers were used (Naive Bayes, SMO, J48, Random Forest and Random Tree) to examine the performance of the ensemble methods. Two ways of combining sampling with an ensemble classifier were used, namely Resample with an ensemble classifier and SMOTE with an ensemble classifier. The experimental results show that there is no single configuration that is "one design that fits all". However, they do show that coupling sampling and an ensemble classifier with Random Forest improves prediction performance on multiclass imbalanced datasets.
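The Resample and SMOTE filters themselves are Weka components and are not reproduced here. As a minimal illustration of the simpler of the two ideas, random oversampling duplicates minority-class instances until every class matches the majority count (a sketch only; Weka's Resample filter and SMOTE's synthetic interpolation both behave differently in detail):

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Minimal stand-in for a resampling filter: duplicate minority-class
    instances at random until every class reaches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    Xb, yb = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(idx)            # pick an existing minority instance
            Xb.append(X[i])
            yb.append(cls)
    return Xb, yb
```

SMOTE differs in that it interpolates new synthetic points between a minority instance and its nearest minority neighbours rather than duplicating existing ones, which is why the two samplers can interact differently with a given ensemble.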
international conference on computational science | 2017
Rayner Alfred; Chong Jia Chung; Chin Kim On; Ag Asri Ag Ibrahim; Mohd Shamrie Sainin; Paulraj Murugesa Pandiyan
The amount of data generated and stored in relational databases has motivated numerous researchers to study and develop learning algorithms for relational data mining. One of the most important relational tasks is to discover knowledge from relational data for better decision making. Various representations can be generated from the same data by applying Self-Organizing Map (SOM) methods to cluster relational data; this is achieved by tuning the SOM parameters, such as the number of clusters, weights, seeds, epochs and others. This paper therefore proposes a summarization method that applies SOM as the main algorithm to cluster relational data and applies the concept of data fusion to obtain better results in learning relational data. Input data obtained from Dynamic Aggregation of Relational Attributes (DARA) are clustered using the SOM method while tuning the SOM parameters. The generated results are fused and embedded into the target table to form a single representation. Several representations are formed and fed as input data into the classifiers (J48 decision tree and Naive Bayes classification models). Throughout the experiments conducted, representations extracted by tuning the number of clusters produced better results than representations extracted by tuning the other parameters. Overall, the data summarization approach based on individual data fusion was found to perform better than the other types of data fusion. In addition, cluster-based data fusion with an average number of clusters provided better accuracy than cluster-based data fusion with small or large numbers of clusters.
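To make the tunable parameters concrete, here is a tiny one-dimensional SOM sketch, with the unit count, learning rate and epoch count exposed as the kinds of knobs the paper varies (a generic SOM, not the paper's implementation; the grid shape, decay schedules and seeds are assumptions):

```python
import numpy as np

def train_som(X, n_units=4, epochs=20, lr=0.5, seed=0):
    """Tiny 1-D SOM: units sit on a line; the winning unit and its grid
    neighbours are pulled toward each sample, with the learning rate and
    neighbourhood radius decaying linearly over the epochs."""
    rng = np.random.RandomState(seed)
    W = X[rng.choice(len(X), n_units)].astype(float)   # init from data rows
    for t in range(epochs):
        rate = lr * (1 - t / epochs)
        radius = max(1.0, n_units / 2 * (1 - t / epochs))
        for x in X:
            win = int(np.argmin(np.linalg.norm(W - x, axis=1)))
            for u in range(n_units):
                h = np.exp(-((u - win) ** 2) / (2 * radius ** 2))
                W[u] += rate * h * (x - W[u])          # neighbourhood update
    return W

def som_labels(W, X):
    """Cluster label of each row = index of its best-matching unit."""
    return [int(np.argmin(np.linalg.norm(W - x, axis=1))) for x in X]
```

In the fusion step the paper describes, labels from several runs (e.g. different unit counts or seeds) would be appended as new columns of the target table, so the downstream classifier sees all clusterings at once.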
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND TECHNOLOGY 2016 (ICAST’16) | 2016
Mohd Shamrie Sainin; Faudziah Ahmad; Rayner Alfred
Shape is the main leaf feature, and most of the current literature on leaf identification utilizes the whole leaf for the feature extraction used in the identification process. In this paper, a study of half-leaf feature extraction for leaf identification is carried out, and the results are compared with those obtained from leaf identification based on full-leaf feature extraction. Identification and classification are based on shape features represented as cosine and sine angles. Six single classifiers from Weka and seven ensemble methods are compared on their prediction accuracy over these data. The classifiers were trained on 65 leaves to classify 5 different species from a preliminary collection of Malaysian medicinal plants. The results show that half-leaf feature extraction can be used for leaf identification without decreasing predictive accuracy.
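The exact cosine/sine descriptor is not given in the abstract. One common form of such an angle feature, sketched here under that assumption, samples boundary points and records the cosine and sine of each point's angle about the shape centroid:

```python
import math

def angle_features(contour, n_samples=8):
    """Assumed angle-based shape descriptor: sample n points along the
    boundary and record cos/sin of each point's angle about the centroid.
    For a half-leaf, the same routine would run on half the contour."""
    cx = sum(p[0] for p in contour) / len(contour)
    cy = sum(p[1] for p in contour) / len(contour)
    step = len(contour) / n_samples
    feats = []
    for k in range(n_samples):
        x, y = contour[int(k * step)]
        a = math.atan2(y - cy, x - cx)     # angle of boundary point vs centroid
        feats.extend([math.cos(a), math.sin(a)])
    return feats
```

A descriptor of this form yields 2 × n_samples attributes per leaf, which illustrates how angle features can quickly inflate the attribute count that the authors' earlier feature selection work addresses.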
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND TECHNOLOGY 2016 (ICAST’16) | 2016
Mohd Shamrie Sainin; Rayner Alfred; Faudziah Ahmad
A hybrid ensemble classifier that combines an entropy-based naive Bayes (ENB) strategy with the k-nearest neighbour (k-NN) classifier is examined. The classifiers are combined because naive Bayes provides prior estimates based on entropy, while k-NN provides a local estimate to model a deferred (lazy) classification. Whereas the original NB uses frequency-based probabilities, this study uses entropy as the prior for class estimation. The results demonstrate that by combining the classifiers, the proposed hybrid ensemble achieves promising performance on several benchmark datasets.
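The abstract does not define how entropy replaces the priors. One possible reading, sketched here as an assumption, normalises each class's entropy contribution -p·log2(p) into a prior, which has the side effect of up-weighting minority classes relative to frequency priors:

```python
import math
from collections import Counter

def entropy_priors(y):
    """Hypothetical 'entropy as prior': replace the frequency prior P(c)
    with that class's normalised entropy contribution -p*log2(p).
    (Assumes at least two classes, so the total contribution is nonzero.)"""
    n = len(y)
    contrib = {c: -(k / n) * math.log2(k / n) for c, k in Counter(y).items()}
    total = sum(contrib.values())
    return {c: v / total for c, v in contrib.items()}
```

Under this reading, a balanced dataset keeps equal priors, while an imbalanced one shifts prior mass toward the rarer class, which would complement k-NN's purely local vote.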
International Conference on Advances in Information and Communication Technology | 2016
Rayner Alfred; Kung Ke Shin; Mohd Shamrie Sainin; Chin Kim On; Paulraj Murugesa Pandiyan; Ag Asri Ag Ibrahim
Due to the growing amount of data generated and stored in relational databases, relational learning has attracted the interest of researchers in recent years. Many approaches have been developed to learn relational data; one of them is Dynamic Aggregation of Relational Attributes (DARA). The DARA algorithm is designed to summarize relational data with one-to-many relations. However, DARA suffers a major drawback when attribute cardinalities are very high, because the size of the vector-space representation depends on the number of unique values that exist across all attributes in the dataset. A feature selection process can be introduced to overcome this problem, and the selected features can be further optimized to achieve a good classification result. Several clustering runs can be performed for different values of k to yield an ensemble of clustering results. This paper proposes a two-layered genetic-algorithm-based feature selection to improve the classification performance of learning relational data using a k-NN ensemble classifier. The proposed method omits less relevant features while retaining the diversity of the classifiers, so as to improve the performance of the k-NN ensemble. The results show that the proposed k-NN ensemble is able to improve on the performance of traditional k-NN classifiers.