Nasullah Khalid Alham

Publication

Featured research published by Nasullah Khalid Alham.

Fuzzy Systems and Knowledge Discovery | 2010

MRSim: A discrete event based MapReduce simulator

Suhel Hammoud; Maozhen Li; Yang Liu; Nasullah Khalid Alham; Zelong Liu

The MapReduce programming model has recently become popular for large-scale, data-intensive distributed applications due to its efficiency, simplicity, and ease of use. The Hadoop implementation of MapReduce is one of the most popular tools among programmers because it hides the details of parallel programming from users. However, work on simulating the Hadoop environment is still in its infancy. Although a large number of tools are available for simulating distributed environments, only a few simulators specifically target the MapReduce environment. Based on the testing we performed, the usability of these simulators is unsatisfactory because their simplified designs limit the simulation of jobs with varied configurations. We have designed and implemented MRSim, a MapReduce simulator based on discrete event simulation that accurately simulates the Hadoop environment. On the one hand, the simulator allows us to measure the scalability of MapReduce-based applications easily and quickly; on the other hand, it captures the effects of different Hadoop configurations on the behavior of MapReduce-based applications in terms of job completion times and hardware utilization.
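
To make the idea of discrete event simulation concrete, here is a minimal sketch, not MRSim's actual design. It assumes a job is a list of task durations and a cluster is a fixed number of identical task slots; the function name `simulate_job` and both parameters are hypothetical. The simulation clock jumps from one task-completion event to the next instead of advancing in fixed steps.

```python
import heapq


def simulate_job(task_durations, num_slots):
    """Return the simulated completion time of a job.

    Assumption for illustration only: tasks are independent, are started
    in submission order, and each slot runs one task at a time.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")

    pending = list(task_durations)[::-1]  # reversed so pop() yields submission order
    events = []  # min-heap of (finish_time, slot_id) completion events
    clock = 0.0

    # Fill every free slot at time zero.
    for slot in range(min(num_slots, len(pending))):
        heapq.heappush(events, (clock + pending.pop(), slot))

    # Process completion events in time order; each freed slot
    # immediately starts the next pending task, if any.
    while events:
        clock, slot = heapq.heappop(events)
        if pending:
            heapq.heappush(events, (clock + pending.pop(), slot))

    return clock
```

For example, `simulate_job([4, 3, 2, 1], num_slots=2)` returns `5.0`, while the same job on a single slot takes `10.0`. Rerunning the function with different slot counts is a toy version of the scalability and configuration studies the abstract describes.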

Future Generation Computer Systems | 2013

HSim: A MapReduce simulator in enabling Cloud Computing

Yang Liu; Maozhen Li; Nasullah Khalid Alham; Suhel Hammoud

MapReduce is an enabling technology in support of Cloud Computing. Hadoop, which is a MapReduce implementation, has been widely used in developing MapReduce applications. This paper presents HSim, a MapReduce simulator which builds on top of Hadoop. HSim models a large number of parameters that can affect the behavior of MapReduce nodes, and thus it can be used to tune the performance of a MapReduce cluster. HSim is validated with both benchmark results and user-customized MapReduce applications.

Computers & Mathematics With Applications | 2011

A MapReduce-based distributed SVM algorithm for automatic image annotation

Nasullah Khalid Alham; Maozhen Li; Yang Liu; Suhel Hammoud

Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) have been used extensively due to their generalization properties. However, SVM training is a computationally intensive process, especially when the training dataset is large. This paper presents MRSMO, a MapReduce-based distributed SVM algorithm for automatic image annotation. The performance of the MRSMO algorithm is evaluated in an experimental environment. By partitioning the training dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computers, the MRSMO algorithm reduces the training time significantly while maintaining a high level of accuracy in both binary and multiclass classifications.

Fuzzy Systems and Knowledge Discovery | 2010

A MapReduce based distributed LSI

Yang Liu; Maozhen Li; Suhel Hammoud; Nasullah Khalid Alham; Mahesh Ponraj

Latent Semantic Indexing (LSI) is a widely used text mining technology due to its effectiveness in dealing with the problems of synonymy and polysemy within a proper matrix scale. However, LSI is enormously computationally intensive, especially when processing large-scale data. An effective solution is to increase the computational power available to LSI by using multiple computing nodes. In this paper we propose a novel MapReduce-based distributed LSI that uses the Hadoop distributed computing architecture to cluster the documents with the K-means algorithm and then applies LSI to the clustered results. We evaluate the performance of the proposed MapReduce-based LSI and compare it with standalone LSI. The results show a great improvement in LSI's performance in terms of speed.

Fuzzy Systems and Knowledge Discovery | 2009

Evaluating Machine Learning Techniques for Automatic Image Annotations

Nasullah Khalid Alham; Maozhen Li; Suhel Hammoud; Hao Qi

The past decade has seen rapid development in Content Based Image Retrieval (CBIR). CBIR is the retrieval of images based on their low-level features such as color, texture, and shape. To improve retrieval accuracy, the research focus has shifted from designing sophisticated low-level feature extraction algorithms to reducing the ‘semantic gap’ between the visual features and the richness of human semantics. Image annotation techniques have been proposed to facilitate CBIR. This paper evaluates 7 representative machine learning techniques for automatic image annotation using 5000 images. An image annotation prototype is implemented and the evaluation results are presented and analyzed.

Computers & Mathematics With Applications | 2013

A MapReduce-based distributed SVM ensemble for scalable image classification and annotation

Nasullah Khalid Alham; Maozhen Li; Yang Liu; Man Qi

A combination of classifiers leads to a substantial reduction of classification errors in a wide range of applications. Among them, support vector machine (SVM) ensembles with bagging have shown better performance in classification than a single SVM. However, the training process of SVM ensembles is notably computationally intensive, especially when the number of replicated training datasets is large. This paper presents MRESVM, a MapReduce-based distributed SVM ensemble algorithm for scalable image annotation which re-samples the training dataset based on bootstrapping and trains an SVM on each dataset in parallel using a cluster of computers. A balanced sampling strategy for bootstrapping is introduced to increase the classification accuracy. MRESVM is evaluated in both experimental and simulation environments, and the results show that the MRESVM algorithm reduces the training time significantly while achieving a high level of accuracy in classifications.

Fuzzy Systems and Knowledge Discovery | 2010

A distributed SVM for image annotation

Nasullah Khalid Alham; Maozhen Li; Suhel Hammoud; Yang Liu; Mahesh Ponraj

The popularity of SVMs has grown tremendously in the last few years for many different classification problems due to their generalization properties; however, training SVMs requires high computational power. Platt's SMO is one of the fastest algorithms for training support vector machines. It takes the decomposition technique to the extreme by selecting a set of only two points as the working set and then solving for them analytically. However, SMO becomes slow for large training datasets. In this paper we present a MapReduce-based distributed implementation of SMO using Hadoop. The distributed SMO uses multiple core processors to process the training data. The training dataset is partitioned into smaller subsets, and each subset is allocated to a single Map task. Each Map task optimizes its partition in parallel, and finally the reducer combines the results. Experiments show that the efficiency of the distributed SMO increases with the number of processors: the training speed of distributed SMO with 12 Map tasks is about 11 times higher than that of standalone SMO. There is no significant difference in accuracy between distributed and standalone SMO.
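
The reported figures can be read as a parallel efficiency, using the standard definition of efficiency as speedup divided by the number of workers. The short sketch below is only that arithmetic; the helper name `parallel_efficiency` is hypothetical, and treating each of the 12 Map tasks as one worker is an assumption.

```python
def parallel_efficiency(speedup, num_workers):
    """Standard definition: achieved speedup divided by the number of workers."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return speedup / num_workers


# Figures from the abstract: about 11x faster with 12 Map tasks.
efficiency = parallel_efficiency(speedup=11, num_workers=12)
print(round(efficiency, 3))  # 0.917
```

An efficiency near 0.92 would mean that little time is lost to partitioning, scheduling, and the final reduce step at this scale, under the assumptions stated above.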

Fuzzy Systems and Knowledge Discovery | 2011

Parallelizing multiclass Support Vector Machines for scalable image annotation

Nasullah Khalid Alham; Maozhen Li; Yang Liu; Suhel Hammoud

Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively due to their generalization properties. SVM was initially designed for binary classifications. However, most classification problems arising in domains such as image annotation usually involve more than two classes. Notably, SVM training is a computationally intensive process especially when the training dataset is large. This paper presents a resource aware parallel multiclass SVM algorithm (named RAMSMO) for large-scale image annotation which partitions the training dataset into smaller binary chunks and optimizes SVM training in parallel using a cluster of computers. A genetic algorithm-based load balancing scheme is designed to optimize the performance of RAMSMO in balancing the computation of multiclass data chunks in heterogeneous computing environments. RAMSMO is evaluated in both experimental and simulation environments, and the results show that it reduces the training time significantly while maintaining a high level of accuracy in classifications.

Fuzzy Systems and Knowledge Discovery | 2011

Load balancing in MapReduce environments for data intensive applications

Yang Liu; Maozhen Li; Nasullah Khalid Alham; Suhel Hammoud; Mahesh Ponraj

Distributed computation is widely used for processing large-scale jobs. The Hadoop framework, which is based on Google's MapReduce model, has become popular due to its great processing power and ease of use. However, due to a lack of load management, especially in heterogeneous computing environments, the performance of the Hadoop framework may deteriorate. This paper therefore presents a load balancing algorithm which aims to balance the load among heterogeneous nodes. The Hadoop simulator HSim is used to evaluate the performance of the load balancing algorithm. The results indicate that the performance of the cluster is significantly enhanced.
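
The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of one simple way to balance load across heterogeneous nodes, not the paper's method. It assumes each node has a known relative speed and hands out data blocks in proportion to it; `proportional_split` and the node names are hypothetical.

```python
def proportional_split(total_blocks, node_speeds):
    """Assign data blocks to nodes in proportion to their relative speed.

    node_speeds maps a node name to a positive relative speed.
    Fractional shares are rounded with the largest-remainder method,
    so the assigned blocks always sum to total_blocks.
    """
    total_speed = sum(node_speeds.values())
    if total_speed <= 0:
        raise ValueError("node speeds must sum to a positive value")

    # Ideal (possibly fractional) share for each node.
    quotas = {n: total_blocks * s / total_speed for n, s in node_speeds.items()}
    shares = {n: int(q) for n, q in quotas.items()}

    # Hand the blocks lost to rounding down to the nodes with the
    # largest fractional remainders.
    leftover = total_blocks - sum(shares.values())
    by_fraction = sorted(quotas, key=lambda n: quotas[n] - shares[n], reverse=True)
    for n in by_fraction[:leftover]:
        shares[n] += 1
    return shares
```

For example, `proportional_split(10, {"slow": 1, "fast": 2})` gives the slow node 3 blocks and the fast node 7, so both should finish at roughly the same time if speed is the only difference between them.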

Fuzzy Systems and Knowledge Discovery | 2012

A distributed SVM ensemble for image classification and annotation

Nasullah Khalid Alham; Maozhen Li; Yang Liu; Mahesh Ponraj; Man Qi

A combination of classifiers leads to a substantial reduction of classification errors in a wide range of applications. Among them, SVM ensembles with bagging have shown better performance in classification than a single SVM. However, the training process of SVM ensembles is computationally intensive, especially when the number of replicated training datasets is large. This paper presents MRESVM, a MapReduce-based distributed SVM ensemble algorithm for image annotation which re-samples the training dataset based on bootstrapping and trains an SVM on each dataset in parallel using a cluster of computers. MRESVM is evaluated in an experimental environment and the results show that the MRESVM algorithm reduces the training time significantly while achieving a high level of accuracy in classifications.
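
The two generic building blocks of bagging mentioned here, bootstrap re-sampling and combining the ensemble's outputs, can be sketched as follows. This is a minimal illustration rather than MRESVM itself: the SVM training step is omitted, majority voting is assumed as the combination rule, and the function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_replicates(dataset, num_replicates, seed=0):
    """Draw num_replicates samples, each the size of the dataset,
    by sampling with replacement (standard bootstrapping)."""
    rng = random.Random(seed)
    n = len(dataset)
    return [
        [dataset[rng.randrange(n)] for _ in range(n)]
        for _ in range(num_replicates)
    ]


def majority_vote(predictions):
    """Combine the labels predicted by the ensemble members for one
    input by returning the most common label."""
    return Counter(predictions).most_common(1)[0][0]
```

Because each replicate is independent of the others, one classifier per replicate can be trained in a separate task, which is what makes the approach a natural fit for a cluster. At prediction time, `majority_vote(["cat", "dog", "cat"])` returns `"cat"`.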
