Suhel Hammoud | Researchain

Publication

Featured researches published by Suhel Hammoud.

Fuzzy Systems and Knowledge Discovery | 2010

MRSim: A discrete event based MapReduce simulator

Suhel Hammoud; Maozhen Li; Yang Liu; Nasullah Khalid Alham; Zelong Liu

The MapReduce programming model has recently become popular for large-scale, data-intensive distributed applications due to its efficiency, simplicity and ease of use. The Hadoop implementation of MapReduce is one of the most popular tools among programmers due to its ability to hide the details of parallel programming from users. However, work on simulating the Hadoop environment is still in its infancy. Although a large number of tools are available for simulating distributed environments, only a few simulators specifically target the MapReduce environment. Based on the testing we performed, the usability of these simulators is not satisfactory because their simplified designs limit the simulation of jobs with varying configurations. We have designed and implemented MRSim, a MapReduce simulator based on discrete event simulation that accurately simulates the Hadoop environment. On the one hand, the simulator allows us to measure the scalability of MapReduce-based applications easily and quickly; on the other hand, it captures the effects of different Hadoop configurations on the behavior of MapReduce-based applications in terms of job completion times and hardware utilization.
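To make the idea of discrete event simulation concrete, the following is a minimal sketch and not MRSim's actual design. It assumes a deliberately simplified model in which each task holds one slot for a fixed duration; the function name `simulate_map_phase` and its parameters are hypothetical.

```python
import heapq


def simulate_map_phase(task_durations, num_slots):
    """Return the time at which the last task finishes.

    Hypothetical model (an assumption, not MRSim's): each task occupies
    one slot for a fixed duration, and a waiting task starts as soon as
    any slot is freed.
    """
    pending = list(task_durations)  # tasks not yet started, in FIFO order
    events = []                     # min-heap of (finish_time, task_id)
    clock = 0.0
    next_id = 0

    # Fill the available slots at time 0.
    while pending and len(events) < num_slots:
        heapq.heappush(events, (clock + pending.pop(0), next_id))
        next_id += 1

    # Event loop: jump the clock straight to the next finish event
    # instead of advancing it in fixed time steps.
    while events:
        clock, _ = heapq.heappop(events)
        if pending:
            heapq.heappush(events, (clock + pending.pop(0), next_id))
            next_id += 1
    return clock
```

For example, `simulate_map_phase([4, 3, 2, 1], num_slots=2)` returns `5.0`, while the same tasks on a single slot finish at `10.0`. A realistic simulator would also model disk, network and CPU contention, which this sketch ignores.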

Future Generation Computer Systems | 2013

HSim: A MapReduce simulator in enabling Cloud Computing

Yang Liu; Maozhen Li; Nasullah Khalid Alham; Suhel Hammoud

MapReduce is an enabling technology in support of Cloud Computing. Hadoop, a MapReduce implementation, has been widely used in developing MapReduce applications. This paper presents HSim, a MapReduce simulator which builds on top of Hadoop. HSim models a large number of parameters that can affect the behaviors of MapReduce nodes, and thus it can be used to tune the performance of a MapReduce cluster. HSim is validated with both benchmark results and user-customized MapReduce applications.

Expert Systems With Applications | 2006

Improving rule sorting predictive accuracy and training time in associative classification

Fadi Abdeljaber Thabtah; Peter I. Cowling; Suhel Hammoud

Traditional classification techniques such as decision trees and RIPPER use heuristic search methods to find a small subset of patterns. In recent years, a promising new approach called associative classification, which mainly uses association rule mining in classification, has been proposed. Most associative classification algorithms adopt the exhaustive search method presented in the famous Apriori algorithm to discover the rules and require multiple passes over the database. Furthermore, they find frequent items in one phase and generate the rules in a separate phase, consuming more resources such as storage and processing time. In this paper, a new associative classification method called Multi-class Classification based on Association Rules (MCAR) is presented. MCAR takes advantage of vertical format representation and uses an efficient technique for discovering frequent items based on recursively intersecting the frequent items of size n to find potential frequent items of size n+1. Moreover, since rule ranking plays an important role in classification and the majority of current associative classifiers such as CBA and CMAR select rules mainly in terms of their confidence levels, MCAR aims to improve upon the CBA and CMAR approaches by adding more tie-breaking constraints in order to limit random selection. We also show that shuffling the training data objects before mining can substantially affect the predictive power of some well-known associative classification techniques. After experimentation with 20 different data sets, the results indicate that the proposed algorithm is highly competitive in terms of error rate and efficiency when compared with decision trees, rule induction methods and other popular associative classification methods. Finally, we show the effectiveness of the MCAR rule sorting method on the quality of the produced classifiers for 12 highly dense benchmark problems.
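The vertical-format intersection step described in the abstract can be illustrated with a small sketch. This is a generic illustration under stated assumptions, not the MCAR implementation: it mines plain itemsets and ignores class labels, support is an absolute row count, and the name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Mine frequent itemsets by intersecting transaction-id sets (tidsets)."""
    # Vertical format: map each single item to the set of rows containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(frozenset([item]), set()).add(tid)

    current = {k: v for k, v in tidsets.items() if len(v) >= min_support}
    result = dict(current)

    # Join frequent itemsets of size n into candidates of size n + 1.
    while current:
        larger = {}
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            candidate = a | b
            if len(candidate) == len(a) + 1 and candidate not in larger:
                # Rows containing the candidate are exactly the rows
                # that contain both of its parents.
                tids = tids_a & tids_b
                if len(tids) >= min_support:
                    larger[candidate] = tids
        result.update(larger)
        current = larger
    return result
```

Because the support of a candidate is just the size of an intersection, the data set is read only once, when the tidsets of single items are built. This contrasts with the repeated database passes the abstract attributes to Apriori-style approaches.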

Computers & Mathematics With Applications | 2011

A MapReduce-based distributed SVM algorithm for automatic image annotation

Nasullah Khalid Alham; Maozhen Li; Yang Liu; Suhel Hammoud

Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) have been used extensively due to their generalization properties. However, SVM training is notably a computationally intensive process, especially when the training dataset is large. This paper presents MRSMO, a MapReduce-based distributed SVM algorithm for automatic image annotation. The performance of the MRSMO algorithm is evaluated in an experimental environment. By partitioning the training dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computers, the MRSMO algorithm reduces the training time significantly while maintaining a high level of accuracy in both binary and multiclass classifications.

Fuzzy Systems and Knowledge Discovery | 2010

A MapReduce based distributed LSI

Yang Liu; Maozhen Li; Suhel Hammoud; Nasullah Khalid Alham; Mahesh Ponraj

Latent Semantic Indexing (LSI) is a widely used text mining technology due to its effectiveness in dealing with the problems of synonymy and polysemy within a proper matrix scale. However, LSI is enormously computationally intensive, especially when processing large-scale data. An effective solution is to increase the computational power available to LSI by using multiple computing nodes. In this paper we propose a novel MapReduce-based distributed LSI that uses the Hadoop distributed computing architecture to implement the K-means algorithm to cluster the documents and then applies LSI to the clustered results. We evaluated the performance of the proposed MapReduce-based LSI and compared it with standalone LSI. The results show a great improvement in LSI's performance in terms of speed.

Fuzzy Systems and Knowledge Discovery | 2009

Evaluating Machine Learning Techniques for Automatic Image Annotations

Nasullah Khalid Alham; Maozhen Li; Suhel Hammoud; Hao Qi

The past decade has seen a rapid development in Content Based Image Retrieval (CBIR). CBIR is the retrieval of images based on their low level features such as color, texture, shape etc. To improve the retrieval accuracy, the research focus has been shifted from designing sophisticated low-level feature extraction algorithms to reducing the ‘semantic gap’ between the visual features and the richness of human semantics. Image annotation techniques have been proposed to facilitate CBIR. This paper evaluates 7 representative machine learning techniques for automatic image annotations using 5000 images. An image annotation prototype is implemented and the evaluation results are presented and analyzed.

Fuzzy Systems and Knowledge Discovery | 2010

A distributed SVM for image annotation

Nasullah Khalid Alham; Maozhen Li; Suhel Hammoud; Yang Liu; Mahesh Ponraj

The popularity of SVMs has grown tremendously in the last few years for many different classification problems due to their generalization properties; however, training SVMs requires high computational power. Platt's SMO is one of the fastest algorithms for training support vector machines; it takes the decomposition technique to the extreme by selecting a set of only two points as the working set and then solving for them analytically. However, SMO becomes slow for large training data sets. In this paper we present a MapReduce-based distributed implementation of SMO using Hadoop. The distributed SMO uses multiple core processors to process the training data. The training data set is partitioned into smaller subsets, and each subset is allocated to a single Map task; the Map tasks optimize their partitions in parallel, and finally the reducer combines the results. Experiments show that the efficiency of the distributed SMO increases with the number of processors: the training speed of the distributed SMO with 12 Map tasks is about 11 times higher than that of standalone SMO. There is no significant difference in accuracy between distributed and standalone SMO.
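As a quick reading of the reported figures, the usual definition of parallel efficiency (speedup divided by the number of workers) can be applied. The symbols S, p and E are introduced here for illustration, and treating each Map task as one worker is an assumption; the value is approximate because the abstract gives the speedup only as "about 11 times".

```latex
\[
  E = \frac{S}{p} \approx \frac{11}{12} \approx 0.92
\]
```

Under that assumption, roughly 92% of the ideal linear speedup is retained with 12 Map tasks.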

Fuzzy Systems and Knowledge Discovery | 2011

Parallelizing multiclass Support Vector Machines for scalable image annotation

Nasullah Khalid Alham; Maozhen Li; Yang Liu; Suhel Hammoud

Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively due to their generalization properties. SVM was initially designed for binary classifications. However, most classification problems arising in domains such as image annotation usually involve more than two classes. Notably, SVM training is a computationally intensive process especially when the training dataset is large. This paper presents a resource aware parallel multiclass SVM algorithm (named RAMSMO) for large-scale image annotation which partitions the training dataset into smaller binary chunks and optimizes SVM training in parallel using a cluster of computers. A genetic algorithm-based load balancing scheme is designed to optimize the performance of RAMSMO in balancing the computation of multiclass data chunks in heterogeneous computing environments. RAMSMO is evaluated in both experimental and simulation environments, and the results show that it reduces the training time significantly while maintaining a high level of accuracy in classifications.

Parallel Processing Letters | 2015

Parallel Associative Classification Data Mining Frameworks Based MapReduce

Fadi Thabtah; Suhel Hammoud; Hussein Abdel-jaber

Associative classification (AC) is a research topic that integrates association rules with classification in data mining to build classifiers. After dissemination of the Classification-based Association Rule algorithm (CBA), the majority of its successors have been developed to improve either CBA's prediction accuracy or the search for frequent ruleitems in the rule discovery step. Both of these steps place high demands on processing time and memory, especially in cases of large training data sets or a low minimum support threshold value. In this paper, we overcome the problem of mining large training data sets by proposing a new learning method that repeatedly transforms data between line and item spaces to quickly discover frequent ruleitems, generate rules, and subsequently rank and prune rules. This new learning method has been implemented in a parallel Map-Reduce (MR) algorithm called MRMCAR, which can be considered the first parallel AC algorithm in the literature. The new learning method can be utilised in the different steps of any AC or association rule mining algorithm, and it scales well when contrasted with current horizontal or vertical methods. Two versions of the learning method (Weka, Hadoop) have been implemented and a number of experiments against different data sets have been conducted. The bases of comparison are classification accuracy and the time required by the algorithm for data initialization, frequent ruleitem discovery, rule generation and rule pruning. The results reveal that MRMCAR is superior to both current AC mining algorithms and rule-based classification algorithms in improving the classification performance with respect to accuracy.

Fuzzy Systems and Knowledge Discovery | 2011

Load balancing in MapReduce environments for data intensive applications

Yang Liu; Maozhen Li; Nasullah Khalid Alham; Suhel Hammoud; Mahesh Ponraj

Distributed computing is widely used for processing large-scale jobs. The Hadoop framework, which is based on Google's MapReduce model, has become popular due to its great processing power and ease of use. However, due to a lack of load management, especially in heterogeneous computing environments, the performance of the Hadoop framework may deteriorate. This paper therefore presents a load balancing algorithm which aims to balance the load among heterogeneous nodes. The Hadoop simulator HSim is used to evaluate the performance of the load balancing algorithm. The results indicate that the performance of the cluster is significantly enhanced.
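The abstract does not describe the load balancing algorithm itself, so the sketch below is not the paper's method. It only illustrates one simple way to balance load across heterogeneous nodes: giving each node a share of the data proportional to its relative capacity. The function name `split_by_capacity` and the use of integer capacity scores are assumptions made for the example.

```python
def split_by_capacity(total_blocks, capacities):
    """Assign data blocks to nodes in proportion to their capacity.

    `capacities` maps a node name to a positive integer capacity score
    (a hypothetical measure, e.g. relative processing speed).
    """
    total_capacity = sum(capacities.values())
    # Start with the floor of each node's proportional share.
    shares = {
        node: total_blocks * cap // total_capacity
        for node, cap in capacities.items()
    }
    # Hand out the blocks lost to rounding, fastest nodes first.
    leftover = total_blocks - sum(shares.values())
    fastest_first = sorted(capacities, key=capacities.get, reverse=True)
    for node in fastest_first[:leftover]:
        shares[node] += 1
    return shares
```

With capacities `{"a": 3, "b": 1, "c": 1}` and 10 blocks, node `a` receives 6 blocks and the other two receive 2 each, so all nodes would finish at about the same time if capacity reflects processing speed.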
