Nouredine Melab | Researchain

Publication

Featured research published by Nouredine Melab.

Journal of Parallel and Distributed Computing | 2011

A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems

Mohand-Said Mezmaz; Nouredine Melab; Yacine Kessaci; Young Choon Lee; El-Ghazali Talbi; Albert Y. Zomaya; Daniel Tuyttens

In this paper, we investigate the problem of scheduling precedence-constrained parallel applications on heterogeneous computing systems (HCSs) such as cloud computing infrastructures. This kind of application has been studied and used in many research works. Most of these works propose algorithms that minimize the completion time (makespan) without paying much attention to energy consumption. We propose a new parallel bi-objective hybrid genetic algorithm that takes into account not only makespan but also energy consumption. We focus in particular on the island parallel model and the multi-start parallel model. Our new method is based on dynamic voltage scaling (DVS) to minimize energy consumption. In terms of energy consumption, the obtained results show that our approach outperforms previous scheduling methods by a significant margin. In terms of completion time, the obtained schedules are also shorter than those of other algorithms. Furthermore, our study demonstrates the potential of DVS.

IEEE Transactions on Computers | 2013

GPU Computing for Parallel Local Search Metaheuristic Algorithms

Nouredine Melab; El-Ghazali Talbi

Local search metaheuristics (LSMs) are efficient methods for solving complex problems in science and industry. They significantly reduce the size of the search space to be explored and the search time. Nevertheless, the resolution time remains prohibitive when dealing with large problem instances. Therefore, the use of GPU-based massively parallel computing is a major complementary way to speed up the search. However, GPU computing for LSMs is rarely investigated in the literature. In this paper, we introduce a new guideline for the design and implementation of effective LSMs on GPU. Very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of neighboring solutions to GPU threads, and memory management. These approaches have been evaluated on four well-known combinatorial and continuous optimization problems and four GPU configurations. Compared to a CPU-based execution, accelerations of up to ×80 are reported for the large combinatorial problems and up to ×240 for a continuous problem. Finally, extensive experiments demonstrate the strong potential of GPU-based LSMs compared to cluster or grid-based parallel architectures.

Journal of Parallel and Distributed Computing | 2006

Grid computing for parallel bioinspired algorithms

Nouredine Melab; Sébastien Cahon; El-Ghazali Talbi

This paper focuses on solving large combinatorial optimization problems using a Grid-enabled framework called ParadisEO-CMW (Parallel and Distributed EO on top of Condor and the Master Worker Framework). The latter is an extension of ParadisEO, an open source framework originally intended for the design and deployment of parallel hybrid meta-heuristics on dedicated clusters and networks of workstations. Relying on the Condor-MW framework, it enables the execution of these applications on volatile heterogeneous computational pools of resources. The motivations, architecture and main features are discussed. The framework has been evaluated on a real-world problem: feature selection in near-infrared spectroscopic data mining. The problem has been solved by deploying a multi-level parallel model of evolutionary algorithms. Experiments have been carried out on more than 100 PCs originally intended for education. The obtained results are convincing, both in terms of flexibility and ease of implementation, and in terms of efficiency, quality and robustness of the provided solutions at run time.

International Parallel and Distributed Processing Symposium | 2007

A Grid-enabled Branch and Bound Algorithm for Solving Challenging Combinatorial Optimization Problems

Mohand-Said Mezmaz; Nouredine Melab; El-Ghazali Talbi

Solving large instances of combinatorial optimization problems to optimality requires a huge amount of computational resources. In this paper, we propose an adaptation of the parallel branch and bound algorithm for computational grids. This gridification is based on new ways to deal efficiently with some crucial issues, mainly dynamic adaptive load balancing, fault tolerance, global information sharing and termination detection of the algorithm. A new efficient coding of the work units (search sub-trees) distributed during the exploration of the search tree is proposed to optimize the involved communications. The algorithm has been implemented following a large-scale idle-time stealing paradigm (Farmer-Worker). It has been evaluated on a flow-shop problem instance (Ta056) that had never been optimally solved. The new algorithm found the optimal solution, with proof of optimality, within 25 days using about 1900 processors belonging to 9 distinct nation-wide clusters (administration domains). During the resolution, the worker processors were exploited 97% of the time on average, while the farmer processor was exploited only 1.7% of the time. These two rates are good indicators of the efficiency of the proposed approach and its scalability.

Genetic and Evolutionary Computation Conference | 2010

GPU-based island model for evolutionary algorithms

Nouredine Melab; El-Ghazali Talbi

The island model for evolutionary algorithms delays the global convergence of the evolution process and encourages diversity. However, solving large and time-intensive combinatorial optimization problems with the island model requires a large amount of computational resources. GPU computing has recently emerged as a powerful way to harness these resources. In this paper, we focus on the parallel island model on GPU. We address its re-design, implementation, and associated issues related to the GPU execution context. The preliminary results demonstrate the effectiveness of the proposed approaches and their ability to fully exploit the GPU architecture.

Future Generation Computer Systems | 2007

A parallel hybrid genetic algorithm for protein structure prediction on the computational grid

Alexandru-Adrian Tantar; Nouredine Melab; El-Ghazali Talbi; Benjamin Parent; Dragos Horvath

Solving the structure prediction problem for complex proteins is difficult and computationally expensive. In this paper, we propose a bicriterion parallel hybrid genetic algorithm (GA) in order to efficiently deal with the problem using the computational grid. The use of a near-optimal metaheuristic, such as a GA, allows a significant reduction in the number of explored potential structures. However, the complexity of the problem remains prohibitive as far as large proteins are concerned, making the use of parallel computing on the computational grid essential for its efficient resolution. A conjugate gradient-based Hill Climbing local search is combined with the GA in order to intensify the search in the neighborhood of its provided configurations. In this paper we consider two molecular complexes: the tryptophan-cage protein (Brookhaven Protein Data Bank ID 1L2Y) and α-cyclodextrin. The experimental results obtained on a computational grid show the effectiveness of the approach.

International Conference on Cluster Computing | 2013

A Pareto-based metaheuristic for scheduling HPC applications on a geographically distributed cloud federation

Yacine Kessaci; Nouredine Melab; El-Ghazali Talbi

Reducing energy consumption is an increasingly important issue in cloud computing, more specifically when dealing with High Performance Computing (HPC). Minimizing energy consumption can significantly reduce energy bills and thus increase the provider’s profit. In addition, reducing energy consumption decreases greenhouse gas emissions. Therefore, much research is being carried out to develop new methods that make HPC applications consume less energy. In this paper, we present a multi-objective genetic algorithm (MO-GA) that optimizes the energy consumption, CO2 emissions and the generated profit of a geographically distributed cloud computing infrastructure. We also propose a greedy heuristic that aims to maximize the number of scheduled applications in order to compare it with the MO-GA. The two approaches have been evaluated using realistic workload traces from Feitelson’s Parallel Workload Archive (PWA). The results show that MO-GA outperforms the greedy heuristic by a significant margin in terms of energy consumption and CO2 emissions. In addition, MO-GA also proves to be slightly better in terms of profit while scheduling more applications.

Concurrency and Computation: Practice and Experience | 2013

Reducing thread divergence in a GPU‐accelerated branch‐and‐bound algorithm

Imen Chakroun; Mohand-Said Mezmaz; Nouredine Melab; Ahcène Bendjoudi

In this paper, we address the design and implementation of graphical processing unit (GPU)‐accelerated branch‐and‐bound algorithms (B&B) for solving flow‐shop scheduling optimization problems (FSP). Such applications are CPU‐time consuming and highly irregular. On the other hand, GPUs are massively multithreaded accelerators using the single instruction multiple data model at execution. A major issue that arises when executing a B&B applied to FSP on a GPU is thread (or branch) divergence. Such divergence is caused by the lower bound function of FSP, which contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP so as to deal with thread divergence. Extensive experiments with the proposed approach have been carried out on well‐known FSP benchmarks using an Nvidia Tesla C2050 GPU card (http://www.nvidia.com/docs/IO/43395/NV_DS_Tesla_C2050_C2070_jul10_lores.pdf). Compared with a CPU‐based execution, accelerations of up to ×77.46 are achieved for large problem instances.

Soft Computing | 2008

A grid-based genetic algorithm combined with an adaptive simulated annealing for protein structure prediction

Alexandru-Adrian Tantar; Nouredine Melab; El-Ghazali Talbi

A hierarchical hybrid model of parallel metaheuristics is proposed, combining an evolutionary algorithm and an adaptive simulated annealing. The algorithms are executed inside a grid environment with different parallelization strategies: the synchronous multi-start model, parallel evaluation of different solutions and an insular model with asynchronous migrations. Furthermore, a conjugate gradient local search method is employed at different stages of the exploration process. The algorithms were evaluated using the protein structure prediction problem, having as benchmarks the tryptophan-cage protein (Brookhaven Protein Data Bank ID: 1L2Y), the tryptophan-zipper protein (PDB ID: 1LE1) and the α-Cyclodextrin complex. Experiments were performed on a nation-wide grid infrastructure, over six distinct administrative domains and gathering nearly 1,000 CPUs. The complexity of the protein structure prediction problem remains prohibitive as far as large proteins are concerned, making the use of parallel computing on the computational grid essential for its efficient resolution.

The Journal of Supercomputing | 2000

A Parallel Adaptive Gauss-Jordan Algorithm

Nouredine Melab; El-Ghazali Talbi; Serge G. Petiton

This paper presents a parallel adaptive version of the block-based Gauss-Jordan algorithm, used to invert large matrices. This version includes a characterization of the workload and a mechanism for folding and unfolding it. Furthermore, this paper proposes a work scheduling strategy and an application-oriented solution for the fault tolerance problem. The application is implemented and evaluated with MARS1 in dedicated and non-dedicated environments. The results show that an absolute efficiency of 92% is possible on a cluster of DEC/ALPHA processors interconnected by a Gigaswitch network and an absolute efficiency of 67% can be obtained on an Ethernet network of SUN-Sparc 4 workstations. Moreover, the algorithm is tested on a meta-system comprising both pools of machines. Finally, an out-of-core solution for the algorithm is proposed. This solution allows a gain of 66% in data input operations and reduces the central memory space required for storing the data space of the algorithm by a factor q, where q is the dimension of the matrix to be inverted in terms of data blocks.
