Ahcène Bendjoudi
University of Lorraine
Publication
Featured research published by Ahcène Bendjoudi.
Concurrency and Computation: Practice and Experience | 2013
Imen Chakroun; Mohand-Said Mezmaz; Nouredine Melab; Ahcène Bendjoudi
In this paper, we address the design and implementation of graphics processing unit (GPU)‐accelerated branch‐and‐bound (B&B) algorithms for solving flow‐shop scheduling optimization problems (FSP). Such applications are CPU‐time consuming and highly irregular. GPUs, on the other hand, are massively multithreaded accelerators that execute under the single instruction, multiple data (SIMD) model. A major issue that arises when executing a B&B algorithm for FSP on a GPU is thread (or branch) divergence, caused by the FSP lower bound function, which contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP so as to deal with thread divergence. Extensive experiments with the proposed approach have been carried out on well‐known FSP benchmarks using an Nvidia Tesla C2050 GPU card (http://www.nvidia.com/docs/IO/43395/NV_DS_Tesla_C2050_C2070_jul10_lores.pdf). Compared with a CPU‐based execution, speedups of up to 77.46× are achieved for large problem instances.
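As a rough illustration of the B&B setting described above, the following Python sketch solves a tiny permutation flow-shop instance by depth-first branch-and-bound. It is a sequential sketch under stated assumptions, not the authors' GPU implementation: `p[j][m]` is a hypothetical processing-time matrix (job `j` on machine `m`), and `lower_bound` is a simple one-machine bound chosen for readability rather than the bound used in the paper. Its data-dependent loops over the remaining jobs hint at why bound evaluation becomes irregular when one GPU thread evaluates one subproblem.

```python
def completion_times(p, seq):
    """Completion time of the last scheduled job on each machine."""
    n_machines = len(p[0])
    c = [0] * n_machines
    for j in seq:
        c[0] += p[j][0]
        for m in range(1, n_machines):
            # A job starts on machine m once it has left m-1 and m is free.
            c[m] = max(c[m], c[m - 1]) + p[j][m]
    return c


def makespan(p, seq):
    return completion_times(p, seq)[-1]


def lower_bound(p, prefix, remaining):
    """Illustrative one-machine bound (hypothetical, not the paper's bound).

    On every machine m, the remaining jobs can only start after c[m]; the last
    of them still needs at least the smallest tail on machines after m.
    """
    c = completion_times(p, prefix)
    if not remaining:
        return c[-1]
    n_machines = len(p[0])
    lb = c[-1]
    for m in range(n_machines):
        work = sum(p[j][m] for j in remaining)
        tail = min(sum(p[j][k] for k in range(m + 1, n_machines)) for j in remaining)
        lb = max(lb, c[m] + work + tail)
    return lb


def branch_and_bound(p):
    """Depth-first B&B; returns (best makespan, best order, explored nodes)."""
    n_jobs = len(p)
    best_seq = list(range(n_jobs))
    best = makespan(p, best_seq)  # initial upper bound: identity order
    stack = [((), frozenset(range(n_jobs)))]
    explored = 0
    while stack:
        prefix, remaining = stack.pop()
        explored += 1
        if not remaining:  # leaf: a complete permutation
            cmax = makespan(p, prefix)
            if cmax < best:
                best, best_seq = cmax, list(prefix)
            continue
        for j in sorted(remaining):
            child, rest = prefix + (j,), remaining - {j}
            # Bounding: keep only children that may beat the incumbent.
            if lower_bound(p, child, rest) < best:
                stack.append((child, rest))
    return best, best_seq, explored
```

Because the bound never overestimates the makespan of any completion of a partial schedule, pruning with `lb < best` keeps the result optimal; `explored` counts the nodes that survived pruning.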
The Journal of Supercomputing | 2015
Youcef Djenouri; Ahcène Bendjoudi; Malika Mehdi; Nadia Nouali-Taboudjemat; Zineb Habbas
Association rules mining (ARM) is a well-known combinatorial optimization problem that aims at extracting relevant rules from large-scale datasets. According to the state of the art, bio-inspired methods have proved efficient, generating acceptable solutions in reasonable time for small and medium-size instances. Unfortunately, to cope with large instances such as the WebDocs benchmark, these methods require increasingly powerful processors and are time-consuming. Nowadays, computing power is no longer a real issue: it can be provided by emerging technologies such as graphics processing units (GPUs), which are massively multi-threaded processors. In this paper, we investigate the use of GPUs to speed up the computation. We propose two GPU-based bees swarm algorithms for association rules mining: single evaluation in GPU (SE-GPU) and multiple evaluation in GPU (ME-GPU). SE-GPU evaluates one rule at a time, with each thread associated with one transaction, whereas ME-GPU evaluates multiple rules in parallel on the GPU, with each thread associated with several transactions. To validate our approaches, the two algorithms have been executed on well-known large ARM instances. Real experiments have been carried out on an Intel Xeon E5520 64-bit quad-core processor coupled to an Nvidia Tesla C2075 GPU device. The results show that our approaches improve the execution time by up to 100× over the sequential mono-core bees swarm optimization-ARM algorithm. Moreover, the proposed approaches have been compared with CPU multi-core versions (1–8 cores); the results show that they are faster than the multi-core versions whatever the number of cores used.
Concurrency and Computation: Practice and Experience | 2014
Nouredine Melab; Imen Chakroun; Ahcène Bendjoudi
Concurrency and Computation: Practice and Experience | 2017
Youcef Djenouri; Ahcène Bendjoudi; Zineb Habbas; Malika Mehdi; Djamel Djenouri
International Conference on Parallel Processing | 2015
Youcef Djenouri; Ahcène Bendjoudi; Djamel Djenouri; Zineb Habbas
Cluster Computing and the Grid | 2007
Ahcène Bendjoudi; Nouredine Melab; El-Ghazali Talbi
Soft Computing and Pattern Recognition | 2014
Youcef Djenouri; Ahcène Bendjoudi; Malika Mehdi; Nadia Nouali-Taboudjemat; Zineb Habbas
Branch‐and‐bound (B&B) algorithms are attractive methods for solving combinatorial optimization problems to optimality using an implicit enumeration of a dynamically built tree‐based search space. Nevertheless, they are time‐consuming when dealing with large problem instances. Pruning tree nodes (subproblems) is therefore traditionally used as a powerful mechanism to reduce the size of the explored search space. Pruning requires performing the bounding operation, which consists of applying a lower bound function to the subproblems generated during the exploration process. Preliminary experiments performed on the flow‐shop scheduling problem (FSP) have shown that the bounding operation consumes over 98% of the execution time of the B&B algorithm. In this paper, we investigate the use of graphics processing unit (GPU) computing as a major complementary way to speed up the search. We revisit the design and implementation of the parallel bounding model on GPU accelerators, with an approach that optimizes data access. Extensive experiments have been carried out on well‐known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a single‐core CPU execution on an Intel Core i7‐970 processor without a GPU, speedups of more than 100× are achieved for large problem instances. At equivalent peak performance, the GPU‐accelerated B&B is twice as fast as its multi‐core counterpart.
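The parallel bounding model can be sketched as follows: the children of a pool of subproblems are generated, all their bounds are evaluated in one batch, and only the promising ones are kept. This is a hedged Python sketch, not the paper's code: a `ThreadPoolExecutor` merely stands in for the GPU (one node per work item; Python threads will not speed up this pure-Python CPU work), the bound is an illustrative one-machine bound, and precomputing the read-only `tails` table once per instance is only one plausible reading of "optimizes data access", assumed here for illustration. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def precompute_tails(p):
    """tails[j][m] = total time job j needs on machines after m.

    Built once per instance and only read afterwards by every bound evaluation.
    """
    n_machines = len(p[0])
    tails = []
    for row in p:
        t = [0] * n_machines
        for m in range(n_machines - 2, -1, -1):
            t[m] = t[m + 1] + row[m + 1]
        tails.append(t)
    return tails


def prefix_completion(p, prefix):
    """Completion time of the scheduled prefix on each machine."""
    c = [0] * len(p[0])
    for j in prefix:
        c[0] += p[j][0]
        for m in range(1, len(c)):
            c[m] = max(c[m], c[m - 1]) + p[j][m]
    return c


def bound(p, tails, node):
    """Illustrative one-machine lower bound of a subproblem (prefix, remaining)."""
    prefix, remaining = node
    c = prefix_completion(p, prefix)
    if not remaining:
        return c[-1]
    return max(
        c[m] + sum(p[j][m] for j in remaining) + min(tails[j][m] for j in remaining)
        for m in range(len(p[0]))
    )


def bound_pool(p, nodes, workers=4):
    """Evaluate the bounds of a whole pool of subproblems in one batch step."""
    tails = precompute_tails(p)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda node: bound(p, tails, node), nodes))


def branch_and_prune(p, pool_nodes, best):
    """Branch every node of the pool, bound all children at once, then prune."""
    children = [
        (prefix + (j,), remaining - {j})
        for prefix, remaining in pool_nodes
        for j in sorted(remaining)
    ]
    bounds = bound_pool(p, children)
    return [child for child, lb in zip(children, bounds) if lb < best]
```

For example, with `p = [[1, 2], [2, 1]]` and an incumbent makespan of 5, branching the root yields two children with bounds 4 and 5, and `branch_and_prune` keeps only the child that starts with job 0.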
International Conference on Cluster Computing | 2012
Trong-Tuan Vu; Bilel Derbel; Asim Ali; Ahcène Bendjoudi; Nouredine Melab
The association rules mining (ARM) problem is one of the most important problems in data mining. It aims at finding all relevant association rules in transactional databases. It is CPU-time intensive and requires huge computing power when dealing with large transactional databases. Graphics processing units (GPUs) are a powerful tool to speed up the search process; however, their performance is strongly affected by thread/branch divergence, which results from the single instruction, multiple data parallel model of GPUs. In this paper, we propose three approaches based on database reorganization that aim to reduce thread divergence in a GPU‐based bees swarm optimization metaheuristic for ARM: block‐based reordering, transactions‐based reordering, and transactions‐based reordering with median value. Theoretical and experimental studies have been carried out using well‐known large ARM instances. The experiments have been performed on an Intel Xeon E5520 64-bit quad‐core processor coupled to an Nvidia Tesla C2075 GPU with 448 cores. The results show that the proposed approaches considerably reduce thread divergence and improve the overall execution time: the number of thread divergence occurrences is reduced by up to eight times, making the execution much faster.
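The effect of reordering can be illustrated with a simple lockstep cost model, under assumptions that are ours rather than the paper's: each GPU thread scans one transaction, so its work equals the transaction length, and a warp of `warp_size` threads costs as much as its longest lane. Sorting transactions by length (`reorder_by_length`, a hypothetical name) is a simplified stand-in for transactions-based reordering; the block-based and median-value variants are not reproduced here.

```python
def simd_cost(work, warp_size=32):
    """Modelled steps for lanes running in lockstep: each warp costs its maximum."""
    return sum(max(work[i:i + warp_size]) for i in range(0, len(work), warp_size))


def divergent_warps(work, warp_size=32):
    """Number of warps whose lanes do not all perform the same amount of work."""
    return sum(
        1
        for i in range(0, len(work), warp_size)
        if len(set(work[i:i + warp_size])) > 1
    )


def reorder_by_length(transactions):
    """Simplified reordering: transactions of similar length share a warp."""
    return sorted(transactions, key=len)
```

With alternating transactions of length 1 and 8 and `warp_size=2`, every warp diverges before sorting and none after, and the modelled cost drops from 32 to 18 steps.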
Parallel, Distributed and Network-Based Processing | 2017
Youcef Djenouri; Ahcène Bendjoudi; Djamel Djenouri; Marco Comuzzi
This paper considers the extraction of association rules from large transactional databases using parallel computing on cluster architectures. Motivated both by the success of the sequential BSO-ARM algorithm and by the strong match between this algorithm and the structure of cluster architectures, we present a new parallel ARM algorithm called MW-BSO-ARM, a master/worker version of BSO-ARM. The goal is to deal with large databases by minimizing communication and synchronization costs, which are the main challenges facing any cluster architecture. The experimental results are very promising and show a clear improvement that reaches 300% for large instances. For example, on a big transactional database such as WebDocs, the proposed approach generates 10^7 satisfied rules in only 22 min, while a previous GPU-based approach cannot generate more than 10^3 satisfied rules in 10 h. The results also reveal that MW-BSO-ARM outperforms the PGARM cluster-based approach in terms of computation time.
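A master/worker rule-evaluation step can be sketched as distributed support counting. This is an assumption-laden illustration, not the published MW-BSO-ARM algorithm: the database is assumed to be split into horizontal partitions, one per worker; threads stand in for cluster nodes; and `local_counts`, `evaluate_rule`, and `is_satisfied` are hypothetical names. Support and confidence follow their standard ARM definitions, support(X → Y) = |T(X ∪ Y)| / |T| and confidence(X → Y) = |T(X ∪ Y)| / |T(X)|, and a rule is treated as satisfied when both reach user-given thresholds `min_sup` and `min_conf`.

```python
from concurrent.futures import ThreadPoolExecutor


def local_counts(partition, antecedent, consequent):
    """Worker side: count transactions containing X and X ∪ Y in one partition."""
    x = set(antecedent)
    xy = x | set(consequent)
    n_x = sum(1 for t in partition if x <= t)
    n_xy = sum(1 for t in partition if xy <= t)
    return len(partition), n_x, n_xy


def evaluate_rule(partitions, antecedent, consequent):
    """Master side: broadcast the rule X -> Y, then merge the workers' counters."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = list(
            pool.map(lambda part: local_counts(part, antecedent, consequent), partitions)
        )
    n = sum(r[0] for r in results)
    n_x = sum(r[1] for r in results)
    n_xy = sum(r[2] for r in results)
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


def is_satisfied(partitions, antecedent, consequent, min_sup, min_conf):
    """A rule is kept when both measures reach their thresholds."""
    support, confidence = evaluate_rule(partitions, antecedent, consequent)
    return support >= min_sup and confidence >= min_conf
```

Each worker returns only three counters, so the traffic per rule evaluation does not grow with partition size, which fits the communication-minimizing goal stated in the abstract.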
BIC-TA | 2014
Youcef Djenouri; Ahcène Bendjoudi; Nadia Nouali-Taboudjemat; Zineb Habbas
Solving combinatorial optimization problems (COPs) exactly with a branch-and-bound algorithm requires a huge amount of computational resources. The efficiency of such an algorithm can be improved by distributing, at large scale, the computation required to explore the search tree. In this paper, we propose ParallelBB, a P2P-based parallelization of the branch-and-bound algorithm for the computational Grid. The algorithm has been implemented using the ProActive distributed object Grid middleware, applied to a mono-criterion permutation flow-shop problem, and evaluated with promising results on the Grid5000 computational Grid.
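Since the abstract does not detail ParallelBB's load-balancing protocol, the following is only a hedged, single-process simulation of one common work-sharing idea for distributed B&B: each peer explores its own pool depth-first, and an idle peer takes the older half of the most loaded peer's pending nodes. The complete binary search tree, the `steal_half` policy, and all names are hypothetical, and pruning is omitted so that the simulation simply explores every node exactly once.

```python
from collections import deque


def children(node, depth):
    """Hypothetical search tree: a complete binary tree of the given depth."""
    return [] if len(node) == depth else [node + (0,), node + (1,)]


def steal_half(donor):
    """Hand over the older half of the donor's pending nodes (nearest the root,
    which usually carry the largest subtrees)."""
    k = len(donor) // 2
    return deque(donor.popleft() for _ in range(k))


def explore(n_peers, depth):
    """Round-robin simulation of peers exploring one tree; returns visited nodes."""
    peers = [deque() for _ in range(n_peers)]
    peers[0].append(())  # peer 0 starts with the root
    visited = []
    while any(peers):
        for i, pending in enumerate(peers):
            if not pending:
                # Idle peer: request work from the most loaded peer.
                donor = max(peers, key=len)
                peers[i] = pending = steal_half(donor)
                if not pending:
                    continue
            node = pending.pop()  # depth-first: newest node first
            visited.append(node)
            pending.extend(children(node, depth))
    return visited
```

Taking the oldest nodes for the thief while the owner keeps popping the newest ones lets both sides keep a depth-first order locally while splitting the remaining tree roughly in two.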