Mohamed Wahib
Hokkaido University
Publication
Featured research published by Mohamed Wahib.
Genetic Programming and Evolvable Machines | 2009
Asim Munawar; Mohamed Wahib; Masaharu Munetomo; Kiyoshi Akama
General Purpose computing over Graphical Processing Units (GPGPUs) is a huge paradigm shift in parallel computing that promises a dramatic increase in performance. But GPGPUs also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges and design choices involved in parallelizing a hybrid of Genetic Algorithm (GA) and Local Search (LS) to solve the MAXimum SATisfiability (MAX-SAT) problem on a state-of-the-art nVidia Tesla GPU using nVidia Compute Unified Device Architecture (CUDA). MAX-SAT is a problem of practical importance and is often solved by employing metaheuristic-based search methods such as GAs and hybrids of GAs with LS. Almost all the parallel GAs (pGAs) designed in the last two decades were designed for either clusters or MPPs. Unfortunately, very little research has been done on the implementation of such algorithms over commodity graphics hardware. GAs in their simple form are not suitable for implementation over the Single Instruction Multiple Thread (SIMT) architecture of a GPU, and the same is the case with conventional LS algorithms. In this paper we explore different genetic operators that can be used for an efficient implementation of GAs over nVidia GPUs. We also design and introduce new techniques and operators for an efficient implementation of GAs and LS over such architectures. We use an nVidia Tesla C1060 to perform several numerical tests and performance measurements and show that in the best case we obtain a speedup of 25×. We also discuss the effects of different optimization techniques on the overall execution time.
High Performance Computing and Communications | 2008
Asim Munawar; Mohamed Wahib; Masaharu Munetomo; Kiyoshi Akama
This paper surveys the impact of modern parallel/distributed computing paradigms on parallel genetic algorithms (PGAs). The main motivation behind this survey is to help the GA community feel more comfortable with the evolving parallel paradigms and to mark some areas of research for the high-performance computing (HPC) community. Among the modern parallel computing paradigms, we consider only two major areas that have evolved very quickly during the past few years, namely multicore computing and Grid computing. We discuss the challenges involved and give potential solutions for them. We also propose a hierarchical PGA suitable for a Grid environment with multicore computational resources.
Congress on Evolutionary Computation | 2011
Asim Munawar; Mohamed Wahib; Masaharu Munetomo; Kiyoshi Akama
In this paper we propose a many-core implementation of evolutionary computation for GPGPU (General-Purpose Graphic Processing Unit) to solve non-convex Mixed Integer Non-Linear Programming (MINLP) and non-convex Non-Linear Programming (NLP) problems using a stochastic algorithm. Stochastic algorithms, being random in their behavior, are difficult to implement over GPU-like architectures. In this paper we not only succeed in implementing a stochastic algorithm over a GPU but also show considerable speedups over CPU implementations. The stochastic algorithm considered in this paper is an adaptive resolution approach to genetic algorithms (arGA), developed by the authors of this paper. The technique uses the entropy measure of each variable to adjust the intensity of the genetic search around promising individuals. Performance is further improved by hybridization with an adaptive resolution local search (arLS) operator. We describe the challenges and design choices involved in parallelizing this algorithm to solve complex MINLPs over a commodity GPU using the Compute Unified Device Architecture (CUDA) programming model. The results section shows several numerical tests and performance measurements obtained by running the algorithm over an nVidia Fermi GPU. We show that for difficult problems we can obtain a speedup of up to 20× with double precision and up to 42× with single precision.
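The abstract above mentions an entropy measure of each variable without defining it. As a rough illustration only, the sketch below computes the Shannon entropy of one variable's histogram across a population. The binning scheme, the function name `variable_entropy`, and its parameters are assumptions and may differ from the measure used in arGA.

```python
import math


def variable_entropy(population, var_index, lower, upper, bins=10):
    """Shannon entropy (in bits) of one variable's histogram over the population.

    Assumes upper > lower; values outside [lower, upper] are clamped to the
    first or last bin.
    """
    counts = [0] * bins
    width = (upper - lower) / bins
    for individual in population:
        b = int((individual[var_index] - lower) / width)
        counts[min(max(b, 0), bins - 1)] += 1
    n = len(population)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
```

A low value indicates that the population has concentrated on a narrow range of that variable, which is one plausible signal for searching that range at a finer resolution.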
Parallel and Distributed Computing: Applications and Technologies | 2009
Asim Munawar; Mohamed Wahib; Masaharu Munetomo; Kiyoshi Akama
General Purpose computing over Graphical Processing Units (GPGPUs) is a huge paradigm shift in parallel computing that promises a dramatic increase in performance. But GPGPUs also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges and design choices involved in parallelizing the Bayesian Optimization Algorithm (BOA) to solve complex combinatorial optimization problems over nVidia commodity graphics hardware using Compute Unified Device Architecture (CUDA). BOA is a well-known multivariate Estimation of Distribution Algorithm (EDA) that incorporates methods for learning a Bayesian Network (BN). It then uses the BN to sample new promising solutions. Our implementation is fully compatible with modern commodity GPUs and therefore we call it gBOA (BOA on GPU). In the results section, we show several numerical tests and performance measurements obtained by running gBOA over an nVidia Tesla C1060 GPU. We show that in the best case we can obtain a speedup of up to 13×.
Congress on Evolutionary Computation | 2011
Mohamed Wahib; Asim Munawar; Masaharu Munetomo; Kiyoshi Akama
Led by General Purpose computing over Graphical Processing Units (GPGPUs), the parallel computing area is witnessing a rapid change in dominant parallel systems. A major hurdle in this switch is the Single Instruction Multiple Thread (SIMT) architecture of GPUs, which is usually not suitable for the design of legacy parallel algorithms. Genetic Algorithms (GAs) are no exception. GAs are commonly parallelized because of their highly demanding computational needs. Given the performance of GPGPUs, the need to exploit them fully to maximize computing efficiency for parallel GAs is growing. The goal of this paper is to shed light on the challenges that designers and programmers of parallel GAs will likely face while trying to achieve this, and to provide some practical advice on how to maximize GPGPU exploitation. To that end, this paper provides a study on adapting legacy parallel GAs to GPGPU systems. The paper exposes the design challenges of nVidia's GPU architecture to the parallel GA community by discussing features of the GPU, reviewing GPU design issues relevant to parallel GAs, designing and introducing new techniques to achieve an efficient implementation of parallel GAs, and observing the effect of the pivotal points that both capitalize on the strengths of GPUs and limit their deficiencies and overheads. The paper demonstrates the performance of designed-for-GPGPU parallel GAs representing the entire spectrum of legacy parallel GA models on an nVidia Tesla C1060 workstation, showing a significant improvement in performance after optimizing and tuning the algorithms for the GPU.
IEEE International Conference on Services Computing | 2008
Mohamed Wahib; Asim Munawar; Masaharu Munetomo; Kiyoshi Akama
Metaheuristics grid (MHGrid) is a service-oriented grid application that enables the user to solve almost any global optimization problem using metaheuristic techniques. Two problems potentially limit the generality of MHGrid over the problem type space: having a fixed set of solvers, and lacking the solver-problem relation semantics. The strategies adopted to resolve these two problems are: offering the solvers as services, enabling users to define their own parallelization model, allowing users to add their own services, and maintaining service-based functionality on both the middleware layer and the application layer. This paper explains the design, architecture and implementation of the service-oriented architecture (SOA) that MHGrid adopts to support these strategies.
Parallel and Distributed Computing: Applications and Technologies | 2007
Mohamed Wahib; Masaharu Munetomo; Asim Munawar; Kiyoshi Akama
This paper introduces MHGrid, a framework that exploits meta-heuristic-based search methods and grid computing to enable the transparent sharing of heterogeneous and dynamic resources, offering a grid-based global optimization framework. MHGrid allows a user to solve almost all kinds of global optimization problems in a black-box manner with minimal input; it also allows users to integrate their own solvers into MHGrid. In this paper we discuss the architecture and motivation of such a system. We also discuss the challenges and complexities involved in constructing MHGrid.
The Chip Multiprocessor (CMP) architecture offers parallel multi-thread execution and fast retrieval of shared data that is cached on-chip. To obtain the best possible performance with the CMP architecture, the cache architecture must be optimised to reduce time lost during remote cache and off-chip memory accesses. Many researchers have proposed CMP cache architectures to improve system performance, but they have not considered parallel execution of mixed single-thread and multi-thread workloads. In this paper, we propose a hybrid workload-aware cache architecture, SPS2, in which each processor has both private and shared L2 caches. We describe the corresponding SPS2 cache coherence protocol with a state transition graph. Performance evaluation demonstrates that the proposed SPS2 cache structure performs better than traditional private L2 and shared L2 caches when hybrid workloads are applied.
Congress on Evolutionary Computation | 2010
Mohamed Wahib; Asim Munawar; Masaharu Munetomo; Kiyoshi Akama
A principal fragment-based design approach is de novo ligand design, in which small-molecule structures from a database of existing compounds (or compounds that could be made) are docked into the protein binding site following a virtual synthesis scheme. New virtual structures can easily be constructed from combinatorial building blocks. Typically, tens of thousands of orientations are generated for each ligand candidate; therefore, global optimization algorithms are usually employed to search the chemical space by generating new molecular structures through probing many different fragments in a combinatorial fashion. We propose using the Bayesian Optimization Algorithm (BOA), a meta-heuristic algorithm, to search the combinations of pre-docked fragments by minimizing the energy of ligand-receptor docking. We further introduce the use of a GPU (Graphical Processing Unit) to overcome the very long time required to evaluate each possible fragment combination. We show how GPU utilization enables experimenting with larger fragments and target receptors for more complex instances. The experiments regenerated three drug-like compounds defined in the ZINC database and found a new compound. The results show how nVidia's Tesla C1060 GPU was utilized to accelerate the docking process by two orders of magnitude.
World Congress on Services | 2011
Mohamed Wahib; Asim Munawar; Masaharu Munetomo; Kiyoshi Akama
Cloud computing is impacting modern Internet computing and businesses in every aspect. One feature of clouds is the convenience of using the services they offer. Consequently, most cloud service providers use Web services (WS) for users and developers to interface with the cloud. However, current cloud WS are focused on core and fundamental modern computing functionalities. We anticipate that as cloud development tools mature and cloud applications become more popular, there will be an opportunity for designing and implementing applications and services embedded in the cloud for use by applications in the cloud. We propose a framework for WS deployment in the cloud to be usable by applications residing in the same cloud. The framework capitalizes on the cloud's strong points to offer a higher value to the service consumer inside the cloud. The authoritative nature of clouds would enable more efficient models for WS publishing, indexing and description. Moreover, being hosted in the cloud, WS can build on the high scalability offered by the cloud with much higher reliability. Finally, scheduling the instances that use the WS together with the WS instances could offer LAN-like connectivity performance, driving latency down to the low-microsecond range. In this paper, we highlight the challenges and opportunities of cloud applications using cloud-embedded Web services. We describe the different aspects by illustrating the different components, together with an end-to-end use case to show the applicability of the proposed system.
Learning and Intelligent Optimization | 2011
Asim Munawar; Mohamed Wahib; Masaharu Munetomo; Kiyoshi Akama
Non-convex mixed integer non-linear programming problems (MINLPs) are the most general form of global optimization problems. Such problems involve both discrete and continuous variables with several active non-linear equality and inequality constraints. In this paper, a new approach for solving MINLPs is presented using adaptive resolution based micro genetic algorithms with local search. Niching is incorporated in the algorithm by using a technique inspired by the tabu search algorithm. The proposed algorithm adaptively controls the intensity of the genetic search in a given sub-solution space, i.e. promising regions are searched more intensely than other regions. The algorithm reduces the chances of convergence to a local minimum by maintaining a list of already visited minima and penalizing their neighborhoods, following the tabu list strategy used in the tabu search algorithm. The proposed technique was able to find the best-known solutions to extremely difficult MINLP/NLP problems in a competitive amount of time. The results section discusses the performance of the algorithm and the effect of different operators by using a variety of MINLP/NLPs from different problem domains.
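The idea of keeping a list of visited minima and penalizing their neighborhoods can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's operator: the Euclidean neighborhood, the fixed `radius` and `penalty` values, and the name `penalized_fitness` are all hypothetical.

```python
import math


def penalized_fitness(objective, x, visited_minima, radius=0.5, penalty=1e3):
    """Objective value (minimization) plus a penalty near already visited minima.

    A fixed penalty is added once for every stored minimum that lies within
    `radius` (Euclidean distance) of the candidate point x.
    """
    value = objective(x)
    for minimum in visited_minima:
        if math.dist(x, minimum) < radius:
            value += penalty
    return value
```

With a sphere function `lambda x: sum(v * v for v in x)` and `[[0.0, 0.0]]` as the list of visited minima, a candidate near the origin receives the penalty while a candidate at `[2.0, 0.0]` keeps its raw objective value, so the search is pushed toward unexplored regions.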