Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Md. Mohsin Ali is active.

Publication


Featured research published by Md. Mohsin Ali.


International Parallel and Distributed Processing Symposium | 2014

Application Level Fault Recovery: Using Fault-Tolerant Open MPI in a PDE Solver

Md. Mohsin Ali; James Southern; Peter E. Strazdins; Brendan Harding

A fault-tolerant version of Open Message Passing Interface (Open MPI), based on the draft User Level Failure Mitigation (ULFM) proposal of the MPI Forum's Fault Tolerance Working Group, is used to create fault-tolerant applications. This allows applications and libraries to design their own recovery methods and control them at the user level. However, only a limited amount of research on user-level failure recovery (including the implementation and performance evaluation of this prototype) has been carried out. This paper contributes a fault-tolerant implementation of an application solving 2D partial differential equations (PDEs) by means of a sparse grid combination technique which is capable of surviving multiple process failures. Our fault recovery involves reconstructing the faulty communicators without shrinking the global size by re-spawning failed MPI processes on the same physical processors where they were before the failure (for load balancing). It also involves restoring lost data from either exact checkpointed data on disk, approximated data in memory (via an alternate sparse grid combination technique), or a near-exact copy of replicated data in memory. The experimental results show that the faulty communicator reconstruction time is currently large in the draft ULFM, especially for multiple process failures. They also show that the alternate combination technique has the lowest data recovery overhead, except on a system with very low disk write latency, for which checkpointing has the lowest overhead. Furthermore, the errors due to the recovery of approximated data are within a factor of 10 in all cases, with the surprising result that the alternate combination technique is more accurate than the near-exact replication method.
The contributed implementation details, including the analysis of the experimental results, will help application developers resolve design and implementation issues of fault-tolerant applications by means of the Open MPI ULFM standard.


International Conference on Conceptual Structures | 2013

Fault-Tolerant Grid-Based Solvers: Combining Concepts from Sparse Grids and MapReduce

Jay Walter Larson; Markus Hegland; Brendan Harding; Stephen Roberts; Linda Stals; Alistair P. Rendell; Peter E. Strazdins; Md. Mohsin Ali; Christoph Kowitz; Ross Nobes; James Southern; Nicholas Wilson; Michael Li; Yasuyuki Oishi

A key issue confronting petascale and exascale computing is the growth in probability of soft and hard faults with increasing system size. A promising approach to this problem is the use of algorithms that are inherently fault tolerant. We introduce such an algorithm for the solution of partial differential equations, based on the sparse grid approach. Here, the solutions of multiple component grids are efficiently combined to achieve a solution on a full grid. The technique also lends itself to a (modified) MapReduce framework on a cluster of processors, with the map stage corresponding to allocating each component grid for solution over a subset of the processors, and the reduce stage corresponding to their combination. We describe how the sparse grid combination method can be modified to robustly solve partial differential equations in the presence of faults. This is based on a modified combination formula that can accommodate the loss of one or two component grids. We also discuss accuracy issues associated with this formula. We give details of a prototype implementation within a MapReduce framework using the dynamic process features and asynchronous message passing facilities of MPI. Results on a two-dimensional advection problem show that the errors after the loss of one or two sub-grids are within a factor of 3 of the sparse grid solution in the absence of faults. They also indicate that the sparse grid technique with four times the resolution has approximately the same error as a full grid, while (for sufficiently high resolution) requiring much less computation and memory. We finally outline a MapReduce variant capable of responding to faults in ways other than re-scheduling of failed tasks. We discuss the likely software requirements for such a flexible MapReduce framework, the requirements it will impose on users’ legacy codes, and the system's runtime behavior.
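The combination step behind this line of work can be pictured with a small sketch. The following toy code is an illustration under our own assumptions, not the paper's implementation: it builds the classical 2D combination of level n, where component grids on the diagonal i + j = n + 1 get coefficient +1 and those on i + j = n get coefficient -1, so the coefficients sum to 1. The per-grid "solve" here is just sampling a function at a node shared by all component grids.

```python
def combination_grids(n):
    """Component grids and coefficients of the classical 2D combination
    of level n: u_c = sum_{i+j=n+1} u_{i,j} - sum_{i+j=n} u_{i,j}."""
    grids = [((i, n + 1 - i), +1) for i in range(1, n + 1)]
    grids += [((i, n - i), -1) for i in range(1, n)]
    return grids

def solve_on_grid(level_x, level_y, f, x, y):
    """Stand-in for the per-grid PDE solve: just sample f at a point that
    is a node of every component grid (x = y = 0.5 is one such point)."""
    return f(x, y)

n = 4
grids = combination_grids(n)
f = lambda x, y: x + y
combined = sum(c * solve_on_grid(lx, ly, f, 0.5, 0.5) for (lx, ly), c in grids)
print(len(grids), sum(c for _, c in grids), combined)
```

Because the coefficients sum to 1, the combination reproduces the sampled value exactly at shared nodes; in a real solver the component solutions differ, and the combination approximates the full-grid solution. In a MapReduce framing, each `solve_on_grid` call is a map task and the weighted sum is the reduce.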


International Parallel and Distributed Processing Symposium | 2015

Highly Scalable Algorithms for the Sparse Grid Combination Technique

Peter E. Strazdins; Md. Mohsin Ali; Brendan Harding

Many petascale and exascale scientific simulations involve the time evolution of systems modelled as partial differential equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for solving time-evolving PDEs, especially for higher-dimensional problems. It consists of evolving the PDE over a set of grids of differing resolution in each dimension, and then combining the results to approximate the solution of the PDE on a grid of high resolution in all dimensions. It can also be extended to support algorithm-based fault tolerance, which is also important for computations at this scale. In this paper, we present two new parallel algorithms for the SGCT that support full distributed memory parallelization over the dimensions of the component grids, as well as across the grids themselves. The direct algorithm is so called because it directly implements an SGCT combination formula. The second algorithm converts each component grid into its hierarchical surpluses, and then uses the direct algorithm on each of these. The conversion to/from the hierarchical surpluses is an important algorithm in its own right. An analysis of both indicates that the direct algorithm minimizes the number of messages, whereas the hierarchical surplus algorithm minimizes memory consumption and offers a reduction in bandwidth by a factor of 1 − 2^−d, where d is the dimensionality of the SGCT. However, this is offset by its incomplete parallelism and a factor of two load imbalance in practical scenarios. Our analysis also indicates that both are suitable in a bandwidth-limited regime. Experimental results, including the strong and weak scalability of the algorithms, indicate that, for scenarios of practical interest, both are sufficiently scalable to support the large-scale SGCT, but the direct algorithm has generally better performance, to within a factor of 2.
Hierarchical surplus formation is much less communication intensive, but shows less scalability with increasing core counts.
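The hierarchical surplus transformation mentioned above can be sketched in one dimension. This is a generic hat-function hierarchization (our own illustration, not the paper's distributed algorithm): each interior node's value is replaced by its surplus relative to its two hierarchical parents, finest level first, so parents still hold nodal values when they are read.

```python
def hierarchize_1d(u):
    """Replace nodal values on a 1D grid of 2^n + 1 points by hierarchical
    surpluses w_i = u_i - (u_parentL + u_parentR) / 2, processing the
    finest level first so that parent entries still hold nodal values."""
    v = list(u)
    n = (len(u) - 1).bit_length() - 1          # grid level
    for level in range(n, 0, -1):
        step = 2 ** (n - level)                # spacing of this level's nodes
        for i in range(step, len(u) - 1, 2 * step):
            v[i] -= 0.5 * (v[i - step] + v[i + step])
    return v

# For linear data every interior surplus vanishes:
print(hierarchize_1d([0, 0.25, 0.5, 0.75, 1]))
```

The surpluses of a smooth function decay rapidly with level, which is why working in the surplus representation can cut the communicated volume, at the cost of the extra forward/inverse transforms.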


Parallel Computing | 2014

Managing complexity in the Parallel Sparse Grid Combination Technique

Jay Walter Larson; Peter E. Strazdins; Markus Hegland; Brendan Harding; Stephen Roberts; Linda Stals; Alistair P. Rendell; Md. Mohsin Ali; James Southern



International Journal of High Performance Computing Applications | 2016

Complex scientific applications made fault-tolerant with the sparse grid combination technique

Md. Mohsin Ali; Peter E. Strazdins; Brendan Harding; Markus Hegland

Ultra-large-scale simulations via solving partial differential equations (PDEs) require very large computational systems for their timely solution. Studies have shown that the rate of failure grows with the system size, and these trends are likely to worsen in future machines. Thus, as systems, and the problems solved on them, continue to grow, the ability to survive failures is becoming a critical aspect of algorithm development. The sparse grid combination technique (SGCT), a cost-effective method for solving higher-dimensional PDEs, can be easily modified to provide algorithm-based fault tolerance. In this article, we describe how the SGCT can produce fault-tolerant versions of the Gyrokinetic Electromagnetic Numerical Experiment plasma application, the Taxila Lattice Boltzmann Method application, and the Solid Fuel Ignition application. We use an alternate component grid combination formula, adding some redundancy to the SGCT, to recover data from lost processes. User Level Failure Mitigation (ULFM) MPI is used to recover the processes, and our implementation is robust to multiple failures and recoveries (of both processes and nodes). An acceptable degree of modification of the applications is required. Results using the 2-D SGCT show competitive execution times with acceptable error (within 0.1% to 1.0%), compared to the same simulation with a single full-resolution grid. The benefits improve when the 3-D SGCT is used. Experiments show the applications' ability to successfully recover from multiple failures, and applying multiple SGCT combinations reduces the computed solution error. Process recovery via ULFM MPI increases from approximately 1.5 s at 64 cores to approximately 5 s at 2048 cores for a one-off failure. By comparison, the applications' built-in checkpointing with job restart in conjunction with the classical SGCT incurs overheads four times as large for a single failure, excluding the recomputation overhead.
An analysis for a long-running application that accounts for recomputation times indicates a reduction in overhead of over an order of magnitude.
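One standard way to picture an alternate combination after a failure is through the general coefficient rule for downward-closed index sets. The sketch below is our own illustration, not the published implementation: it recomputes 2D combination coefficients by inclusion-exclusion, first for the classical index set and then after a component grid is dropped. The coefficients still sum to 1, so a valid combination survives the loss.

```python
from itertools import product

def combi_coeffs(index_set):
    """Combination coefficient of each 2D component grid i in a
    downward-closed index set I, by inclusion-exclusion:
    c_i = sum over z in {0,1}^2 of (-1)^(z0+z1) * [i + z in I]."""
    I = set(index_set)
    coeffs = {}
    for i in I:
        c = sum((-1) ** (z0 + z1) * ((i[0] + z0, i[1] + z1) in I)
                for z0, z1 in product((0, 1), repeat=2))
        if c != 0:
            coeffs[i] = c
    return coeffs

n = 4
full = {(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i + j <= n + 1}
lost = full - {(2, 3)}       # pretend the processes holding grid (2, 3) failed
print(combi_coeffs(full))    # classical coefficients: +1 / -1 on the top two diagonals
print(combi_coeffs(lost))    # alternate combination; coefficients still sum to 1
```

Dropping a grid redistributes weight onto the remaining grids, which is the redundancy the abstract refers to: lost data is never needed, only a slightly less accurate combination is formed.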


Computer and Information Technology | 2010

Quantum Evolutionary Algorithm based on Particle Swarm theory in multiobjective problems

Md. Kowsar Hossain; Md. Amjad Hossain; M. M. A. Hashem; Md. Mohsin Ali

Quantum Evolutionary Algorithm (QEA) is an optimization algorithm based on the concept of quantum computing, and Particle Swarm Optimization (PSO) is a population-based intelligent search technique. Both techniques perform well on optimization problems. PSEQEA combines PSO with QEA to improve the performance of QEA, and it can solve single-objective optimization problems efficiently and effectively. In this paper, PSEQEA is studied for solving multi-objective optimization (MO) problems. Some well-known non-trivial functions are used to observe the performance of PSEQEA in detecting Pareto-optimal points and the shape of the Pareto front, using both the fixed weighted aggregation method and the adaptive weighted aggregation method. Moreover, Vector Evaluated PSEQEA (VEPSEQEA) borrows from Schaffer's Vector Evaluated Genetic Algorithm (VEGA), which can also cope with MO problems. Simulation results show that PSEQEA and VEPSEQEA perform better than PSO and VEPSO in discovering the Pareto frontier.
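The fixed weighted aggregation idea can be sketched with a plain PSO (a minimal illustration of the aggregation mode, not the published PSEQEA algorithm; all parameter values here are our own assumptions): the two objectives are collapsed into a single weighted sum, and the swarm minimizes that scalar, yielding one Pareto-optimal point per weight choice.

```python
import random

def weighted_pso(f1, f2, w=0.5, n_particles=20, iters=200,
                 lo=-4.0, hi=4.0, seed=1):
    """Minimize w*f1(x) + (1-w)*f2(x) with a basic 1D PSO:
    inertia 0.7, cognitive/social weights 1.5 (illustrative values)."""
    rng = random.Random(seed)
    agg = lambda x: w * f1(x) + (1 - w) * f2(x)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]                      # personal bests
    gbest = min(xs, key=agg)           # global best
    for _ in range(iters):
        for i in range(n_particles):
            vs[i] = (0.7 * vs[i]
                     + 1.5 * rng.random() * (pbest[i] - xs[i])
                     + 1.5 * rng.random() * (gbest - xs[i]))
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))
            if agg(xs[i]) < agg(pbest[i]):
                pbest[i] = xs[i]
                if agg(xs[i]) < agg(gbest):
                    gbest = xs[i]
    return gbest

# Schaffer's classic bi-objective test problem: f1 = x^2, f2 = (x - 2)^2.
# With equal weights the aggregated optimum is at x = 1.
x = weighted_pso(lambda x: x * x, lambda x: (x - 2) ** 2)
print(x)
```

Sweeping `w` over [0, 1] and rerunning traces out points along the Pareto front; the adaptive variant mentioned in the abstract varies the weights during the run instead.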


Journal of Computational Science | 2016

Design and Analysis of Two Highly Scalable Sparse Grid Combination Algorithms

Peter E. Strazdins; Md. Mohsin Ali; Brendan Harding

Many large-scale scientific simulations involve the time evolution of systems modelled as partial differential equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for solving time-evolving PDEs, especially for higher-dimensional problems. It consists of evolving the PDE over a set of grids of differing resolution in each dimension, and then combining the results to approximate the solution of the PDE on a grid of high resolution in all dimensions. It can also be extended to support algorithm-based fault tolerance, which is also important for computations at this scale. In this paper, we present two new parallel algorithms for the SGCT that support full distributed memory parallelization over the dimensions of the component grids, as well as across the grids themselves. The direct algorithm is so called because it directly implements an SGCT combination formula. We give details of the design and implementation of a ‘partial’ sparse grid data structure, which is needed for its efficient implementation. The second algorithm converts each component grid into its hierarchical surpluses, and then uses the direct algorithm on each of these. The conversion to/from the hierarchical surpluses is an important algorithm in its own right. It requires a technique called sub-gridding in order to correctly deal with the combination of very small surpluses. An analysis of both indicates that the direct algorithm minimizes the number of messages, whereas the hierarchical surplus algorithm minimizes memory consumption and offers a reduction in bandwidth by a factor of 1 − 2^−d, where d is the dimensionality of the SGCT. However, this is offset by its incomplete parallelism (70–80%) and a factor of 2^d load imbalance in practical scenarios. Our analysis also indicates that both are suitable in a bandwidth-limited regime and that the direct algorithm is scalable with respect to d.
Experimental results, including the strong and weak scalability of the algorithms, indicate that, for scenarios of practical interest, both are sufficiently scalable to support the large-scale SGCT, but the direct algorithm has generally better performance, by at least a factor of 2 in most cases. Hierarchical surplus formation is much less communication intensive, but shows less scalability with increasing core counts. Altering the layout of processes in the process grids and the mapping of processes affects the performance of the 2D SGCT by less than 10%, and affects the application part of an SGCT advection application even less.


Journal of Physics: Conference Series | 2012

On the Factor Refinement Principle and its Implementation on Multicore Architectures

Md. Mohsin Ali; Marc Moreno Maza; Yuzhen Xie

We propose a divide-and-conquer adaptation of the factor refinement algorithm of Bach, Driscoll and Shallit. For an ideal cache of Z words, with L words per block, the original approach suffers from O(n^2/L) cache misses, whereas our adaptation incurs only O(n^2/(ZL)) cache misses. We have realized a multithreaded implementation of the latter using Cilk++ targeting multicores. Our code achieves linear speedup on 16 cores for sufficiently large input data.
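The basic refinement operation the paper accelerates can be sketched naively. The code below is our own illustration of the Bach-Driscoll-Shallit idea, not the paper's cache-friendly divide-and-conquer version: it rewrites a multiset of integers as a pairwise-coprime base with exponents, preserving the product, by repeatedly splitting out shared gcd factors.

```python
from math import gcd

def factor_refinement(nums):
    """Naive factor refinement: return a sorted list of (base, exponent)
    pairs with pairwise-coprime bases whose product equals prod(nums).
    Each step replaces a non-coprime pair (a^e, b^f) by
    (a/g)^e * g^(e+f) * (b/g)^f with g = gcd(a, b), dropping units."""
    base = [(x, 1) for x in nums if x > 1]
    changed = True
    while changed:
        changed = False
        for i in range(len(base)):
            for j in range(i + 1, len(base)):
                (a, e), (b, f) = base[i], base[j]
                g = gcd(a, b)
                if g > 1:
                    rest = [base[k] for k in range(len(base)) if k not in (i, j)]
                    repl = [(a // g, e), (g, e + f), (b // g, f)]
                    base = rest + [(v, m) for v, m in repl if v > 1]
                    changed = True
                    break
            if changed:
                break
    return sorted(base)

print(factor_refinement([12, 15]))   # [(3, 2), (4, 1), (5, 1)]: 12*15 = 3^2 * 4 * 5
```

Each split strictly decreases the total count of prime factors across entries, so the loop terminates; the quadratic pair scanning is exactly the cache-unfriendly access pattern the divide-and-conquer adaptation reorganizes.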


International Conference on Computer, Control and Communication | 2009

A flow transparent multicast pre-reservation modification of RSVP for providing real-time services in wireless mobile networks

Md. Mohsin Ali; Kazi Md. Rokibul Alam; Muhammad Nazrul Islam

The Resource Reservation Protocol (RSVP) is used to reserve sufficient resources between fixed endpoints. To support host mobility, several methods have been proposed in the literature to address the challenging problems of minimizing handoff resource reservation delays and the wastage of resources. Although these methods minimize intra-subnet handoff resource reservation delays with minimal wastage of resources, inter-subnet handoffs still incur longer reservation delays and greater resource wastage. In this paper, we propose a flow-transparent multicast pre-reservation modification of RSVP to solve these problems and make handoffs completely transparent to the user. In this method, a Multicast Group (MG) of cells is selected for each Boundary Cell (BC), and when the Mobile Node (MN) performs an inter-subnet handoff towards that group, resources are pre-reserved in this group only along the newly added path instead of the whole path between source and destination. Simulation results demonstrate that the number of cells in the MG and the resource reservation delays for inter-subnet handoffs of the proposed method are always smaller than those of all other existing methods, with tolerable call blocking probabilities.
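The "newly added path only" idea can be pictured as a set difference over route links. The sketch below is a hypothetical illustration (the routes, node names, and helpers are ours, not the protocol machinery): after an inter-subnet handoff changes the route, only the links the existing reservation does not already cover need a new pre-reservation.

```python
def links(path):
    """Undirected edges of a route given as a node list."""
    return {tuple(sorted(e)) for e in zip(path, path[1:])}

def pre_reserve(old_path, new_path):
    """Links of the new route not covered by the existing reservation;
    only these need pre-reservation on an inter-subnet handoff."""
    return links(new_path) - links(old_path)

# Hypothetical routes from a correspondent node CN to the mobile node MN,
# before and after the MN hands off from base station BS_a to BS_b.
old = ["CN", "R1", "R2", "BS_a", "MN"]
new = ["CN", "R1", "R3", "BS_b", "MN"]
print(sorted(pre_reserve(old, new)))   # the shared CN-R1 link needs no new reservation
```

The shared upstream portion of the route keeps its reservation, which is what reduces both the handoff reservation delay and the wasted reserved capacity relative to re-reserving the whole source-to-destination path.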


International Parallel and Distributed Processing Symposium | 2016

Application Fault Tolerance for Shrinking Resources via the Sparse Grid Combination Technique

Peter E. Strazdins; Md. Mohsin Ali; Bert J. Debusschere

The need to make large-scale scientific simulations resilient to the shrinking and growing of compute resources arises from exascale computing and from adverse operating conditions (fault tolerance). It can also arise in the cloud-computing context, where the cost of these resources can fluctuate. In this paper, we describe how the Sparse Grid Combination Technique can make such applications resilient to shrinking compute resources. Solutions to the non-trivial issues of data redistribution and the on-the-fly malleability of process grid information and ULFM MPI communicators are described. Results on a 2D advection solver indicate that process recovery time is significantly reduced compared with the alternate strategy of replacing failed resources, that overall execution time actually improves over both this case and checkpointing, and that the execution error remains small even when multiple failures occur.
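The data redistribution problem after shrinking can be sketched in one dimension. The code below is an assumption-laden toy, not the paper's implementation: the same global index range is re-blocked over the surviving processes, and each survivor computes which old blocks overlap its new block, i.e. the data it must fetch.

```python
def block_partition(n, p):
    """Contiguous block distribution of n items over p parts
    (the first n % p parts get one extra item)."""
    q, r = divmod(n, p)
    out, start = [], 0
    for i in range(p):
        size = q + 1 if i < r else q
        out.append((start, start + size))
        start += size
    return out

def redistribute(blocks_old, survivors, n):
    """For each surviving rank, list (old_rank, (lo, hi)) index ranges it
    must pull to fill its new block after the resource set shrinks."""
    blocks_new = block_partition(n, len(survivors))
    plan = {}
    for rank, (lo, hi) in zip(survivors, blocks_new):
        plan[rank] = [(src, (max(lo, a), min(hi, b)))
                      for src, (a, b) in enumerate(blocks_old)
                      if max(lo, a) < min(hi, b)]
    return plan

n, p = 16, 4
old = block_partition(n, p)                          # 4 blocks of 4 indices
plan = redistribute(old, survivors=[0, 1, 3], n=n)   # rank 2 has failed
print(plan)
```

Because the blocks are contiguous, each survivor overlaps only a few old blocks, so the transfer plan stays small; the real solver must additionally rebuild the ULFM communicator and process grid metadata before executing such a plan.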

Collaboration


Dive into Md. Mohsin Ali's collaborations.

Top Co-Authors

Peter E. Strazdins

Australian National University


Brendan Harding

Australian National University


Markus Hegland

Australian National University


Jay Walter Larson

Argonne National Laboratory


Alistair P. Rendell

Australian National University


Linda Stals

Australian National University


Stephen Roberts

Australian National University


Md. Amjad Hossain

Khulna University of Engineering


Md. Kowsar Hossain

Khulna University of Engineering
