
Publication


Featured research published by Ricolindo L. Cariño.


The Journal of Supercomputing | 2008

Dynamic load balancing with adaptive factoring methods in scientific applications

Ricolindo L. Cariño; Ioana Banicescu

To improve the performance of scientific applications with parallel loops, dynamic loop scheduling methods have been proposed. Such methods address performance degradations due to load imbalance caused by predictable phenomena like nonuniform data distribution or algorithmic variance, and unpredictable phenomena such as data access latency or operating system interference. In particular, methods such as factoring, weighted factoring, adaptive weighted factoring, and adaptive factoring have been developed based on a probabilistic analysis of parallel loop iterates with variable running times. These methods have been successfully implemented in a number of applications such as N-Body and Monte Carlo simulations, computational fluid dynamics, and radar signal processing. The focus of this paper is on adaptive weighted factoring (AWF), a method that was designed for scheduling parallel loops in time-stepping scientific applications. The main contribution of the paper is to relax the time-stepping requirement, a modification that allows the AWF to be used in any application with a parallel loop. The modification further allows the AWF to adapt to load imbalance that may occur during loop execution. Results of experiments to compare the performance of the modified AWF with the performance of the other loop scheduling methods in the context of three nontrivial applications reveal that the performance of the modified method is comparable to, and in some cases, superior to the performance of the most recently introduced adaptive factoring method.
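The factoring-style rules described above can be sketched in a few lines. The following is an illustrative reconstruction of the classic factoring schedule (each batch of P equal chunks covers roughly half of the remaining iterates), not the authors' code; the function name and the `alpha` parameter are assumptions.

```python
import math

def factoring_chunks(n_iters, n_procs, alpha=2.0):
    """Classic factoring: each batch of n_procs equal chunks takes
    a 1/alpha fraction of the remaining iterates (alpha=2 gives the
    halving behavior of the original factoring method)."""
    chunks, remaining = [], n_iters
    while remaining > 0:
        size = max(1, math.ceil(remaining / (alpha * n_procs)))
        for _ in range(n_procs):
            if remaining == 0:
                break
            c = min(size, remaining)
            chunks.append(c)
            remaining -= c
    return chunks

# For 100 iterates on 4 processors the chunk sizes decay in batches of 4:
print(factoring_chunks(100, 4))
```

In the weighted variants, each processor's chunk would additionally be scaled by a weight reflecting its measured speed; the modified AWF in this paper recomputes those weights during loop execution rather than only between time steps.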


Cluster Computing | 2005

A Load Balancing Tool for Distributed Parallel Loops

Ricolindo L. Cariño; Ioana Banicescu

Large scale applications typically contain parallel loops with many iterates. The iterates of a parallel loop may have variable execution times which translate into performance degradation of an application due to load imbalance. This paper describes a tool for load balancing parallel loops on distributed-memory systems. The tool assumes that the data for a parallel loop to be executed is already partitioned among the participating processors. The tool utilizes the MPI library for interprocessor coordination, and determines processor workloads by loop scheduling techniques. The tool was designed independent of any application; hence, it must be supplied with a routine that encapsulates the computations for a chunk of loop iterates, as well as the routines to transfer data and results between processors. Performance evaluation on a Linux cluster indicates that the tool reduces the cost of executing a simulated irregular loop without load balancing by up to 81%. The tool is useful for parallelizing sequential applications with parallel loops, or as an alternate load balancing routine for existing parallel applications.
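The master/worker structure of such a tool can be illustrated with plain Python threads standing in for MPI processes. This is a hedged sketch of dynamic chunk self-scheduling only; `run_loop`, its parameters, and the thread-based setup are assumptions, not the tool's actual MPI interface.

```python
import queue
import threading

def run_loop(n_iters, n_workers, chunk_size, work_fn):
    """Hand out fixed-size chunks of [0, n_iters) to idle workers,
    mimicking the tool's dynamic assignment of loop iterates."""
    tasks = queue.Queue()
    for start in range(0, n_iters, chunk_size):
        tasks.put((start, min(start + chunk_size, n_iters)))

    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                lo, hi = tasks.get_nowait()
            except queue.Empty:
                return  # no chunks left; worker retires
            out = [work_fn(i) for i in range(lo, hi)]
            with lock:
                results.extend(out)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)

print(run_loop(10, 3, 2, lambda i: i * i))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

In the real tool, the worker body would invoke the user-supplied chunk routine and the data-transfer routines over MPI rather than a shared in-memory queue.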


International Parallel and Distributed Processing Symposium | 2002

Dynamic scheduling parallel loops with variable iterate execution times

Ricolindo L. Cariño; Ioana Banicescu

To improve performance of scientific applications in parallel and distributed environments, dynamic scheduling algorithms for parallel loops have been proposed. Such algorithms address performance degradations due to load imbalance caused by predictable phenomena like nonuniform data distribution or algorithmic variance, and unpredictable phenomena such as data access latency or operating system interference. In particular, algorithms such as factoring, weighted factoring, adaptive weighted factoring, and adaptive factoring have been developed based on a probabilistic analysis of parallel loop iterates with variable running times. These algorithms execute the iterates in variable size chunks, where the sizes are determined such that the chunks complete before the optimal time with a high probability. These algorithms have successfully been implemented in a number of scientific applications such as: N-Body and Monte Carlo simulations, CFD, and radar signal processing. This paper presents a comparative study of the performance of various loop scheduling algorithms in a message-passing environment. The algorithms have been integrated into a tool for executing parallel loops, and the tool applied in profiling quadrature routines that are often used in scientific computations such as finite element methods, particle physics, and multivariate statistics. Experimental results reveal the effectiveness and robustness of the latest developed scheduling algorithms over the previous ones on loops with irregular iterate execution times.
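For contrast with factoring's batched, probabilistically sized chunks, guided self-scheduling, an earlier algorithm commonly used as a baseline in such comparisons, shrinks every successive chunk individually. A minimal sketch, with the function name assumed:

```python
import math

def gss_chunks(n_iters, n_procs):
    """Guided self-scheduling: each chunk is 1/n_procs of the
    iterates that remain, so chunk sizes decrease one at a time."""
    chunks, remaining = [], n_iters
    while remaining > 0:
        c = max(1, math.ceil(remaining / n_procs))
        chunks.append(c)
        remaining -= c
    return chunks

# 100 iterates on 4 processors: a strictly shrinking chunk sequence.
print(gss_chunks(100, 4))
```

The factoring-family methods studied in this paper instead size chunks so that, with high probability, each batch completes before the optimal time even when iterate execution times vary.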


International Symposium on Parallel and Distributed Computing | 2004

A novel dynamic load balancing library for cluster computing

Mahadevan Balasubramaniam; Kevin J. Barker; Ioana Banicescu; Nikos Chrisochoides; Jaderick P. Pabico; Ricolindo L. Cariño

In the last few years, research advances in dynamic scheduling at application and runtime system levels have contributed to improving the performance of scientific applications in heterogeneous environments. This paper presents the design and implementation of a library as a result of an integrated approach to dynamic load balancing. This approach combines the advantages of optimizing data migration via novel dynamic loop scheduling strategies with the advances in object migration mechanisms of parallel runtime systems. The performance improvements obtained by the use of this library have been investigated by its use in two scientific applications: the N-body simulations, and the profiling of automatic quadrature routines. The experimental results obtained underscore the significance of using such an integrated approach, as well as the benefits of using the library especially in cluster applications characterized by irregular and unpredictable behavior.


Parallel Computing | 2005

Design and implementation of a novel dynamic load balancing library for cluster computing

Ioana Banicescu; Ricolindo L. Cariño; Jaderick P. Pabico; Mahadevan Balasubramaniam

This paper presents the design and implementation of a library based on an integrated approach to dynamic load balancing. This approach combines the advantages of optimizing data migration via novel dynamic loop scheduling strategies with the advances in resource management and task migration capabilities offered by a recently developed parallel runtime system. The performance improvements obtained by the use of this library have been investigated by its use in three scientific applications: the N-body simulations, the profiling of automatic quadrature routines, and the heat solver in an unstructured grid. The experimental results obtained underscore the significance of using such an integrated approach, as well as the benefits of using the library especially in applications characterized by irregular and unpredictable behavior.


International Symposium on Parallel and Distributed Computing | 2009

Towards the Robustness of Dynamic Loop Scheduling on Large-Scale Heterogeneous Distributed Systems

Ioana Banicescu; Florina M. Ciorba; Ricolindo L. Cariño

Dynamic loop scheduling (DLS) algorithms provide application-level load balancing of loop iterates, with the goal of maximizing application performance on the underlying system. These methods use run-time information regarding the performance of the application's execution (for which irregularities change over time). Many DLS methods are based on probabilistic analyses, and therefore account for unpredictable variations of application and system related parameters. Scheduling scientific and engineering applications in large-scale distributed systems (possibly shared with other users) makes the problem of DLS even more challenging. Moreover, the chances of failure, such as processor or link failure, are high in such large-scale systems. In this paper, we employ the hierarchical approach for three DLS methods, and propose metrics for quantifying their robustness with respect to variations of two parameters (load and processor failures), for scheduling irregular applications in large-scale heterogeneous distributed systems.


High Performance Scientific and Engineering Computing | 2004

Message-passing parallel adaptive quantum trajectory method

Ricolindo L. Cariño; Ioana Banicescu; Ravi K. Vadapalli; Charles A. Weatherford; Jianping Zhu

Time-dependent wavepackets are widely used to model various phenomena in physics. One approach in simulating the wavepacket dynamics is the quantum trajectory method (QTM). Based on the hydrodynamic formulation of quantum mechanics, the QTM represents the wavepacket by an unstructured set of pseudoparticles whose trajectories are coupled by the quantum potential. The governing equations for the pseudoparticle trajectories are solved using a computationally-intensive moving weighted least squares (MWLS) algorithm, and the trajectories can be computed in parallel. This work contributes a strategy for improving the performance of wavepacket simulations using the QTM on message-passing systems. Specifically, adaptivity is incorporated into the MWLS algorithm, and loop scheduling is employed to dynamically load balance the parallel computation of the trajectories. The adaptive MWLS algorithm reduces the amount of computations without sacrificing accuracy, while adaptive loop scheduling addresses the load imbalance introduced by the algorithm and the runtime system. Results of experiments on a Linux cluster are presented to confirm that the adaptive MWLS reduces the trajectory computation time by up to 24%, and adaptive loop scheduling achieves parallel efficiencies of up to 90% when simulating a free particle.
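The local fits at the heart of MWLS can be illustrated in one dimension. This sketch (a linear basis with Gaussian weights, all names and the kernel choice assumed) solves the 2x2 normal equations for the estimate at a query point, the kind of per-point computation whose cost the adaptive version reduces.

```python
import math

def mwls_fit(xs, ys, x0, h):
    """Weighted least-squares fit of y ~ a + b*(x - x0) with
    Gaussian weights centered at x0; returns a, the local estimate
    of y at x0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-(d / h) ** 2)   # Gaussian weight, bandwidth h
        s0 += w; s1 += w * d; s2 += w * d * d
        t0 += w * y; t1 += w * y * d
    det = s0 * s2 - s1 * s1           # 2x2 normal-equation determinant
    return (t0 * s2 - t1 * s1) / det

xs = [i * 0.1 for i in range(21)]
ys = [2.0 + 3.0 * x for x in xs]      # exactly linear sample data
print(mwls_fit(xs, ys, 1.0, 0.3))     # recovers 2 + 3*1, approximately 5.0
```

In the QTM, fits like this are performed at every pseudoparticle to evaluate derivatives of the quantum potential; their varying cost is one source of the load imbalance that loop scheduling addresses.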


Challenges of Large Applications in Distributed Environments | 2003

A load balancing tool for distributed parallel loops

Ricolindo L. Cariño; Ioana Banicescu

Large scale applications typically contain parallel loops with many iterates. The iterates of a parallel loop may have variable execution times which translate into performance degradation of an application due to load imbalance. This paper describes a tool for load balancing parallel loops on distributed-memory systems. The tool assumes that the data for a parallel loop to be executed is already partitioned among the participating processors. The tool utilizes the MPI library for interprocessor coordination, and determines processor workloads by loop scheduling techniques. The tool was designed independent of any application; hence, it must be supplied with a routine that encapsulates the computations for a chunk of loop iterates, as well as the routines to transfer data and results between processors. Performance evaluation on a Linux cluster indicates that the tool reduces the cost of executing a simulated irregular loop without load balancing by up to 73%. The tool is useful for parallelizing sequential applications with parallel loops, or as an alternate load balancing routine for existing parallel applications.


International Parallel and Distributed Processing Symposium | 2005

Overhead analysis of a dynamic load balancing library for cluster computing

Ioana Banicescu; Ricolindo L. Cariño; Jaderick P. Pabico; Mahadevan Balasubramaniam

This paper investigates the overhead of a dynamic load balancing library for large irregular data-parallel scientific applications on general-purpose clusters. The library is based on an integrated approach combining the advantages of novel dynamic loop scheduling strategies as data migration policies with the advances in resource management and task migration capabilities offered by a recently developed parallel runtime system. The paper focuses on the contribution of the runtime system software layer to the total overhead of the library. Experiments to compare the performance of two applications using the library, the N-body simulations and the profiling of a quadrature routine, with the performance of the same applications using an MPI-only implementation of the dynamic scheduling techniques indicate only a slight decrease in performance due to the overhead of the runtime system software layer. The results validate the suitability of the runtime system as an implementation platform for dynamic load balancing schemes, and underscore the significance of using the integrated approach, as well as the benefits of using the library especially in cluster applications characterized by irregular and unpredictable behavior.


International Parallel and Distributed Processing Symposium | 2003

Parallel adaptive quantum trajectory method for wavepacket simulations

Ricolindo L. Cariño; Ioana Banicescu; Ravi K. Vadapalli; Charles A. Weatherford; Jianping Zhu

Time-dependent wavepackets are widely used to model various phenomena in physics. One approach in simulating the wavepacket dynamics is the quantum trajectory method (QTM). Based on the hydrodynamic formulation of quantum mechanics, the QTM represents the wavepacket by an unstructured set of pseudoparticles whose trajectories are coupled by the quantum potential. The governing equations for the pseudoparticle trajectories are solved using a computationally intensive moving weighted least squares (MWLS) algorithm, and the trajectories can be computed in parallel. This paper contributes a strategy for improving the performance of wavepacket simulations using the QTM. Specifically, adaptivity is incorporated into the MWLS algorithm, and loop scheduling techniques are employed to dynamically load balance the parallel computation of the trajectories. The adaptive MWLS algorithm reduces the amount of computations without sacrificing accuracy, while adaptive loop scheduling addresses the load imbalance introduced by the algorithm and the runtime system. Results of experiments on a Linux cluster are presented to confirm that the adaptive MWLS reduces the trajectory computation time by up to 24%, and adaptive loop scheduling achieves parallel efficiencies of up to 85% when simulating a free particle.

Collaboration


Dive into Ricolindo L. Cariño's collaborations.

Top Co-Authors

Ioana Banicescu
Mississippi State University

Jaderick P. Pabico
University of the Philippines Los Baños

Elise de Doncker
Western Michigan University

M.F. Horstemeyer
Mississippi State University