
Publications

Featured research published by Daniel G. Chavarría-Miranda.


International Parallel and Distributed Processing Symposium | 2009

A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets

Kamesh Madduri; David Ediger; Karl Jiang; David A. Bader; Daniel G. Chavarría-Miranda

We present a new lock-free parallel algorithm for computing betweenness centrality of massive complex networks that achieves better spatial locality compared with previous approaches. Betweenness centrality is a key kernel in analyzing the importance of vertices (or edges) in applications ranging from social networks, to power grids, to the influence of jazz musicians, and is also incorporated into the DARPA HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph analytics. We design an optimized implementation of betweenness centrality for the massively multithreaded Cray XMT system with the Threadstorm processor. For a small-world network of 268 million vertices and 2.147 billion edges, the 16-processor XMT system achieves a TEPS rate (an algorithmic performance count for the number of edges traversed per second) of 160 million per second, which corresponds to more than a 2× performance improvement over the previous parallel implementation. We demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for the large IMDb movie-actor network.
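The kernel underlying such implementations is Brandes' algorithm: a BFS from each source followed by a reverse-order dependency accumulation. A minimal sequential sketch in Python (the toy graph and all names are illustrative; the paper's contribution is the lock-free, locality-friendly parallelization of this kernel, which is not reproduced here):

```python
from collections import deque, defaultdict

def betweenness_centrality(adj):
    """Brandes' algorithm: BFS from each source, then a reverse-order
    dependency accumulation. O(V*E) time for unweighted graphs."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS phase: shortest-path counts (sigma) and predecessor lists
        sigma = dict.fromkeys(adj, 0); sigma[s] = 1
        dist = dict.fromkeys(adj, -1); dist[s] = 0
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulation phase: walk vertices in reverse BFS order
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Path graph a-b-c: only the middle vertex lies on a shortest path.
g = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
print(betweenness_centrality(g)['b'])  # 2.0 (each unordered pair counted from both endpoints)
```

The parallel versions discussed in the paper run many such source traversals concurrently and replace the per-vertex updates with atomic or lock-free operations.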


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2005

An evaluation of global address space languages: co-array fortran and unified parallel C

Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey; François Cantonnet; Tarek A. El-Ghazawi; Ashrujit Mohanti; Yiyi Yao; Daniel G. Chavarría-Miranda

Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. However, our experiments uncovered some significant performance bottlenecks of UPC codes on all platforms. We account for the root causes limiting UPC performance, such as the synchronization model, the communication efficiency of strided data, and source-to-source translation issues. We show that they can be remedied with language extensions, new synchronization constructs, and, finally, adequate optimizations by the back-end C compilers.
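The bulk-communication finding can be illustrated with a toy alpha-beta (latency/bandwidth) cost model; all constants below are hypothetical, chosen only to show why per-message latency dominates when a shared array region is transferred element by element instead of as one coalesced put:

```python
def transfer_cost(n_messages, total_bytes, latency_us=1.0, bandwidth_bytes_per_us=1000.0):
    """Simple alpha-beta cost model: each message pays a fixed latency
    term; all bytes share the bandwidth term."""
    return n_messages * latency_us + total_bytes / bandwidth_bytes_per_us

elements, elem_size = 1024, 8
fine_grained = transfer_cost(elements, elements * elem_size)   # one put per element
bulk = transfer_cost(1, elements * elem_size)                  # one coalesced put
print(fine_grained, bulk)  # 1032.192 vs 9.192 (model microseconds)
```

Under this model the fine-grained version is two orders of magnitude slower despite moving the same bytes, which is the behavior the benchmarks above exhibit when CAF/UPC codes are not written for bulk transfers.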


International Conference on e-Science | 2009

A High-Performance Hybrid Computing Approach to Massive Contingency Analysis in the Power Grid

Ian Gorton; Zhenyu Huang; Yousu Chen; Benson K. Kalahar; Shuangshuang Jin; Daniel G. Chavarría-Miranda; Douglas J. Baxter; John Feo

Operating the electrical power grid to prevent power blackouts is a complex task. An important aspect of this is contingency analysis, which involves understanding and mitigating potential failures in power grid elements such as transmission lines. When taking into account the potential for multiple simultaneous failures (known as the N-x contingency problem), contingency analysis becomes a massive computational task. In this paper we describe a novel hybrid computational approach to contingency analysis. This approach exploits the unique graph processing performance of the Cray XMT in conjunction with a conventional massively parallel compute cluster to identify likely simultaneous failures that could cause widespread cascading power failures that have massive economic and social impact on society. The approach has the potential to provide the first practical and scalable solution to the N-x contingency problem. When deployed in power grid operations, it will increase the grid operator's ability to deal effectively with outages and failures of power grid components while preserving stable and safe operation of the grid. The paper describes the architecture of our solution and presents preliminary performance results that validate the efficacy of our approach.
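The combinatorial growth of the N-x problem is easy to see in miniature. The sketch below enumerates every N-2 line-outage pair on a hypothetical 4-bus system and uses graph connectivity as a stand-in for the real screening criterion (actual contingency analysis solves a power flow per case; all bus and line names are illustrative):

```python
from itertools import combinations

def is_connected(nodes, lines):
    """DFS connectivity check over the surviving transmission lines."""
    if not nodes:
        return True
    adj = {n: [] for n in nodes}
    for a, b in lines:
        adj[a].append(b); adj[b].append(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen) == len(nodes)

def critical_pairs(nodes, lines):
    """Enumerate N-2 line outages that split the grid (connectivity is a
    cheap proxy; a real screen runs a power flow for each case)."""
    bad = []
    for out in combinations(lines, 2):
        survivors = [l for l in lines if l not in out]
        if not is_connected(nodes, survivors):
            bad.append(out)
    return bad

# Toy 4-bus ring with one chord: losing both lines into bus 'd' islands it,
# as does losing both lines into bus 'b'.
nodes = {'a', 'b', 'c', 'd'}
lines = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a'), ('a', 'c')]
print(critical_pairs(nodes, lines))
```

Even this 5-line system yields 10 N-2 cases; a realistic grid with tens of thousands of elements produces hundreds of millions, which is why the paper pairs a graph machine (for candidate selection) with a compute cluster (for case evaluation).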


Power and Energy Society General Meeting | 2010

Performance evaluation of counter-based dynamic load balancing schemes for massive contingency analysis with different computing environments

Yousu Chen; Zhenyu Huang; Daniel G. Chavarría-Miranda

Contingency analysis is a key function in the Energy Management System (EMS) to assess the impact of various combinations of power system component failures based on state estimation. Contingency analysis is also extensively used in power market operation for feasibility tests of market solutions. High performance computing holds the promise of faster analysis of more contingency cases for the purpose of safe and reliable operation of today's power grids with less operating margin and more intermittent renewable energy sources. This paper evaluates the performance of counter-based dynamic load balancing schemes for massive contingency analysis under different computing environments. Insights from the performance evaluation can be used as guidance for users to select suitable schemes in the application of massive contingency analysis. Case studies, as well as MATLAB simulations, of massive contingency cases using the Western Electricity Coordinating Council power grid model are presented to illustrate the application of high performance computing with counter-based dynamic load balancing schemes.
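A counter-based scheme, in its simplest form, has each worker repeatedly fetch-and-increment a shared task counter to claim the next contingency case, so fast workers naturally take more cases than slow ones. A minimal Python sketch (the lock emulates the atomic counter update; the work function is a placeholder, not a power-flow solver):

```python
import threading

def run_counter_balanced(cases, n_workers, work_fn):
    """Counter-based dynamic load balancing: each worker atomically
    grabs the next case index from a shared counter until cases run out."""
    counter = [0]
    lock = threading.Lock()
    results = [None] * len(cases)

    def worker():
        while True:
            with lock:               # emulates an atomic fetch-and-add
                i = counter[0]
                counter[0] += 1
            if i >= len(cases):
                return
            results[i] = work_fn(cases[i])

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Placeholder 'contingency case' work: square the case id.
print(run_counter_balanced(list(range(10)), 4, lambda c: c * c))
```

The schemes evaluated in the paper vary where the counter lives (one per node, one global, hierarchical), which trades contention on the counter against load-balance quality.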


Computing Frontiers | 2007

Evaluating the potential of multithreaded platforms for irregular scientific computations

Jarek Nieplocha; Andres Marquez; John Feo; Daniel G. Chavarría-Miranda; George Chin; Chad Scherrer; Nathaniel Beagley

The resurgence of current and upcoming multithreaded architectures and programming models led us to conduct a detailed study to understand the potential of these platforms to increase the performance of data-intensive, irregular scientific applications. Our study is based on a power system state estimation application and a novel anomaly detection application applied to network traffic data. We also conducted a detailed evaluation of the platforms using microbenchmarks in order to gain insight into their architectural capabilities and their interaction with programming models and application software. The evaluation was performed on the Cray MTA-2 and the Sun Niagara.


International Parallel and Distributed Processing Symposium | 2003

Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations

Alain Darte; John M. Mellor-Crummey; Robert J. Fowler; Daniel G. Chavarría-Miranda

Multipartitioning is a strategy for decomposing multi-dimensional arrays into tiles and mapping the resulting tiles onto a collection of processors. This class of partitionings enables efficient parallelization of line-sweep computations that solve one-dimensional recurrences along each dimension of a multi-dimensional array. Multipartitionings yield balanced parallelism for line sweeps by assigning each processor the same number of data tiles to compute at each step of a sweep along any array dimension. Also, they induce only coarse-grain communication. This paper considers the problem of computing generalized multipartitionings, which decompose d-dimensional arrays, d ≥ 2, onto an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning onto all of the processors for this general case. We use a cost model to select the dimensionality of the best partitioning and the number of cuts to make along each array dimension; then, we show how to construct a mapping that assigns the resulting data tiles to each of the processors. The assignment of tiles to processors induced by this class of multipartitionings corresponds to an instance of a Latin hyper-rectangle, a natural extension of Latin squares, which have been widely studied in mathematics and statistics. Finally, we describe how we extended the Rice dHPF compiler for High Performance Fortran to generate code that employs our strategy for generalized multipartitioning, and show that the compiler's generated code for the NAS SP computational fluid dynamics benchmark achieves scalable high performance.
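For the familiar 2D case with p processors, a multipartitioning cuts the array into a p × p grid of tiles and assigns them diagonally, producing a Latin square: every processor owns exactly one tile in each row and each column, so a line sweep along either dimension keeps all processors busy at every step. A small sketch of this classic diagonal assignment (the generalized d-dimensional construction in the paper subsumes it):

```python
def multipartition_2d(p):
    """Assign a p x p grid of tiles to p processors with the diagonal
    (Latin-square) pattern: tile (i, j) goes to processor (i + j) % p."""
    return [[(i + j) % p for j in range(p)] for i in range(p)]

def is_latin_square(grid):
    """Every processor id appears exactly once in each row and column."""
    p = len(grid)
    full = set(range(p))
    rows_ok = all(set(row) == full for row in grid)
    cols_ok = all({grid[i][j] for i in range(p)} == full for j in range(p))
    return rows_ok and cols_ok

tiles = multipartition_2d(4)
for row in tiles:
    print(row)
print(is_latin_square(tiles))  # True: balanced work at every sweep step
```

The Latin-square property is exactly what guarantees balanced parallelism for line sweeps; the generalized construction extends it to Latin hyper-rectangles for d dimensions and arbitrary processor counts.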


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2005

Effective communication coalescing for data-parallel applications

Daniel G. Chavarría-Miranda; John M. Mellor-Crummey

Communication coalescing is a static optimization that can reduce both communication frequency and redundant data transfer in compiler-generated code for regular, data-parallel applications. We present an algorithm for coalescing communication that arises when generating code for regular, data-parallel applications written in High Performance Fortran (HPF). To handle sophisticated computation partitionings, our algorithm normalizes communication before attempting coalescing. We experimentally evaluate our algorithm, which is implemented in the dHPF compiler, in the compilation of HPF versions of the NAS application benchmarks SP, BT and LU. Our normalized coalescing algorithm improves the performance and scalability of compiler-generated code for these benchmarks by reducing communication volume by up to 55% compared to a simpler coalescing strategy, and enables us to match the communication volume and frequency of hand-optimized MPI implementations of these codes.
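At its core, coalescing merges messages bound for the same destination whose index ranges overlap or abut, so each destination receives fewer, larger transfers. A simplified sketch (the message tuples and the integer ranges are illustrative; the compiler operates on symbolic ranges after normalization, not concrete integers):

```python
def coalesce(messages):
    """Merge per-statement messages (dest, lo, hi) to the same destination
    whose index ranges overlap or touch, yielding fewer, larger sends."""
    by_dest = {}
    for dest, lo, hi in messages:
        by_dest.setdefault(dest, []).append((lo, hi))
    out = []
    for dest, ranges in by_dest.items():
        ranges.sort()
        merged = [list(ranges[0])]
        for lo, hi in ranges[1:]:
            if lo <= merged[-1][1] + 1:          # overlapping or adjacent
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        out += [(dest, lo, hi) for lo, hi in merged]
    return out

# Three sends to rank 1 collapse into one; rank 2 keeps its single range.
msgs = [(1, 0, 9), (1, 10, 19), (2, 0, 4), (1, 5, 24)]
print(coalesce(msgs))  # [(1, 0, 24), (2, 0, 4)]
```

The normalization step described in the paper is what makes messages from different computation partitionings comparable enough for a merge like this to find the overlaps.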


IEEE Power Engineering Society General Meeting | 2006

Towards efficient power system state estimators on shared memory computers

Jaroslaw Nieplocha; Andres Marquez; Vinod Tipparaju; Daniel G. Chavarría-Miranda; Ross T. Guttromson; H. Huang

We are investigating the effectiveness of parallel weighted-least-squares (WLS) state estimation solvers on shared-memory parallel computers. Shared-memory parallel architectures are rapidly becoming ubiquitous due to the advent of multi-core processors. In the current evaluation, we are using an LU-based solver as well as a conjugate gradient (CG)-based solver for a 1177-bus system. In lieu of a very wide multi-core system we evaluate the effectiveness of the solvers on an SGI Altix system on up to 32 processors. On this platform, as expected, the shared memory implementation (pthreads) of the LU solver was found to be more efficient than the MPI version. Our implementation of the CG solver scales and performs significantly better than the state-of-the-art implementation of the LU solver: with CG we can solve the problem 4.75 times faster than using LU. These findings indicate that CG algorithms should be quite effective on multicore processors.
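The CG iteration compared against LU above is the textbook Krylov method for symmetric positive-definite systems. A plain-Python sketch on a tiny stand-in system (production solvers use BLAS and sparse storage, and the WLS gain matrix is far larger than this 2×2 example):

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient for a symmetric positive-definite system Ax = b.
    tol bounds the squared residual norm."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual b - A x, with x = 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

# Tiny SPD system standing in for the WLS gain matrix equations.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b))  # close to [1/11, 7/11]
```

CG's appeal on shared-memory machines, reflected in the paper's results, is that its dominant cost is the matrix-vector product, which parallelizes more naturally than the dependence-laden triangular solves of LU.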


Concurrency and Computation: Practice and Experience | 2002

Advanced Optimization Strategies in the Rice dHPF Compiler

John M. Mellor-Crummey; Vikram S. Adve; Bradley Broom; Daniel G. Chavarría-Miranda; Robert J. Fowler; Guohua Jin; Ken Kennedy; Qing Yi

High-Performance Fortran (HPF) was envisioned as a vehicle for modernizing legacy Fortran codes to achieve scalable parallel performance. To a large extent, today's commercially available HPF compilers have failed to deliver scalable parallel performance for a broad spectrum of applications because of insufficiently powerful compiler analysis and optimization. Substantial restructuring and hand-optimization can be required to achieve acceptable performance with an HPF port of an existing Fortran application, even for regular data-parallel applications. A key goal of the Rice dHPF compiler project has been to develop optimization techniques that enable a wide range of existing scientific applications to be ported easily to efficient HPF with minimal restructuring. This paper describes the challenges to effective parallelization presented by complex (but regular) data-parallel applications, and then describes how the novel analysis and optimization technologies in the dHPF compiler address these challenges effectively, without major rewriting of the applications. We illustrate the techniques by describing their use for parallelizing the NAS SP and BT benchmarks. The dHPF compiler generates multipartitioned parallelizations of these codes that are approaching the scalability and efficiency of sophisticated hand-coded parallelizations.


IEEE Transactions on Parallel and Distributed Systems | 2012

Aho-Corasick String Matching on Shared and Distributed-Memory Parallel Architectures

Antonino Tumeo; Oreste Villa; Daniel G. Chavarría-Miranda

String matching requires a combination of (sometimes all) the following characteristics: high and/or predictable performance, support for large data sets and flexibility of integration and customization. This paper compares several software-based implementations of the Aho-Corasick algorithm for high-performance systems. We focus on the matching of unknown inputs streamed from a single source, typical of security applications and difficult to manage since the input cannot be preprocessed to obtain locality. We consider shared-memory architectures (Niagara 2, x86 multiprocessors, and Cray XMT) and distributed-memory architectures with homogeneous (InfiniBand cluster of x86 multicores) or heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C1060 GPUs). We describe how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.
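For reference, the Aho-Corasick automaton itself: goto, failure, and output tables built once from the dictionary, then a single pass over the streamed input with no backtracking, which is what makes it suitable for unpreprocessable streams. A compact Python sketch (the paper's contribution is how this structure is laid out and traversed on each architecture, not the algorithm itself):

```python
from collections import deque

def build_aho_corasick(patterns):
    """Build the goto, failure, and output tables of the Aho-Corasick
    automaton from a dictionary of patterns."""
    goto, fail, output = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pat)
    # BFS fills failure links: longest proper suffix that is also a prefix.
    q = deque(goto[0].values())
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            output[t] |= output[fail[t]]
    return goto, fail, output

def search(text, tables):
    """Single pass over the input; emits (end_index, pattern) pairs."""
    goto, fail, output = tables
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits += [(i, pat) for pat in output[state]]
    return hits

tables = build_aho_corasick(["he", "she", "his", "hers"])
print(search("ushers", tables))  # matches she/he at index 3, hers at 5
```

Because each input character advances the automaton by following at most a few failure links, the per-character work is bounded regardless of dictionary size, which is why the implementations compared in the paper differ mainly in how they store and fetch the (potentially huge) transition tables.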

Collaboration

Top co-authors of Daniel G. Chavarría-Miranda (all at Pacific Northwest National Laboratory): Andres Marquez; Mahantesh Halappanavar; Sriram Krishnamoorthy; Antonino Tumeo; Chad Scherrer; John Feo; Ajay Panyala; Jarek Nieplocha.