Andrew Lenharth | Researchain

Publication

Featured research published by Andrew Lenharth.

symposium on operating systems principles | 2013

A lightweight infrastructure for graph analytics

Donald Nguyen; Andrew Lenharth; Keshav Pingali

Several domain-specific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a general-purpose infrastructure that (i) supports very fine-grain tasks, (ii) implements autonomous, speculative execution of these tasks, and (iii) allows application-specific control of task scheduling policies. To support this claim, we describe such an implementation called the Galois system. We demonstrate the capabilities of this infrastructure in three ways. First, we implement more sophisticated algorithms for some of the graph analytics problems tackled by previous DSLs and show that end-to-end performance can be improved by orders of magnitude even on power-law graphs, thanks to the better algorithms facilitated by a more general programming model. Second, we show that, even when an algorithm can be expressed in existing DSLs, the implementation of that algorithm in the more general system can be orders of magnitude faster when the input graphs are road networks and similar graphs with high diameter, thanks to more sophisticated scheduling. Third, we implement the APIs of three existing graph DSLs on top of the common infrastructure in a few hundred lines of code and show that even for power-law graphs, the performance of the resulting implementations often exceeds that of the original DSL systems, thanks to the lightweight infrastructure.

european conference on parallel processing | 2015

Priority Queues Are Not Good Concurrent Priority Schedulers

Andrew Lenharth; Donald Nguyen; Keshav Pingali

The need for priority scheduling arises in many algorithms. In these algorithms, there is a dynamic pool of lightweight, unordered tasks, and some execution orders are more efficient than others. Therefore, each task is given an application-specific priority that is a heuristic measure of its importance for early scheduling, and the runtime system schedules these tasks roughly in this order. Concurrent priority queues are not suitable for this purpose. We show that by exploiting the fact that algorithms amenable to priority scheduling are often robust to small deviations from a strict priority order, and by optimizing the scheduler for the cache hierarchy of current multicore and NUMA processors, we can implement concurrent priority schedulers that improve the end-to-end performance of complex irregular benchmarks by orders of magnitude compared to using state-of-the-art concurrent priority queues.
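The idea of tolerating small deviations from strict priority order can be illustrated with a bucketed worklist. The following is a minimal sequential sketch, not the scheduler described in the paper: the class name `BucketedWorklist` and the `bucket_width` parameter are hypothetical, and the concurrency and cache-hierarchy optimizations that the abstract refers to are omitted entirely.

```python
from collections import defaultdict, deque


class BucketedWorklist:
    """Relaxed priority worklist (illustrative, single-threaded).

    Tasks are grouped into coarse buckets by priority. Buckets are served
    in strict order, but tasks inside one bucket are served in arrival
    order, so the overall order only approximates a strict priority order.
    """

    def __init__(self, bucket_width):
        self.bucket_width = bucket_width      # hypothetical tuning knob
        self.buckets = defaultdict(deque)     # bucket index -> pending tasks

    def push(self, priority, task):
        # Lower numbers mean higher priority; nearby priorities share a bucket.
        self.buckets[priority // self.bucket_width].append(task)

    def pop(self):
        # Serve the lowest-numbered non-empty bucket; raises ValueError if empty.
        lowest = min(self.buckets)
        bucket = self.buckets[lowest]
        task = bucket.popleft()
        if not bucket:
            del self.buckets[lowest]          # never keep empty buckets around
        return task

    def __bool__(self):
        return bool(self.buckets)


worklist = BucketedWorklist(bucket_width=10)
for priority, task in [(12, "a"), (7, "c"), (3, "b"), (15, "d")]:
    worklist.push(priority, task)

order = []
while worklist:
    order.append(worklist.pop())
print(order)  # ['c', 'b', 'a', 'd']: 'c' runs before 'b' despite its lower priority
```

A strict priority queue would return `b` before `c`. The bucketed version gives up that guarantee within a bucket in exchange for cheaper insertions and removals, which is the kind of trade-off the abstract describes.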

architectural support for programming languages and operating systems | 2014

Deterministic Galois: on-demand, portable and parameterless

Donald Nguyen; Andrew Lenharth; Keshav Pingali

Non-determinism in program execution can make program development and debugging difficult. In this paper, we argue that solutions to this problem should be on-demand, portable and parameterless. On-demand means that the programming model should permit the writing of non-deterministic programs since these programs often perform better than deterministic ones for the same problem. Portable means that the program should produce the same answer even if it is run on different machines. Parameterless means that if there are machine-dependent scheduling parameters that must be tuned for good performance, they must not affect the output. Although many solutions for deterministic program execution have been proposed in the literature, they fall short along one or more of these dimensions. To remedy this, we propose a new approach, based on the Galois programming model, in which (i) the programming model permits the writing of non-deterministic programs and (ii) the runtime system executes these programs deterministically if needed. Evaluation of this approach on a collection of benchmarks from the PARSEC, PBBS, and Lonestar suites shows that it delivers deterministic execution with substantially less overhead than other systems in the literature.

international parallel and distributed processing symposium | 2004

Quality-based adaptive resource management architecture (QARMA): a CORBA resource management service

David Fleeman; Matthew Gillen; Andrew Lenharth; M. Delaney; Lonnie R. Welch; David W. Juedes; Chang Liu

Summary form only given. We describe the quality-based adaptive resource management architecture, QARMA, a framework for resource management within CORBA. QARMA consists of three major components: the system repository service, the resource management service, and the enactor service. QARMA serves as a basis for integration of existing CORBA services and management mechanisms into a single, coherent framework for resource management. QARMA supports the management of a wide variety of applications developed using various development paradigms, easily integrates with other management and infrastructure components that already exist as CORBA services, and is easily extended to allow the use of new resource management mechanisms as they become available.

Communications of The ACM | 2016

Parallel graph analytics

Andrew Lenharth; Donald Nguyen; Keshav Pingali

Data-centric abstractions and execution strategies are needed to exploit parallelism in large-scale graph analytics.

ieee international conference on high performance computing data and analytics | 2014

Parallelization of reordering algorithms for bandwidth and wavefront reduction

Konstantinos I. Karantasis; Andrew Lenharth; Donald Nguyen; María Jesús Garzarán; Keshav Pingali

Many sparse matrix computations can be sped up if the matrix is first reordered. Reordering was originally developed for direct methods, but it has recently become popular for improving the cache locality of parallel iterative solvers, since reordering the matrix to reduce bandwidth and wavefront can improve the locality of reference of sparse matrix-vector multiplication (SpMV), the key kernel in iterative solvers. In this paper, we present the first parallel implementations of two widely used reordering algorithms: Reverse Cuthill-McKee (RCM) and Sloan. On 16 cores of the Stampede supercomputer, our parallel RCM is 5.56 times faster on average than a state-of-the-art sequential implementation of RCM in the HSL library. Sloan is significantly more constrained than RCM, but our parallel implementation achieves a speedup of 2.88X on average over sequential HSL-Sloan. Reordering the matrix using our parallel RCM and then performing 100 SpMV iterations is twice as fast as using HSL-RCM and then performing the SpMV iterations; it is also 1.5 times faster than performing the SpMV iterations without reordering the matrix.
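For orientation, a textbook sequential form of Reverse Cuthill-McKee can be sketched as follows. This is not the parallel algorithm from the paper and not the HSL implementation; the function names and the adjacency-dictionary input format are assumptions made for illustration, and the start vertex is chosen by a simple minimum-degree rule rather than a pseudo-peripheral search.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return an RCM vertex ordering for an undirected graph.

    adj maps each vertex to a list of its neighbours (assumed symmetric).
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    visited = set()
    order = []
    # Start every connected component from a minimum-degree vertex.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting low-degree neighbours first.
            for w in sorted(adj[v], key=lambda u: (degree[u], u)):
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    order.reverse()  # the "reverse" in Reverse Cuthill-McKee
    return order


def bandwidth(adj, order):
    """Largest distance between the positions of two adjacent vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


# A path graph 0-3-1-4-2 whose vertex labels are scrambled.
path = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
print(bandwidth(path, [0, 1, 2, 3, 4]))                # 3
print(bandwidth(path, reverse_cuthill_mckee(path)))    # 1
```

On this small example the natural labelling has bandwidth 3, while the RCM ordering recovers the path structure and brings the bandwidth down to 1.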

international conference on conceptual structures | 2014

Graph grammar based multi-thread multi-frontal direct solver with Galois scheduler

Damian Goik; Konrad Jopek; Maciej Paszyński; Andrew Lenharth; Donald Nguyen; Keshav Pingali

In this paper, we present a multi-frontal solver algorithm for the adaptive finite element method expressed by graph grammar productions. The graph grammar productions first construct the binary elimination tree, and then process frontal matrices stored in a distributed manner in the nodes of the elimination tree. The solver is specialized for a class of one-, two- and three-dimensional h-refined meshes whose elimination tree has a regular structure. In particular, this class contains all one-dimensional grids, two- and three-dimensional grids refined towards point singularities, two-dimensional grids refined in an anisotropic way towards an edge singularity, as well as three-dimensional grids refined in an anisotropic way towards edge or face singularities. In all these cases, the structure of the elimination tree and the structure of the frontal matrices are similar. The solver is implemented within the Galois environment, which allows parallel execution of graph grammar productions. We also compare the performance of the Galois implementation of our graph grammar based solver with the MUMPS solver.

Scientific Programming | 2015

Quasi-Optimal elimination trees for 2D grids with singularities

Anna Paszyńska; Maciej Paszyński; Konrad Jopek; M. Woźniak; Damian Goik; Piotr Gurgul; Hassan AbouEisha; Mikhail Moshkov; Victor M. Calo; Andrew Lenharth; Donald Nguyen; Keshav Pingali

We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O(Ne log(Ne)), where Ne is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.

international conference on supercomputing | 2016

DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem

Saeed Maleki; Donald Nguyen; Andrew Lenharth; María Jesús Garzarán; David A. Padua; Keshav Pingali

The Single Source Shortest Path (SSSP) problem consists of finding the shortest paths from a vertex (the source vertex) to all other vertices in a graph. SSSP has numerous applications. For some algorithms and applications, it is useful to solve the SSSP problem in parallel. This is the case for Betweenness Centrality, which solves the SSSP problem for multiple source vertices in large graphs. In this paper, we introduce the Dijkstra Strip Mined Relaxation (DSMR) algorithm, an efficient parallel SSSP algorithm for shared and distributed-memory systems. We also introduce a set of preprocessing optimization techniques that significantly reduce the communication overhead without increasing the total amount of work dramatically. Our results show that DSMR is faster than the best previous algorithm, parallel Δ-Stepping, by up to 7.38×.
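To make the problem statement concrete, the sketch below solves SSSP with the classic sequential Dijkstra algorithm. It is only a baseline for the problem definition, not DSMR or Δ-Stepping; the function name and the graph representation (a dictionary of `(neighbour, weight)` lists with non-negative weights) are assumptions chosen for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source to every reachable vertex.

    graph maps a vertex to a list of (neighbour, weight) pairs;
    weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry: a shorter path to u was already found
        for v, weight in graph.get(u, ()):
            candidate = d + weight
            # Relax edge (u, v): keep the shorter of the known and new distance.
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


graph = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("c", 5)],
    "a": [("c", 1)],
    "c": [],
}
print(dijkstra(graph, "s"))  # {'s': 0, 'a': 3, 'b': 1, 'c': 4}
```

Dijkstra processes vertices one at a time in strict distance order, which is what makes it hard to parallelize and motivates algorithms that relax that order.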

european conference on parallel processing | 2015

Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons Learned

Joyce Jiyoung Whang; Andrew Lenharth; Inderjit S. Dhillon; Keshav Pingali

Large-scale network and graph analysis has received considerable attention recently. Graph mining techniques often involve an iterative algorithm, which can be implemented in a variety of ways. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling. We investigate the impact of different algorithm design choices. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms are able to achieve more than 28x the performance of standard PageRank implementations (e.g., those in GraphLab). The design choices affect both single-threaded performance and parallel scalability. The implementation lessons not only guide efficient implementations of many graph mining algorithms, but also provide a framework for designing new scalable algorithms.
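One way to picture a data-driven, push-based formulation is a residual-propagation loop driven by a worklist. The sketch below is a sequential illustration under stated assumptions and not the implementation evaluated in the paper: it assumes the unnormalized formulation x = αPᵀx + (1 − α)e, it assumes every vertex has at least one outgoing edge, and the names `push_pagerank`, `alpha`, and `eps` are chosen here for illustration.

```python
from collections import deque


def push_pagerank(out_edges, alpha=0.85, eps=1e-10):
    """Residual-based, push-style PageRank (sequential sketch).

    out_edges maps each vertex to a non-empty list of out-neighbours.
    Returns unnormalized ranks that sum to roughly the number of vertices.
    """
    x = {v: 0.0 for v in out_edges}          # current rank estimates
    r = {v: 1.0 - alpha for v in out_edges}  # residual not yet absorbed
    worklist = deque(out_edges)              # only active vertices are processed
    queued = set(out_edges)
    while worklist:
        v = worklist.popleft()
        queued.discard(v)
        rv, r[v] = r[v], 0.0
        x[v] += rv                           # absorb the residual into the rank
        share = alpha * rv / len(out_edges[v])
        for w in out_edges[v]:
            r[w] += share                    # push the residual to out-neighbours
            if r[w] >= eps and w not in queued:
                queued.add(w)                # activate w only when it has work
                worklist.append(w)
    return x


graph = {0: [1, 2], 1: [2], 2: [0]}
ranks = push_pagerank(graph)
print({v: round(rank, 4) for v, rank in ranks.items()})
```

Work activation is data-driven because a vertex re-enters the worklist only when its residual reaches `eps`, and the access pattern is push-based because each vertex writes to its out-neighbours instead of reading from its in-neighbours.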
