Kamesh Madduri | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Kamesh Madduri is active.

Explore More

Publication

Featured researches published by Kamesh Madduri.

workshop on algorithms and models for the web graph | 2007

Approximating betweenness centrality

David A. Bader; Shiva Kintali; Kamesh Madduri; Milena Mihail

Betweenness is a centrality measure based on shortest paths, widely used in complex network analysis. It is computationally-expensive to exactly determine betweenness; currently the fastest-known algorithm by Brandes requires O(nm) time for unweighted graphs and O(nm + n2 log n) time for weighted graphs, where n is the number of vertices and m is the number of edges in the network. These are also the worstcase time bounds for computing the betweenness score of a single vertex. In this paper, we present a novel approximation algorithm for computing betweenness centrality of a given vertex, for both weighted and unweighted graphs. Our approximation algorithm is based on an adaptive sampling technique that significantly reduces the number of single-source shortest path computations for vertices with high centrality. We conduct an extensive experimental study on real-world graph instances, and observe that our random sampling algorithm gives very good betweenness approximations for biological networks, road networks and web crawls.

international conference on parallel processing | 2006

Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2

David A. Bader; Kamesh Madduri

Graph abstractions are extensively used to understand and solve challenging computational problems in various scientific and engineering domains. They have particularly gained prominence for applications involving large-scale networks. In this paper, we present fast parallel implementations of three fundamental graph theory problems, breadth-first search, st-connectivity and shortest paths for unweighted graphs, on multithreaded architectures such as the Cray MTA-2. The architectural features of the MTA-2 aid the design of simple, scalable and high-performance graph algorithms. We test our implementations on large scale-free and sparse random graph instances, and report impressive results, both for algorithm execution time and parallel performance. For instance, breadth-first search on a scale-free graph of 400 million vertices and 2 billion edges takes less than 5 seconds on a 40-processor MTA-2 system with an absolute speedup of close to 30. This is a significant result in parallel computing, as prior implementations of parallel graph algorithms report very limited or no speedup on irregular and sparse graphs, when compared to the best sequential implementation

international conference on parallel processing | 2006

Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks

David A. Bader; Kamesh Madduri

This paper discusses fast parallel algorithms for evaluating several centrality indices frequently used in complex network analysis. These algorithms have been optimized to exploit properties typically observed in real-world large scale networks, such as the low average distance, high local density, and heavy-tailed power law degree distributions. We test our implementations on real datasets such as the Web graph, protein-interaction networks, movie-actor and citation networks, and report impressive parallel performance for evaluation of the computationally intensive centrality metrics (betweenness and closeness centrality) on high-end shared memory symmetric multiprocessor and multithreaded architectures. To our knowledge, these are the first parallel implementations of these widely-used social network analysis metrics. We demonstrate that it is possible to rigorously analyze networks three orders of magnitude larger than instances that can be handled by existing network analysis (SNA) software packages. For instance, we compute the exact betweenness centrality value for each vertex in a large US patent citation network (3 million patents, 16 million citations) in 42 minutes on 16 processors, utilizing 20GB RAM of the IBM p5 570. SNA packages on the other hand cannot handle graphs with more than hundred thousand edges

ieee international conference on high performance computing data and analytics | 2011

Parallel breadth-first search on distributed memory systems

Aydin Buluç; Kamesh Madduri

Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD MagnyCours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.

international parallel and distributed processing symposium | 2009

A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets

Kamesh Madduri; David Ediger; Karl Jiang; David A. Bader; Daniel G. Chavarría-Miranda

We present a new lock-free parallel algorithm for computing betweenness centrality of massive complex networks that achieves better spatial locality compared with previous approaches. Betweenness centrality is a key kernel in analyzing the importance of vertices (or edges) in applications ranging from social networks, to power grids, to the influence of jazz musicians, and is also incorporated into the DARPA HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph analytics. We design an optimized implementation of betweenness centrality for the massively multithreaded Cray XMT system with the Thread-storm processor. For a small-world network of 268 million vertices and 2.147 billion edges, the 16-processor XMT system achieves a TEPS rate (an algorithmic performance count for the number of edges traversed per second) of 160 million per second, which corresponds to more than a 2× performance improvement over the previous parallel implementation. We demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for the large IMDb movie-actor network.

international parallel and distributed processing symposium | 2008

SNAP, Small-world Network Analysis and Partitioning: An open-source parallel graph framework for the exploration of large-scale networks

David A. Bader; Kamesh Madduri

We present SNAP (small-world network analysis and partitioning), an open-source graph framework for exploratory study and partitioning of large-scale networks. To illustrate the capability of SNAP, we discuss the design, implementation, and performance of three novel parallel community detection algorithms that optimize modularity, a popular measure for clustering quality in social network analysis. In order to achieve scalable parallel performance, we exploit typical network characteristics of small-world networks, such as the low graph diameter, sparse connectivity, and skewed degree distribution. We conduct an extensive experimental study on real-world graph instances and demonstrate that our parallel schemes, coupled with aggressive algorithm engineering for small-world networks, give significant running time improvements over existing modularity-based clustering heuristics, with little or no loss in clustering quality. For instance, our divisive clustering approach based on approximate edge betweenness centrality is more than two orders of magnitude faster than a competing greedy approach, for a variety of large graph instances on the Sun Fire T2000 multicore system. SNAP also contains parallel implementations of fundamental graph-theoretic kernels and topological analysis metrics (e.g., breadth-first search, connected components, vertex and edge centrality) that are optimized for small- world networks. The SNAP framework is extensible; the graph kernels are modular, portable across shared memory multicore and symmetric multiprocessor systems, and simplify the design of high-level domain-specific applications.

ieee international conference on high performance computing data and analytics | 2005

Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors

David A. Bader; Kamesh Madduri

Graph theoretic problems are representative of fundamental computations in traditional and emerging scientific disciplines like scientific computing and computational biology, as well as applications in national security. We present our design and implementation of a graph theory application that supports the kernels from the Scalable Synthetic Compact Applications (SSCA) benchmark suite, developed under the DARPA High Productivity Computing Systems (HPCS) program. This synthetic benchmark consists of four kernels that require irregular access to a large, directed, weighted multi-graph. We have developed a parallel implementation of this benchmark in C using the POSIX thread library for commodity symmetric multiprocessors (SMPs). In this paper, we primarily discuss the data layout choices and algorithmic design issues for each kernel, and also present execution time and benchmark validation results.

international parallel and distributed processing symposium | 2007

On the Design and Analysis of Irregular Algorithms on the Cell Processor: A Case Study of List Ranking

David A. Bader; Virat Agarwal; Kamesh Madduri

The Sony-Toshiba-IBM Cell Broadband Engine is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE), with eight SIMD coprocessing units (SPEs) integrated on-chip. We present a complexity model for designing algorithms on the Cell processor, along with a systematic procedure for algorithm analysis. To estimate the execution time of the algorithm, we consider the computational complexity, memory access patterns (DMA transfer sizes and latency), and the complexity of branching instructions. This model, coupled with the analysis procedure, simplifies algorithm design on the Cell and enables quick identification of potential implementation bottlenecks. Using the model, we design an efficient implementation of list ranking, a representative problem from the class of combinatorial and graph-theoretic applications. Due to its highly irregular memory patterns, list ranking is a particularly challenging problem to parallelize on current cache-based and distributed memory architectures. We describe a generic work-partitioning technique on the Cell to hide memory access latency, and apply this to efficiently implement list ranking. We run our algorithm on a 3.2 GHz Cell processor using an IBM QS20 Cell Blade and demonstrate a substantial speedup for list ranking on the Cell in comparison to traditional cache-based microprocessors. For a random linked list of 1 million nodes, we achieve an an overall speedup of 8.34 over a PPE-only implementation.

Journal of Discrete Algorithms | 2010

A faster algorithm for the single source shortest path problem with few distinct positive lengths

James B. Orlin; Kamesh Madduri; K. Subramani; Matthew D. Williamson

In this paper, we propose an efficient method for implementing Dijkstras algorithm for the Single Source Shortest Path Problem (SSSPP) in a graph whose edges have positive length, and where there are few distinct edge lengths. The SSSPP is one of the most widely studied problems in theoretical computer science and operations research. On a graph with n vertices, m edges and K distinct edge lengths, our algorithm runs in O(m) time if nK=<2m, and O(mlognKm) time, otherwise. We tested our algorithm against some of the fastest algorithms for SSSPP on graphs with arbitrary but positive lengths. Our experiments on graphs with few edge lengths confirmed our theoretical results, as the proposed algorithm consistently dominated the other SSSPP algorithms, which did not exploit the special structure of having few distinct edge lengths.

international parallel and distributed processing symposium | 2007

SWARM: A Parallel Programming Framework for Multicore Processors

David A. Bader; Varun Kanade; Kamesh Madduri

Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures to multicore designs. Continued performance on multicore processors now requires the exploitation of concurrency at the algorithmic level. In this paper, we identify key issues in algorithm design for multicore processors and propose a computational model for these systems. We introduce SWARM (software and algorithms for running on multi-core), a portable open-source parallel library of basic primitives that fully exploit multicore processors. Using this framework, we have implemented efficient parallel algorithms for important primitive operations such as prefix-sums, pointer-jumping, symmetry breaking, and list ranking; for combinatorial problems such as sorting and selection; for parallel graph theoretic algorithms such as spanning tree, minimum spanning tree, graph decomposition, and tree contraction; and for computational genomics applications such as maximum parsimony. The main contributions of this paper are the design of the SWARM multicore framework, the presentation of a multicore algorithmic model, and validation results for this model. SWARM is freely available as open-source from http://multicore-swarm.sourceforge.net/.

Explore More