Publication


Featured research published by Keith Henderson.


conference on high performance computing (supercomputing) | 2005

A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L

Andy Yoo; Edmond Chow; Keith Henderson; Will McLendon; Bruce Hendrickson

Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a distributed breadth-first search (BFS) scheme that scales for random graphs with up to three billion vertices and 30 billion edges. Scalability was tested on IBM BlueGene/L with 32,768 nodes at the Lawrence Livermore National Laboratory. Scalability was obtained through a series of optimizations, in particular, those that ensure scalable use of memory. We use 2D (edge) partitioning of the graph instead of conventional 1D (vertex) partitioning to reduce communication overhead. For Poisson random graphs, we show that the expected size of the messages is scalable for both 2D and 1D partitionings. Finally, we have developed efficient collective communication functions for the 3D torus architecture of BlueGene/L that also take advantage of the structure in the problem. The performance and characteristics of the algorithm are measured and reported.
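
The difference between 1D (vertex) and 2D (edge) partitioning mentioned in this abstract can be illustrated with a small sketch. It assumes a simplified block layout on a `rows` x `cols` processor mesh; the function names and the layout are illustrative assumptions and may differ from the scheme used in the paper.

```python
def owner_1d(u, n, p):
    """1D (vertex) partitioning: every edge of vertex u is stored on one processor.

    Assumes n vertices split into p contiguous blocks (hypothetical layout).
    """
    block = -(-n // p)  # ceil(n / p)
    return u // block


def owner_2d(u, v, n, rows, cols):
    """2D (edge) partitioning: edge (u, v) is stored on mesh processor (i, j).

    The row index depends on the source vertex, the column index on the target,
    so the adjacency matrix is cut into rows x cols blocks (hypothetical layout).
    """
    row_block = -(-n // rows)
    col_block = -(-n // cols)
    return (u // row_block, v // col_block)


# With 16 vertices on a 2 x 2 mesh, the edges leaving vertex 5 can only land
# in processor row 0, whatever their targets are.
processors_touched = {owner_2d(5, v, 16, 2, 2) for v in range(16)}
```

In this simplified layout, the edges of one vertex are confined to a single processor row, which is the usual intuition for why 2D partitioning can limit the number of communication partners.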


acm symposium on applied computing | 2009

Applying latent dirichlet allocation to group discovery in large graphs

Keith Henderson; Tina Eliassi-Rad

This paper introduces LDA-G, a scalable Bayesian approach to finding latent group structures in large real-world graph data. Existing Bayesian approaches for group discovery (such as Infinite Relational Models) have only been applied to small graphs with a couple of hundred nodes. LDA-G (short for Latent Dirichlet Allocation for Graphs) utilizes a well-known topic modeling algorithm to find latent group structure. Specifically, we modify Latent Dirichlet Allocation (LDA) to operate on graph data instead of text corpora. Our modifications reflect the differences between real-world graph data and text corpora (e.g., a node's neighbor count vs. a document's word count). In our empirical study, we apply LDA-G to several large graphs (with thousands of nodes) from PubMed (a scientific publication repository). We compare LDA-G's quantitative performance on link prediction with two existing approaches: one Bayesian (namely, Infinite Relational Model) and one non-Bayesian (namely, Cross-association). On average, LDA-G outperforms IRM by 15% and Cross-association by 25% (in terms of area under the ROC curve). Furthermore, we demonstrate that LDA-G can discover useful qualitative information.
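
The analogy between a node's neighbors and a document's words suggests one way a graph could be fed to a topic model. The sketch below assumes that each node plays the role of a document and each neighbor the role of a word; the helper name `graph_to_corpus` is hypothetical, and the paper's actual modifications to LDA are not reproduced here.

```python
from collections import defaultdict


def graph_to_corpus(edges):
    """Turn an undirected edge list into {node: [neighbor, ...]} 'documents'.

    Assumption for illustration: node = document, neighbor = word.
    """
    docs = defaultdict(list)
    for u, v in edges:
        docs[u].append(v)
        docs[v].append(u)
    return dict(docs)


corpus = graph_to_corpus([("a", "b"), ("a", "c")])
```

The length of each list is the node's degree, which corresponds to the word count of a document in the text setting.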


international world wide web conferences | 2012

Role-dynamics: fast mining of large dynamic networks

Ryan A. Rossi; Brian Gallagher; Jennifer Neville; Keith Henderson

To understand the structural dynamics of a large-scale social, biological or technological network, it may be useful to discover behavioral roles representing the main connectivity patterns present over time. In this paper, we propose a scalable non-parametric approach to automatically learn the structural dynamics of the network and individual nodes. Roles may represent structural or behavioral patterns such as the center of a star, peripheral nodes, or bridge nodes that connect different communities. Our novel approach learns the appropriate structural role dynamics for any arbitrary network and tracks the changes over time. In particular, we uncover the specific global network dynamics and the local node dynamics of a technological, communication, and social network. We identify interesting node and network patterns such as stationary and non-stationary roles, spikes/steps in role-memberships (perhaps indicating anomalies), increasing/decreasing role trends, among many others. Our results indicate that the nodes in each of these networks have distinct connectivity patterns that are non-stationary and evolve considerably over time. Overall, the experiments demonstrate the effectiveness of our approach for fast mining and tracking of the dynamics in large networks. Furthermore, the dynamic structural representation provides a basis for building more sophisticated models and tools that are fast for exploring large dynamic networks.


international conference on cluster computing | 2006

MSSG: A Framework for Massive-Scale Semantic Graphs

Timothy D. R. Hartley; Füsun Özgüner; Andy Yoo; Scott R. Kohn; Keith Henderson

This paper presents a middleware framework for storing, accessing and analyzing massive-scale semantic graphs. The framework, MSSG, targets scale-free semantic graphs with O(10^12) (trillion) vertices and edges. Here, we present the overall architectural design of the framework, as well as a prototype implementation for cluster architectures. The sheer size of these massive-scale semantic graphs prohibits storing the entire graph in memory even on medium- to large-scale parallel architectures. We therefore propose a new graph database, grDB, for the efficient storage and retrieval of large scale-free semantic graphs on secondary storage. This new database supports the efficient and scalable execution of parallel out-of-core graph algorithms which are essential for analyzing semantic graphs of massive size. We have also developed a parallel out-of-core breadth-first search algorithm for performance study. To the best of our knowledge, it is the first of such algorithms reported in the literature. Experimental evaluations on large real-world semantic graphs show that the MSSG framework scales well, and grDB outperforms widely used open-source out-of-core databases, such as BerkeleyDB and MySQL, in the storage and retrieval of scale-free graphs.


ieee international conference on high performance computing data and analytics | 2008

BlueGene/L applications: Parallelism On a Massive Scale

Bronis R. de Supinski; Martin Schulz; Vasily V. Bulatov; William H. Cabot; Bor Chan; Andrew W. Cook; Erik W. Draeger; James N. Glosli; Jeffrey Greenough; Keith Henderson; Alison Kubota; Steve Louis; Brian Miller; Mehul Patel; Thomas E. Spelce; Frederick H. Streitz; Peter L. Williams; Robert Kim Yates; Andy Yoo; George S. Almasi; Gyan Bhanot; Alan Gara; John A. Gunnels; Manish Gupta; José E. Moreira; James C. Sexton; Bob Walkup; Charles J. Archer; Francois Gygi; Timothy C. Germann

BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale, with 131,072 processors, and absolute performance, with a peak rate of 367 Tflop/s. BG/L has led the last four Top500 lists with a Linpack rate of 280.6 Tflop/s for the full machine installed at LLNL and is expected to remain the fastest computer in the next few editions. However, the real value of a machine such as BG/L derives from the scientific breakthroughs that real applications can produce by successfully using its unprecedented scale and computational power. In this paper, we describe our experiences with eight large-scale applications on BG/L from several application domains, ranging from molecular dynamics to dislocation dynamics and turbulence simulations to searches in semantic graphs. We also discuss the challenges we faced when scaling these codes and present several successful optimization techniques. All applications show excellent scaling behavior, even at very large processor counts, with one code even achieving a sustained performance of more than 100 Tflop/s, clearly demonstrating the real success of the BG/L design.


Archive | 2005

Distributed Breadth-First Search with 2-D Partitioning

Edmond Chow; Keith Henderson; Andy Yoo

Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a scalable implementation of distributed breadth-first search (BFS) which has been applied to graphs with over one billion vertices. The main contribution of this paper is to compare a 2-D (edge) partitioning of the graph to the more common 1-D (vertex) partitioning. For Poisson random graphs which have low diameter like many realistic information network data, we determine when one type of partitioning is advantageous over the other. Also for Poisson random graphs, we show that memory use is scalable. The experimental tests use a level-synchronized BFS algorithm running on a large Linux cluster and BlueGene/L. On the latter machine, the timing is related to the number of synchronization steps in the algorithm.
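
A level-synchronized BFS, as named in this abstract, expands the whole frontier of one level before moving to the next. The following is a minimal serial sketch, assuming the graph is given as an adjacency dictionary; it does not include the distribution or partitioning described in the paper.

```python
def level_synchronized_bfs(adj, source):
    """Return {vertex: BFS level} for every vertex reachable from source.

    adj maps each vertex to an iterable of its neighbors (assumed input format).
    """
    levels = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Expand every vertex of the current level before going deeper.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in levels:
                    levels[v] = depth
                    next_frontier.append(v)
        # In a distributed version, this is where processors would synchronize.
        frontier = next_frontier
    return levels


levels = level_synchronized_bfs({0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}, 0)
```

Each pass of the `while` loop corresponds to one level, so the number of passes grows with the depth of the search, which is consistent with the abstract's remark that timing is related to the number of synchronization steps.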


acm symposium on applied computing | 2015

EP-MEANS: an efficient nonparametric clustering of empirical probability distributions

Keith Henderson; Brian Gallagher; Tina Eliassi-Rad

Given a collection of m continuous-valued, one-dimensional empirical probability distributions {P1, ..., Pm}, how can we cluster these distributions efficiently with a nonparametric approach? Such problems arise in many real-world settings where keeping the moments of the distribution is not appropriate, because either some of the moments are not defined or the distributions are heavy-tailed or bi-modal. Examples include mining distributions of inter-arrival times and phone-call lengths. We present an efficient algorithm with a non-parametric model for clustering empirical, one-dimensional, continuous probability distributions. Our algorithm, called ep-means, is based on the Earth Mover's Distance and k-means clustering. We illustrate the utility of ep-means on various data sets and applications. In particular, we demonstrate that ep-means effectively and efficiently clusters probability distributions of mixed and arbitrary shapes, recovering ground-truth clusters exactly in cases where existing methods perform at baseline accuracy. We also demonstrate that ep-means outperforms moment-based classification techniques and discovers useful patterns in a variety of real-world applications.
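
The combination of Earth Mover's Distance and k-means can be sketched for one-dimensional samples, where the distance reduces to the average absolute difference between quantile functions. The code below is an illustration under stated assumptions, not the published ep-means algorithm: the quantile grid, the farthest-first initialization, and the centroid update (pointwise mean of quantile vectors) are all choices made here for simplicity.

```python
def quantile_vector(samples, q=20):
    """Summarize a 1-D sample by q evenly spaced empirical quantiles."""
    s = sorted(samples)
    n = len(s)
    return [s[min(n - 1, int((i + 0.5) / q * n))] for i in range(q)]


def emd_1d(qa, qb):
    """Approximate 1-D Earth Mover's Distance between two quantile vectors."""
    return sum(abs(a - b) for a, b in zip(qa, qb)) / len(qa)


def cluster_distributions(distributions, k, iters=20):
    """k-means-style clustering of 1-D samples using emd_1d as the distance."""
    vecs = [quantile_vector(d) for d in distributions]
    # Farthest-first initialization (an assumption, chosen to be deterministic).
    centroids = [list(vecs[0])]
    while len(centroids) < k:
        far = max(vecs, key=lambda v: min(emd_1d(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: emd_1d(v, centroids[c])) for v in vecs
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


samples = [[0, 1, 2, 3], [1, 2, 3, 4], [100, 101, 102, 103], [101, 102, 103, 104]]
labels = cluster_distributions(samples, k=2)
```

Working on quantile vectors keeps the whole shape of each distribution, which is the motivation the abstract gives for avoiding moment-based summaries.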


social computing behavioral modeling and prediction | 2010

Literature search through mixed-membership community discovery

Tina Eliassi-Rad; Keith Henderson

We introduce a new approach to literature search that is based on finding mixed-membership communities on an augmented co-authorship graph (ACA) with a scalable generative model. An ACA graph contains two types of edges: (1) coauthorship links and (2) links between researchers with substantial expertise overlap. Our solution eliminates the biases introduced by either looking at citations of a paper or doing a Web search. A case study on PubMed shows the benefits of our approach.


conference on high performance computing (supercomputing) | 2006

Parallel massive scale-free graph generators

Andy Yoo; Keith Henderson

The lack of publicly available large scale-free graphs forces researchers studying massive scale-free graphs to rely on synthetically generated graphs in testing and evaluating their algorithms. This requires a graph generator that can scale to graphs with potentially tens or hundreds of billions of vertices and edges. We have developed two such scalable parallel graph generators in this research. The parallel Barabasi-Albert method iteratively builds scale-free graphs using a two-phase preferential attachment technique in a bottom-up fashion. The parallel Kronecker method, on the other hand, constructs a graph recursively in a top-down fashion from a given seed graph using the Kronecker matrix multiplication. We show that both graph generators generate massive graphs at a very high rate. It is also shown that graphs generated by these methods have all the common properties of real scale-free graphs, such as power-law degree distribution and small-worldness.
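
The recursive construction from a seed graph can be illustrated with a deterministic Kronecker power of a small adjacency matrix. This is a serial sketch for intuition only, assuming a 0/1 seed matrix stored as nested lists; the parallel generator described in the abstract is not reproduced here.

```python
def kronecker_product(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    size = n * m
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
        for i in range(size)
    ]


def kronecker_graph(seed, power):
    """Adjacency matrix obtained by taking the seed to the given Kronecker power."""
    graph = seed
    for _ in range(power - 1):
        graph = kronecker_product(graph, seed)
    return graph


seed = [[1, 1], [1, 0]]            # hypothetical 2-vertex seed graph
graph = kronecker_graph(seed, 2)   # 4 x 4 adjacency matrix
edge_count = sum(map(sum, graph))
```

Each Kronecker power multiplies the number of vertices by the seed size and the number of nonzero entries by the seed's nonzero count, so the graph grows exponentially with the power.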


international conference on social computing | 2011

Ranking information in networks

Tina Eliassi-Rad; Keith Henderson

Given a network, we are interested in ranking sets of nodes that score highest on user-specified criteria. For instance in graphs from bibliographic data (e.g. PubMed), we would like to discover sets of authors with expertise in a wide range of disciplines. We present this ranking task as a Top-K problem; utilize fixed-memory heuristic search; and present performance of both the serial and distributed search algorithms on synthetic and real-world data sets.
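
A Top-K selection under a fixed memory budget can be sketched with a bounded min-heap. This is a generic stand-in rather than the heuristic search used in the paper; the function name `top_k` and the scoring callback are assumptions made for illustration.

```python
import heapq


def top_k(items, score, k):
    """Return the k highest-scoring items while holding at most k in memory."""
    heap = []  # min-heap of (score, arrival index, item)
    for idx, item in enumerate(items):
        entry = (score(item), idx, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new item beats the weakest kept item, so swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in sorted(heap, reverse=True)]


# Example: rank hypothetical author sets by how many distinct topics they cover.
author_sets = [{"graphs"}, {"graphs", "hpc", "ml"}, {"hpc", "ml"}]
best = top_k(author_sets, len, 2)
```

The arrival index in each heap entry breaks ties between equal scores, so the items themselves never need to be comparable.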

Collaboration


Dive into Keith Henderson's collaborations.

Top Co-Authors

Brian Gallagher
Lawrence Livermore National Laboratory

Andy Yoo
Lawrence Livermore National Laboratory

Hanghang Tong
Arizona State University

Lei Li
Carnegie Mellon University

Edmond Chow
Georgia Institute of Technology