David J. Haglin
Pacific Northwest National Laboratory
Publications
Featured research published by David J. Haglin.
Extended Semantic Web Conference | 2011
Eric Goodman; Edward Steven Jimenez; David Mizell; Sinan Al-Saffar; Bob Adolf; David J. Haglin
To date, the application of high-performance computing resources to Semantic Web data has largely focused on commodity hardware and distributed memory platforms. In this paper we make the case that more specialized hardware can offer superior scaling and close to an order of magnitude improvement in performance. In particular we examine the Cray XMT. Its key characteristics, a large global shared memory and processors with a memory-latency-tolerant design, offer an environment conducive to programming for the Semantic Web and have engendered results that far surpass the current state of the art. We examine three fundamental pieces requisite for a fully functioning semantic database: dictionary encoding, RDFS inference, and query processing. We show scaling up to 512 processors (the largest configuration we had available) and the ability to process 20 billion triples completely in memory.
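To give a feel for the first of those pieces, dictionary encoding maps each RDF term (IRI or literal) to a compact integer ID so that triples can be stored and joined as integer tuples. A minimal single-threaded Python sketch of the idea follows; the paper's implementation is a parallel Cray XMT code, and all names here are illustrative:

```python
# Minimal sketch of RDF dictionary encoding: each distinct term gets
# a dense integer ID, and triples are stored as ID tuples.
# (Illustrative only; the paper's version is parallelized on the Cray XMT.)

class Dictionary:
    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term: str) -> int:
        """Return the ID for term, assigning a new one on first sight."""
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, term_id: int) -> str:
        return self.id_to_term[term_id]

dictionary = Dictionary()
triples = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>",
     "<http://example.org/bob>"),
]
encoded = [tuple(dictionary.encode(t) for t in triple) for triple in triples]
print(encoded)                           # [(0, 1, 2)]
print(dictionary.decode(encoded[0][0]))  # <http://example.org/alice>
```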
IEEE International Symposium on Parallel & Distributed Processing Workshops and PhD Forum | 2010
Eric Goodman; David J. Haglin; Chad Scherrer; Daniel G. Chavarría-Miranda; Jace A. Mogill; John Feo
Two of the most commonly used hashing strategies, linear probing and hashing with chaining, are adapted for efficient execution on a Cray XMT. These strategies are designed to minimize memory contention. Datasets that follow a power-law distribution pose significant performance challenges for shared-memory parallel hashing implementations. Experimental results show good scalability up to 128 processors on two power-law datasets with different data types: integer and string. These implementations can be used in a wide range of applications.
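For reference, linear probing resolves collisions by scanning forward from the hashed slot until a free one is found. A minimal sequential Python sketch follows; the paper's XMT versions add fine-grained synchronization and contention-avoidance strategies that this sketch omits:

```python
# Minimal sequential sketch of hashing with linear probing.
# The paper's Cray XMT implementations add synchronization and
# contention-minimizing layouts that are omitted here.

class LinearProbingTable:
    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def _probe(self, key):
        """Yield slot indices starting at the hash position."""
        start = hash(key) % self.capacity
        for offset in range(self.capacity):
            yield (start + offset) % self.capacity

    def insert(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table full; a real table would resize here")

    def lookup(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

table = LinearProbingTable()
table.insert("foo", 1)
table.insert("bar", 2)
print(table.lookup("foo"), table.lookup("baz"))  # 1 None
```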
IEEE Micro | 2014
Alessandro Morari; Vito Giovanni Castellana; Oreste Villa; Antonino Tumeo; Jesse Weaver; David J. Haglin; Sutanay Choudhury; John Feo
GEMS is a full software system that implements a large-scale semantic graph database on commodity clusters. Its framework comprises a SPARQL-to-C++ compiler, a library of distributed data structures, and a custom multithreaded runtime library. The authors evaluated their software stack on the Berlin SPARQL benchmark with datasets of up to 10 billion graph edges, demonstrating scaling in dataset size and performance as they added cluster nodes.
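To give a feel for the workload such a stack serves, a SPARQL basic graph pattern is a conjunction of triple patterns joined on shared variables. A toy in-memory evaluator is sketched below; it is purely illustrative, since GEMS instead compiles such queries to C++ over distributed data structures:

```python
# Toy evaluation of a SPARQL-style basic graph pattern over an
# in-memory triple list. Purely illustrative: GEMS compiles SPARQL
# to C++ running over distributed data structures instead.

triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "42"),
]

def match(pattern, binding, triples):
    """Extend a variable binding with every triple matching the pattern.
    Pattern terms starting with '?' are variables."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.get(term, value) != value:
                    ok = False
                    break
                new[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield new

# ?x knows ?y . ?y knows ?z   (a two-hop "friend of a friend" pattern)
results = [{}]
for pattern in [("?x", "knows", "?y"), ("?y", "knows", "?z")]:
    results = [b for binding in results for b in match(pattern, binding, triples)]
print(results)  # [{'?x': 'alice', '?y': 'bob', '?z': 'carol'}]
```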
IEEE Computer | 2015
Vito Giovanni Castellana; Alessandro Morari; Jesse Weaver; Antonino Tumeo; David J. Haglin; Oreste Villa; John Feo
A software stack relies primarily on graph-based methods to implement scalable Resource Description Framework (RDF) databases on top of commodity clusters, providing an inexpensive way to extract meaning from large volumes of heterogeneous data.
IEEE Symposium on Large Data Analysis and Visualization | 2015
Pak Chung Wong; David J. Haglin; David S. Gillen; Daniel Chavarria; Vito Giovanni Castellana; Cliff Joslyn; Alan R. Chappell; Song Zhang
We present a visual analytics paradigm and a system prototype for exploring Web-scale graphs. A Web-scale graph is described as a graph with approximately one trillion edges and 50 billion vertices. While Internet vendors such as Facebook and Google are making aggressive R&D efforts in processing and exploring Web-scale graphs, visualizing a graph of that scale remains an underexplored R&D area. The paper describes a nontraditional peek-and-filter strategy that facilitates the exploration of a graph database of unprecedented size for visualization and analytics. We demonstrate that our system prototype can (1) preprocess a graph with ~25 billion edges in less than two hours and (2) support database queries and interactive visualization on the processed graph database afterward. Based on our computational performance results, we argue that we will most likely reach the one-trillion-edge mark (a 40-fold computational performance improvement) for graph visual analytics in the near future.
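As we read it, peek-and-filter first takes a small, cheap sample around a region of interest (the peek) and then applies predicates to prune that sample to something renderable before committing to a full traversal. A hedged sketch of this pattern, using an in-memory adjacency dict as a stand-in for the prototype's graph database:

```python
# Hedged sketch of a peek-and-filter exploration step: "peek" at a
# bounded neighborhood around a seed vertex, then "filter" it down
# to a renderable subgraph. The adjacency dict stands in for the
# prototype's graph database, which this sketch does not reproduce.

from collections import deque

def peek(adjacency, seed, max_vertices=1000, max_depth=2):
    """Bounded breadth-first sample around seed: the cheap 'peek'."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier and len(seen) < max_vertices:
        vertex, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def filter_view(adjacency, vertices, min_degree=2):
    """Keep only sampled vertices interesting enough to draw."""
    return {v for v in vertices if len(adjacency.get(v, ())) >= min_degree}

adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
sample = peek(adjacency, "a")
print(sorted(filter_view(adjacency, sample)))  # ['a', 'b', 'c']
```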
IEEE International Conference on Big Data | 2013
Alessandro Morari; Vito Giovanni Castellana; David J. Haglin; John Feo; Jesse Weaver; Antonino Tumeo; Oreste Villa
We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-C++ compiler, a library of parallel graph methods, and a custom multithreaded runtime layer that provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.
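To illustrate the shape of that programming model (not its implementation), fork/join parallelism spawns one task per data partition and joins on their completion, with the runtime balancing work across nodes. A local single-node stand-in using Python's thread pool; the actual runtime distributes partitions across a cluster under PGAS, which this sketch does not capture:

```python
# Local stand-in for the fork/join pattern the runtime provides:
# fork one task per data partition, then join on all results.
# The real runtime spreads partitions across cluster nodes under a
# PGAS model with automatic load balancing; this sketch is single-node.

from concurrent.futures import ThreadPoolExecutor

def count_matches(partition, predicate):
    """Per-partition work: count triples satisfying the predicate."""
    return sum(1 for triple in partition if predicate(triple))

partitions = [
    [("a", "knows", "b"), ("b", "knows", "c")],
    [("c", "age", "42")],
]
predicate = lambda t: t[1] == "knows"

with ThreadPoolExecutor() as pool:                        # fork
    futures = [pool.submit(count_matches, p, predicate) for p in partitions]
    total = sum(f.result() for f in futures)              # join
print(total)  # 2
```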
IEEE International Conference on High Performance Computing, Data, and Analytics | 2012
Mahantesh Halappanavar; Yousu Chen; Robert D. Adolf; David J. Haglin; Zhenyu Huang; Mark J. Rice
The goal of N-x contingency selection is to pick a subset of critical cases to assess their potential to initiate a severe crippling of an electric power grid. Even for a moderate-sized system there can be an overwhelmingly large number of contingency cases that need to be studied, and the number grows exponentially with x. This combinatorial explosion renders any exhaustive search strategy computationally infeasible, even for small to medium-sized systems. We propose a novel method for N-x contingency selection for x ≥ 2 using group betweenness centrality and show that the computation can be largely decoupled from the problem size, making contingency analysis feasible for large systems with x ≥ 2. Consequently, N-x contingency selection (for x ≥ 2) may be effectively deployed despite the combinatorial explosion in the number of potential contingencies.
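As a concrete, hedged illustration of the selection idea: rank candidate x-sets by their group betweenness and keep the top few. The sketch below uses NetworkX's group betweenness on a toy graph with x = 2; the paper works on power-grid models with far more scalable computation than this brute-force loop:

```python
# Hedged sketch of contingency selection via group betweenness:
# enumerate candidate pairs (x = 2) and keep those whose group
# betweenness is highest. The paper targets power-grid models and
# avoids this brute-force enumeration at scale.

from itertools import combinations
import networkx as nx

G = nx.karate_club_graph()  # toy stand-in for a grid topology

scored = []
for pair in combinations(G.nodes, 2):
    score = nx.group_betweenness_centrality(G, list(pair))
    scored.append((score, pair))

# The highest-scoring pairs are the N-2 contingency candidates to study.
for score, pair in sorted(scored, reverse=True)[:5]:
    print(pair, round(score, 3))
```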
International Parallel and Distributed Processing Symposium | 2016
Daniel G. Chavarría-Miranda; Vito Giovanni Castellana; Alessandro Morari; David J. Haglin; John Feo
Graph databases are becoming a critical tool for the analysis of graph-structured data in multiple scientific and technical domains, including cybersecurity and computational biology. In particular, the storage, analysis, and querying of attributed graphs is a very important capability. Attributed graphs carry properties attached to the vertices and edges of the graph structure, so queries over them include not only structural pattern matching but also conditions on attribute values. In this work, we present GraQL, a query language designed for high-performance attributed graph databases hosted on a high-memory-capacity cluster. GraQL is designed to be the front-end language for the attributed graph data model of the GEMS database system.
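We do not reproduce GraQL syntax here, but the data-model idea is easy to sketch: a query over an attributed graph combines a structural pattern with predicates on vertex and edge attributes. An illustrative Python fragment, with all names hypothetical:

```python
# Illustrative attributed-graph matching: a structural pattern plus
# attribute predicates. This sketches the data model GraQL targets;
# it is not GraQL syntax, and all names here are hypothetical.

vertices = {
    1: {"type": "host", "os": "linux"},
    2: {"type": "host", "os": "windows"},
    3: {"type": "user", "name": "alice"},
}
edges = [
    (3, 1, {"rel": "logs_into", "failed_attempts": 7}),
    (3, 2, {"rel": "logs_into", "failed_attempts": 0}),
]

# Pattern: user --logs_into--> host, with many failed attempts.
matches = [
    (u, v)
    for u, v, attrs in edges
    if vertices[u]["type"] == "user"
    and vertices[v]["type"] == "host"
    and attrs["rel"] == "logs_into"
    and attrs["failed_attempts"] > 5
]
print(matches)  # [(3, 1)]
```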
IEEE Symposium on Technologies for Homeland Security (HST) | 2016
Arun V. Sathanur; David J. Haglin
In this work we propose a novel formulation that models attack and compromise on a cyber network as a combination of two parts: direct compromise of a host, and compromise occurring through the spread of the attack across the network from an already compromised host. The model parameters for the nodes are a concise representation of host profiles, which can include the risky behaviors of the associated human users, while the model parameters for the edges are based on the existence of vulnerabilities between each pair of connected hosts; the edge models correspond to summary representations of the associated attack graphs. This yields a formulation based on Random Walk with Restart (RWR), and the resulting centrality metric can be computed efficiently using sparse linear solvers. The formulation thus goes beyond mere topological considerations in centrality computation by summarizing the host profiles and attack graphs into the model parameters. The computational efficiency of the method also allows us to quantify the uncertainty in the centrality measure through Monte Carlo analysis.
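The RWR centrality itself reduces to a single sparse linear solve. Writing c for the restart probability, W for the column-stochastic transition matrix built from the edge model, and e for the restart distribution derived from host profiles, the stationary vector r satisfies r = (1 - c) W r + c e, i.e. (I - (1 - c) W) r = c e. A hedged sketch with illustrative parameters:

```python
# Hedged sketch of Random Walk with Restart centrality via a sparse
# linear solve: (I - (1 - c) * W) r = c * e. W, e, and c are
# illustrative stand-ins for the paper's host/edge model parameters.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Toy 4-host network; entry W[i, j] is the probability that an attack
# spreads from host j to host i (columns sum to 1).
W = sp.csc_matrix(np.array([
    [0.0, 0.5, 0.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
]))
c = 0.15                               # restart probability
e = np.array([0.4, 0.3, 0.2, 0.1])    # direct-compromise likelihoods

r = spsolve(sp.identity(4, format="csc") - (1 - c) * W, c * e)
print(r / r.sum())  # relative RWR centrality of each host
```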
Machine Learning and Data Mining in Pattern Recognition | 2011
Robert D. Adolf; David J. Haglin; Mahantesh Halappanavar; Yousu Chen; Zhenyu Huang
Electrical power grid contingency analysis aims to understand the impact of potential component failures and assess a system's capability to tolerate them. Exploring all potential x-component failures is computationally infeasible, even for modest sizes of x > 1, due to the combinatorial explosion of cases to consider. A common approach for addressing the large workload is to select only the most severe x-component failures to explore (a process we call filtering). It is important to assess the efficacy of a filter; in particular, it is necessary to understand the likelihood that a potentially severe case is filtered out. A framework for assessing the quality and performance of a filter is proposed, and it is generalized to support resource-aware filters and multiple evaluation criteria.
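One natural quality measure in this framing is the fraction of truly severe cases a filter misses. A hedged sketch of that evaluation; the scores, severities, and threshold here are illustrative stand-ins, not the paper's framework:

```python
# Hedged sketch of one filter-quality measure: the chance that a
# truly severe contingency case is filtered out. Scores, severities,
# and the severity threshold are illustrative stand-ins.

def miss_rate(filter_scores, true_severity, budget, severe_threshold):
    """Fraction of severe cases not among the filter's top-`budget` picks."""
    ranked = sorted(filter_scores, key=filter_scores.get, reverse=True)
    selected = set(ranked[:budget])
    severe = {c for c, s in true_severity.items() if s >= severe_threshold}
    if not severe:
        return 0.0
    return len(severe - selected) / len(severe)

filter_scores = {"c1": 0.9, "c2": 0.4, "c3": 0.8, "c4": 0.1}
true_severity = {"c1": 1.0, "c2": 0.9, "c3": 0.2, "c4": 0.0}

print(miss_rate(filter_scores, true_severity, budget=2, severe_threshold=0.5))
# 0.5: the filter's top-2 picks catch c1 but miss the severe case c2
```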