Dylan Hutchison
University of Washington
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Dylan Hutchison.
international parallel and distributed processing symposium | 2015
Vijay Gadepally; Jake Bolewski; Dan Hook; Dylan Hutchison; Benjamin A. Miller; Jeremy Kepner
Big data and the Internet of Things era continue to challenge computational systems. Several technology solutions such as NoSQL databases have been developed to deal with this challenge. In order to generate meaningful results from large datasets, analysts often use a graph representation which provides an intuitive way to work with the data. Graph vertices can represent users and events, and edges can represent the relationship between vertices. Graph algorithms are used to extract meaningful information from these very large graphs. At MIT, the Graphulo initiative is an effort to perform graph algorithms directly in NoSQL databases such as Apache Accumulo or SciDB, which have an inherently sparse data storage scheme. Sparse matrix operations have a history of efficient implementations and the Graph Basic Linear Algebra Subprogram (Graph BLAS) community has developed a set of key kernels that can be used to develop efficient linear algebra operations. However, in order to use the Graph BLAS kernels, it is important that common graph algorithms be recast using the linear algebra building blocks. In this article, we look at common classes of graph algorithms and recast them into linear algebra operations using the Graph BLAS building blocks.
ieee high performance extreme computing conference | 2015
Dylan Hutchison; Jeremy Kepner; Vijay Gadepally; Adam Fuchs
The Apache Accumulo database excels at distributed storage and indexing and is ideally suited for storing graph data. Many big data analytics compute on graph data and persist their results back to the database. These graph calculations are often best performed inside the database server. The GraphBLAS standard provides a compact and efficient basis for a wide range of graph applications through a small number of sparse matrix operations. In this article, we discuss a server-side implementation of GraphBLAS sparse matrix multiplication that leverages Accumulos native, high-performance iterators. We compare the mathematics and performance of inner and outer product implementations, and show how an outer product implementation achieves optimal performance near Accumulos peak write rate. We offer our work as a core component to the Graphulo library that will deliver matrix math primitives for graph analytics within Accumulo.
ieee high performance extreme computing conference | 2016
Jeremy Kepner; Peter Aaltonen; David A. Bader; Aydin Buluç; Franz Franchetti; John R. Gilbert; Dylan Hutchison; Manoj Kumar; Andrew Lumsdaine; Henning Meyerhenke; Scott McMillan; Carl Yang; John D. Owens; Marcin Zalewski; Timothy G. Mattson; José E. Moreira
The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
international conference on management of data | 2017
Dylan Hutchison; Bill Howe; Dan Suciu
Analytics tasks manipulate structured data with variants of relational algebra (RA) and quantitative data with variants of linear algebra (LA). The two computational models have overlapping expressiveness, motivating a common programming model that affords unified reasoning and algorithm design. At the logical level we propose LARA, a lean algebra of three operators, that expresses RA and LA as well as relevant optimization rules. We show a series of proofs that position LARA at just the right level of expressiveness for a middleware algebra: more explicit than MapReduce but more general than RA or LA. At the physical level we find that the LARA operators afford efficient implementations using a single primitive that is available in a variety of backend engines: range scans over partitioned sorted maps. To evaluate these ideas, we implemented the LARA operators as range iterators in Apache Accumulo, a popular implementation of Googles BigTable. First we show how LARA expresses a sensor quality control task, and we measure the performance impact of optimizations LARA admits on this task. Second we show that the LARADB implementation outperforms Accumulos native MapReduce integration on a core task involving join and aggregation in the form of matrix multiply, especially at smaller scales that are typically a poor fit for scale-out approaches. We find that LARADB offers a conceptually lean framework for optimizing mixed-abstraction analytics tasks, without giving up fast record-level updates and scans.
ieee high performance extreme computing conference | 2016
Dylan Hutchison; Jeremy Kepner; Vijay Gadepally; Bill Howe
Google BigTables scale-out design for distributed key-value storage inspired a generation of NoSQL databases. Recently the NewSQL paradigm emerged in response to analytic workloads that demand distributed computation local to data storage. Many such analytics take the form of graph algorithms, a trend that motivated the GraphBLAS initiative to standardize a set of matrix math kernels for building graph algorithms. In this article we show how it is possible to implement the GraphBLAS kernels in a BigTable database by presenting the design of Graphulo, a library for executing graph algorithms inside the Apache Accumulo database. We detail the Graphulo implementation of two graph algorithms and conduct experiments comparing their performance to two main-memory matrix math systems. Our results shed insight into the conditions that determine when executing a graph algorithm is faster inside a database versus an external system-in short, that memory requirements and relative I/O are critical factors.
ieee high performance extreme computing conference | 2016
Alexander Chen; Alan Edelman; Jeremy Kepner; Vijay Gadepally; Dylan Hutchison
Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation.
ieee high performance extreme computing conference | 2016
Timothy Weale; Vijay Gadepally; Dylan Hutchison; Jeremy Kepner
Graph algorithms have wide applicablity to a variety of domains and are often used on massive datasets. Recent standardization efforts such as the GraphBLAS specify a set of key computational kernels that hardware and software developers can adhere to. Graphulo is a processing framework that enables GraphBLAS kernels in the Apache Accumulo database. In our previous work, we have demonstrated a core Graphulo operation called TableMult that performs large-scale multiplication operations of database tables. In this article, we present the results of scaling the Graphulo engine to larger problems and scalablity when a greater number of resources is used. Specifically, we present two experiments that demonstrate Graphulo scaling performance is linear with the number of available resources. The first experiment demonstrates cluster processing rates through Graphulos TableMult operator on two large graphs, scaled between 217 and 219 vertices. The second experiment uses TableMult to extract a random set of rows from a large graph (219 nodes) to simulate a cued graph analytic. These benchmarking results are of relevance to Graphulo users who wish to apply Graphulo to their graph problems.
ieee high performance extreme computing conference | 2017
Dylan Hutchison
Triangle counting is a key algorithm for large graph analysis. The Graphulo library provides a framework for implementing graph algorithms on the Apache Accumulo distributed database. In this work we adapt two algorithms for counting triangles, one that uses the adjacency matrix and another that also uses the incidence matrix, to the Graphulo library for serverside processing inside Accumulo. Cloud-based experiments show a similar performance profile for these different approaches on the family of power law Graph500 graphs, for which data skew increasingly bottlenecks. These results motivate the design of skew-aware hybrid algorithms that we propose for future work.
ieee high performance extreme computing conference | 2017
Lauren Milechin; Vijay Gadepally; Siddharth Samsi; Jeremy Kepner; Alexander Chen; Dylan Hutchison
The D4M tool was developed to address many of todays data needs. This tool is used by hundreds of researchers to perform complex analytics on unstructured data. Over the past few years, the D4M toolbox has evolved to support connectivity with a variety of new database engines, including SciDB. D4M-Graphulo provides the ability to do graph analytics in the Apache Accumulo database. Finally, an implementation using the Julia programming language is also now available. In this article, we describe some of our latest additions to the D4M toolbox and our upcoming D4M 3.0 release. We show through benchmarking and scaling results that we can achieve fast SciDB ingest using the D4M-SciDB connector, that using Graphulo can enable graph algorithms on scales that can be memory limited, and that the Julia implementation of D4M achieves comparable performance or exceeds that of the existing MATLAB® implementation.
conference on innovative data systems research | 2017
Jingjing Wang; Tobin Baker; Magdalena Balazinska; Daniel Halperin; Brandon Haynes; Bill Howe; Dylan Hutchison; Shrainik Jain; Ryan Maas; Parmita Mehta; Dominik Moritz; Brandon Myers; Jennifer Ortiz; Dan Suciu; Andrew Whitaker; Shengliang Xu