
Publication


Featured research published by Jeffrey Young.


European Conference on Parallel Processing | 2016

GraphIn: An Online High Performance Incremental Graph Processing Framework

Dipanjan Sengupta; Narayanan Sundaram; Xia Zhu; Theodore L. Willke; Jeffrey Young; Matthew Wolf; Karsten Schwan

The massive explosion in social networks has led to a significant growth in graph analytics and specifically in dynamic, time-varying graphs. Most prior work processes dynamic graphs by first storing the updates and then repeatedly running static graph analytics on saved snapshots. To handle the extreme scale and fast evolution of real-world graphs, we propose a dynamic graph analytics framework, GraphIn, that incrementally processes graphs on-the-fly using fixed-size batches of updates. As part of GraphIn, we propose a novel programming model called I-GAS, based on the gather-apply-scatter programming paradigm, that allows for implementing a large set of incremental graph processing algorithms seamlessly across multiple CPU cores. We further propose a property-based, dual-path execution model to choose between incremental or static computation. Our experiments show that for a variety of graph inputs and algorithms, GraphIn achieves up to 9.3 million updates/sec and over 400× speedup when compared to static graph recomputation.
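The dual-path idea above can be sketched in a few lines. This is hypothetical helper code, not the paper's actual implementation: degree counting stands in for richer I-GAS computations, and the 50% threshold is an invented stand-in for the paper's property-based check.

```python
# Sketch of GraphIn-style incremental processing: edge updates arrive in
# fixed-size batches, and a simple property check chooses between the
# incremental and the static execution path.

def static_degrees(edges):
    """Static path: recompute all vertex degrees from the full edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def process_batch(edges, deg, batch, threshold=0.5):
    """Dual-path execution: if a batch touches a large fraction of known
    vertices, fall back to static recomputation; otherwise update in place."""
    edges.extend(batch)
    touched = {x for edge in batch for x in edge}
    if deg and len(touched) > threshold * len(deg):
        return static_degrees(edges)              # static path
    for u, v in batch:                            # incremental path
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
deg = static_degrees(edges)
deg = process_batch(edges, deg, [(4, 5)])         # small batch: incremental
```

The incremental path touches only the vertices in the batch, which is where the reported speedup over whole-graph recomputation comes from.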


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Satisfying Data-Intensive Queries Using GPU Clusters

Jeffrey Young; Haicheng Wu; Sudhakar Yalamanchili


International Conference on Cluster Computing | 2013

Oncilla: A GAS runtime for efficient resource allocation and data movement in accelerated clusters

Jeffrey Young; Se Hoon Shon; Sudhakar Yalamanchili; Alex Merritt; Karsten Schwan; Holger Fröning



International Conference on Green Computing | 2010

Dynamic Partitioned Global Address Spaces for power efficient DRAM virtualization

Jeffrey Young; Sudhakar Yalamanchili


Proceedings of the 2nd Workshop on Parallel Programming for Analytics Applications | 2015

A portable benchmark suite for highly parallel data intensive query processing

Ifrah Saeed; Jeffrey Young; Sudhakar Yalamanchili



IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds

Jeffrey Young; Sudhakar Yalamanchili

Data-intensive queries should be run on GPU clusters to increase throughput, and Global Address Spaces (GAS) should be used to support compiler optimizations that can increase total throughput by fully utilizing memory and GPUs across nodes in the cluster.
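The appeal of a Global Address Space is that every byte of memory in the cluster gets one flat address a compiler can target. As a toy illustration (the region size below is an assumption, not a value from this work), a global address decomposes into a node ID and a local offset:

```python
# Toy partitioned Global Address Space model: each node contributes one
# fixed-size region to a flat address space, so a global address splits
# into a node ID plus an offset within that node's local memory.

REGION_BYTES = 1 << 30                      # assumed 1 GiB per node

def translate(global_addr):
    """Map a flat global address to (node ID, local offset)."""
    return global_addr // REGION_BYTES, global_addr % REGION_BYTES

node, offset = translate(3 * REGION_BYTES + 4096)   # resolves to node 3
```

Because the translation is uniform, generated code can reference remote memory the same way it references local memory, which is what enables the cross-node optimizations described above.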


Embedded and Ubiquitous Computing | 2004

FERP Interface and Interconnect Cores for Stream Processing Applications

Jeffrey Young; Ron Sass

Accelerated and in-core implementations of Big Data applications typically require large amounts of host and accelerator memory as well as efficient mechanisms for transferring data to and from accelerators in heterogeneous clusters. Scheduling for heterogeneous CPU and GPU clusters has been investigated in depth in the high-performance computing (HPC) and cloud computing arenas, but there has been less emphasis on the management of the cluster resources required to schedule applications across multiple nodes and devices. Previous approaches to this resource management problem have focused either on low-performance software layers or on adapting complex data movement techniques from the HPC arena, which reduces performance and creates barriers for migrating applications to new heterogeneous cluster architectures. This work proposes a new system architecture for cluster resource allocation and data movement built around the concept of managed Global Address Spaces (GAS), or dynamically aggregated memory regions that span multiple nodes. We propose a software layer called Oncilla that uses a simple runtime and API to take advantage of non-coherent hardware support for GAS. The Oncilla runtime is evaluated using two different high-performance networks for microkernels representative of the TPC-H data warehousing benchmark, and it reduces runtime by up to 81%, on average, compared with standard disk-based data storage techniques. The use of the Oncilla API is also evaluated for a simple breadth-first search (BFS) benchmark to demonstrate how existing applications can incorporate support for managed GAS.
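The core of "dynamically aggregated memory regions that span multiple nodes" can be sketched as follows. The class and method names here are invented for illustration; the real Oncilla API is not reproduced in the text above.

```python
# Hypothetical sketch of an Oncilla-style allocation request: a request
# larger than any single node's free memory is satisfied by aggregating
# regions from several nodes into one logical GAS allocation.

class GasRuntime:
    def __init__(self, free_mem):
        self.free = dict(free_mem)                # node ID -> free GiB

    def alloc(self, size):
        """Greedily assemble `size` GiB from per-node free memory."""
        pieces, need = [], size
        for node in sorted(self.free):
            take = min(self.free[node], need)
            if take > 0:
                pieces.append((node, take))
                self.free[node] -= take
                need -= take
            if need == 0:
                return pieces
        raise MemoryError("not enough aggregate memory in the cluster")

rt = GasRuntime({0: 4, 1: 8, 2: 8})               # toy per-node free memory
pieces = rt.alloc(10)                             # spans nodes 0 and 1
```

An application sees one logical allocation; the runtime tracks which nodes back each piece, much as a managed GAS layer must.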


Field Programmable Gate Arrays | 2004

Online placement infrastructure to support run-time reconfiguration

Brian Hargrove Leonard; Jeffrey Young; Ron Sass

Dynamic Partitioned Global Address Spaces (DPGAS) is an abstraction that allows for quick and efficient remapping of physical memory addresses within a global address space, enabling more efficient sharing of remote DRAM. While past work has proposed several uses for DPGAS [1], the most pressing issue in today's data centers is reducing power. This work uses a detailed simulation infrastructure to study the effects of using DPGAS to reduce overall data center power through low-latency accesses to "virtual" DIMMs. Virtual DIMMs are remote DIMMs that can be mapped into a local node's address space using existing operating system abstractions and low-level hardware support to abstract the DIMM's location from the application using it. By using a simple spill-receive memory allocation model, we show that DPGAS can reduce memory power by 18% to 49% with a hardware latency of 1 to 2 µs in typical usage scenarios. Additionally, we demonstrate the range of scenarios where DPGAS can be realized over a shared 10 Gbps Ethernet link with normal network traffic.
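A minimal model of the spill-receive allocation scheme, with made-up numbers for illustration: a node whose page demand exceeds its local DIMM capacity spills the excess to a receive node's remote DIMMs rather than provisioning more local memory.

```python
# Spill-receive sketch: demand beyond local capacity is served by remote
# ("virtual") DIMMs mapped into the local node's address space, so the
# cluster can provision less DRAM per node and save memory power.

def place_pages(demand, local_capacity):
    """Split a node's page demand into locally served and spilled pages."""
    local = min(demand, local_capacity)
    return local, demand - local

local, spilled = place_pages(demand=12, local_capacity=8)   # 4 pages remote
```

The power savings quoted above come from this pooling: spare DIMMs on receive nodes absorb peaks instead of every node carrying worst-case capacity.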


Proceedings of the International Symposium on Memory Systems | 2017

Evaluating hybrid memory cube infrastructure to support high-performance sparse algorithms

Kartikay Garg; Jeffrey Young

Traditionally, data warehousing workloads have been processed using CPU-focused clusters, such as those that make up the bulk of available machines in Amazon's EC2, and the focus on improving analytics performance has been to utilize a homogeneous, multi-threaded CPU environment with optimized algorithms for this infrastructure. The increasing availability of highly parallel accelerators, like GPU and Xeon Phi discrete accelerators, in these types of clusters has provided an opportunity to further accelerate analytics operations, but at a high programming cost due to the optimizations required to fully utilize each of these new pieces of hardware. This work describes and analyzes highly parallel relational algebra primitives developed to target data warehousing queries through a common OpenCL framework that can be executed both on standard multi-threaded processors and on emerging accelerator architectures. As part of this work, we propose a set of data-intensive benchmarks to help compare and differentiate the performance of accelerator hardware and to determine the key characteristics for efficiently running data warehousing queries on accelerators.
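The primitives in the work above are OpenCL-based; as a plain-Python illustration of why relational primitives map well onto data-parallel hardware, a selection and a projection over a toy table (sample data invented here) are independent per-row operations:

```python
# Relational primitives as per-row operations: because no output row depends
# on any other input row, each row can be handled by a separate work-item in
# a data-parallel (OpenCL-style) kernel.

table = [
    {"orderkey": 1, "price": 120.0},
    {"orderkey": 2, "price": 80.0},
    {"orderkey": 3, "price": 200.0},
]

def select(rows, pred):
    """Relational selection: each row is tested independently of the rest."""
    return [r for r in rows if pred(r)]

def project(rows, cols):
    """Relational projection: each output row depends on one input row."""
    return [{c: r[c] for c in cols} for r in rows]

result = project(select(table, lambda r: r["price"] > 100.0), ["orderkey"])
```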


IEEE High Performance Extreme Computing Conference | 2016

Optimizing communication for a 2D-partitioned scalable BFS

Jeffrey Young; Julian Romera; Matthias Hauck; Holger Fröning

Hardware support for Global Address Spaces (GAS) has previously focused on providing efficient access across remote memories, typically using custom interconnects or high-level software layers. New technologies, such as Extoll, HyperShare, and NumaConnect, now allow for cheaper ways to build GAS support into the data center, thus making high-performance coherent and non-coherent remote memory access available for standard data center applications. At the same time, data center designers are currently experimenting with a greater use of accelerators like GPUs to enhance traditionally CPU-oriented processes, such as data warehousing queries for in-core databases. However, there are very few workable approaches for these accelerator clusters that both use commodity interconnects and also support simple multi-node programming models, such as GAS. We propose a new commodity-based approach for supporting non-coherent GAS in accelerator clouds using the HyperTransport Consortium's HyperTransport over Ethernet (HToE) specification. This work details a system model for using HToE for accelerated data warehousing applications and investigates potential bottlenecks and design optimizations for an HToE network adapter, or HyperTransport Ethernet Adapter (HTEA). Using a detailed network simulator model and timing measured for queries run on high-end GPUs, we find that the addition of wider de-encapsulation pipelines and the use of bulk acknowledgments in the HTEA can improve overall throughput and reduce latency for multiple senders using a common accelerator. Furthermore, we show that the bandwidth of one receiving HTEA can vary from 2.8 Gbps to 24.45 Gbps, depending on the optimizations used, and the inter-HTEA latency for one packet is 1,480 ns. A brief analysis of the path from remote memory to accelerators also demonstrates that the bandwidth of today's GPUs can easily handle a stream-based computation model using HToE.
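A back-of-the-envelope sketch of the bulk-acknowledgment optimization mentioned above (the counts are illustrative, not measurements from this work): acknowledging every k-th packet instead of every packet shrinks the acknowledgment traffic a receiving adapter must generate for its senders.

```python
# Bulk acknowledgments: acking once per group of packets rather than once
# per packet reduces the control traffic the receiver puts on the wire,
# which is one way the HTEA optimizations improve aggregate throughput.

def acks_sent(packets, bulk_factor):
    """Acknowledgments needed when acking every `bulk_factor` packets,
    plus one trailing ack for any remainder."""
    full, rem = divmod(packets, bulk_factor)
    return full + (1 if rem else 0)

per_packet_acks = acks_sent(1000, 1)      # one ack per packet
bulk_acks = acks_sent(1000, 16)           # far fewer acks on the wire
```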

Collaboration


Dive into Jeffrey Young's collaborations.

Top Co-Authors

Sudhakar Yalamanchili (Georgia Institute of Technology)
Karsten Schwan (Georgia Institute of Technology)
Matthew Wolf (Georgia Institute of Technology)
Ron Sass (University of North Carolina at Charlotte)
Anshuman Goswami (Georgia Institute of Technology)
Greg Eisenhauer (Georgia Institute of Technology)
Jeffrey S. Vetter (Georgia Institute of Technology)
Kartikay Garg (Georgia Institute of Technology)
M. Graham Lopez (Oak Ridge National Laboratory)