Marc S. Orr | Researchain


Publication

Featured research published by Marc S. Orr.

Proceedings of the ACM Workshop on High Performance Graph Processing | 2016

Betweenness Centrality in an HSA-enabled System

Shuai Che; Marc S. Orr; Gregory Rodgers; Jonathan Gallmeier

This paper studies different approaches to implementing betweenness centrality in a heterogeneous system. Betweenness centrality is an important algorithm in graph processing. It exposes multiple levels of parallelism when processing a graph, which makes it an interesting problem for exploring various optimizations. We implement different versions of betweenness centrality on an AMD accelerated processing unit (APU). These include GPU-only implementations with two edge distribution methods, GPU-side load balancing, and CPU-GPU load balancing both in a master-worker model with queue monitoring and in a work-stealing model. We take advantage of recent developments in heterogeneous system architecture (HSA), such as a unified virtual address space and diverse atomics. We also use different memory scope and ordering options for different synchronization scenarios. We compare multiple implementations of betweenness centrality, analyze their performance, and discuss important future research directions.
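
For readers unfamiliar with the algorithm, the following is a minimal sequential sketch of betweenness centrality on an unweighted graph, using the standard Brandes formulation. It is not the authors' GPU or HSA implementation; the function name `betweenness_centrality` and the adjacency-dictionary input format are assumptions made for illustration. The outer loop over source vertices and the per-level traversal of edges are the two levels of parallelism that a heterogeneous implementation could distribute.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes-style betweenness centrality for an unweighted graph.

    adj: dict mapping each vertex to a list of its neighbors
         (hypothetical input format chosen for this sketch).
    Returns a dict mapping each vertex to its unnormalized score.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:  # one independent traversal per source vertex
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}    # BFS distance from s (-1 = unvisited)
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        sigma[s] = 1
        dist[s] = 0
        order = []                     # vertices in non-decreasing distance
        queue = deque([s])

        # Forward phase: BFS that counts shortest paths.
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Backward phase: accumulate dependencies from the farthest vertices.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Path graph a - b - c: only b lies between another pair of vertices.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = betweenness_centrality(path)
```

On an undirected graph stored with both edge directions, as above, each vertex pair is counted once per direction, so the scores are usually halved before reporting.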

Computing Frontiers | 2017

Work Stealing in a Shared Virtual-Memory Heterogeneous Environment: A Case Study with Betweenness Centrality

Shuai Che; Marc S. Orr; Jonathan Gallmeier

This paper uses betweenness centrality as a case study to research efficient work stealing in a heterogeneous system environment. Betweenness centrality is an important algorithm in graph processing. It exposes multiple levels of parallelism, which makes it an interesting problem for exploring various optimizations. We investigate queue-based work stealing to distribute its tasks across GPU compute units (CUs) and across the CPU and the GPU, which has not been done by prior work. In particular, we demonstrate how to leverage the new platform-atomic operations on AMD Accelerated Processing Units (APUs) to operate cross-device queues in a lock-free manner in shared virtual memory. To make the work-stealing runtime and the application more efficient, we apply new architectural features, including atomic operations with different memory scopes and orderings for different synchronization scenarios. We implement our solution using heterogeneous system architecture (HSA). Our results show that betweenness centrality with CPU-GPU work stealing achieves an average performance improvement of 15% (up to 30%) over GPU-only execution for diverse graph inputs. Our work-stealing solution can also be applied to other applications. Finally, we analyze important parameters critical for queuing and stealing.
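
The queue-based work-stealing idea can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: it runs the workers in round-robin order in a single thread, so it involves no atomics, memory scopes, or lock-free queues, and it is not the paper's HSA runtime. The function name `run_work_stealing` and the policy of stealing from the longest queue are hypothetical choices made for illustration.

```python
from collections import deque


def run_work_stealing(task_lists):
    """Simulate queue-based work stealing, one task per worker per round.

    task_lists: one list of tasks per worker (hypothetical input format).
    Each worker takes work from the tail of its own queue. When its queue
    is empty, it steals from the head of the currently longest queue.
    Returns (tasks executed by each worker, number of steals).
    """
    queues = [deque(tasks) for tasks in task_lists]
    executed = [[] for _ in queues]
    steals = 0
    while any(queues):
        for worker_id, own in enumerate(queues):
            if own:
                task = own.pop()  # owner end: newest task
            else:
                victim = max(queues, key=len)  # assumed victim policy
                if not victim:
                    continue  # nothing left to steal this round
                task = victim.popleft()  # thief end: oldest task
                steals += 1
            executed[worker_id].append(task)
    return executed, steals


# Worker 0 starts with all the work; worker 1 starts idle and must steal.
executed, steals = run_work_stealing([[1, 2, 3, 4], []])
```

Having the owner and the thief operate on opposite ends of the queue is the usual design choice in work-stealing deques, since it keeps the two from contending for the same task except when the queue is nearly empty.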

Archive | 2013