Marc S. Orr
Advanced Micro Devices
Publication
Featured research published by Marc S. Orr.
Proceedings of the ACM Workshop on High Performance Graph Processing | 2016
Shuai Che; Marc S. Orr; Gregory Rodgers; Jonathan Gallmeier
This paper studies different approaches to implementing betweenness centrality in a heterogeneous system. Betweenness centrality is an important algorithm in graph processing. It presents multiple levels of parallelism when processing a graph, and is an interesting problem to exploit various optimizations. We implement different versions of betweenness centrality on an AMD accelerated processing unit (APU). These include GPU-only implementations with two edge distribution methods, GPU-side load balancing, CPU-GPU load balancing in a master-worker model with queue monitoring and in a work stealing model. We take advantage of the latest development of heterogeneous system architecture (HSA), such as the features of unified virtual address space and diverse atomics. We also use different memory scope and ordering options for different synchronization scenarios. We compare multiple implementations of betweenness centrality, analyze their performance, and discuss important future research directions.
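As a point of reference for the algorithm discussed in the abstract above, here is a minimal sequential sketch of betweenness centrality on an unweighted graph, using Brandes' algorithm. It is not the paper's GPU or HSA implementation; the function name `betweenness_centrality` and the adjacency-dictionary input are assumptions made for illustration. The outer loop over source vertices and the per-level traversal inside each breadth-first search are two places where parallelism could plausibly be exposed.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted graph.

    adj maps each vertex to a list of its neighbors (hypothetical input format).
    For an undirected graph every pair is counted in both directions.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


if __name__ == "__main__":
    path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(betweenness_centrality(path_graph))
```

On the three-vertex path in the usage example, only the middle vertex lies between another pair, so it is the only vertex with a nonzero score.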
Computing Frontiers | 2017
Shuai Che; Marc S. Orr; Jonathan Gallmeier
This paper uses betweenness centrality as a case study to research efficient work stealing in a heterogeneous system environment. Betweenness centrality is an important algorithm in graph processing. It presents multiple-level parallelism and is an interesting problem to exploit various optimizations. We investigate queue-based work stealing to distribute its tasks across GPU compute units (CUs) and across the CPU and the GPU, which has not been done by prior work. In particular, we demonstrate how to leverage the new platform-atomic operations on AMD Accelerated Processing Units (APUs) to operate cross-device queues in a lock-free manner in shared virtual memory. To make the work stealing runtime and the application more efficient, we apply new architectural features, including atomic operations with different memory scopes and orderings for different synchronization scenarios. We implement our solution using heterogeneous system architecture (HSA). Our results show that betweenness centrality with CPU-GPU work stealing achieves an average of 15% (up to 30%) performance improvement over GPU-only execution for diverse graph inputs. Our work stealing solution can be applied widely to other applications too. Finally, we analyze important parameters critical for queuing and stealing.
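The queue-based work stealing idea mentioned in the abstract above can be illustrated with a small sketch. This is a sequential, round-robin simulation and only an assumption about the general scheme: it uses no atomics, is not lock-free, and does not model the paper's cross-device queues or memory scopes. The names `run_work_stealing`, `task_lists`, and `process` are hypothetical. Each worker takes tasks from the tail of its own queue; an idle worker steals from the head of the currently longest queue.

```python
from collections import deque


def run_work_stealing(task_lists, process):
    """Simulate workers that drain their own queues and steal when idle.

    task_lists: one list of tasks per worker (hypothetical input format).
    process: function applied to each task.
    Returns (results per worker, number of steals).
    """
    queues = [deque(tasks) for tasks in task_lists]
    done = [[] for _ in queues]
    steals = 0
    while any(queues):
        for wid, own in enumerate(queues):
            if own:
                task = own.pop()  # owner takes from the tail of its own queue
            else:
                # Idle worker: pick the longest queue as the victim.
                victim = max(range(len(queues)), key=lambda i: len(queues[i]))
                if not queues[victim]:
                    continue  # nothing left to steal this round
                task = queues[victim].popleft()  # thief takes from the head
                steals += 1
            done[wid].append(process(task))
    return done, steals


if __name__ == "__main__":
    results, steals = run_work_stealing([[1, 2, 3, 4, 5, 6], []], lambda x: x * x)
    print(results, steals)
```

Taking from opposite ends keeps the owner and the thief away from the same task in most steps, which is the usual motivation for deque-based stealing; a real concurrent version would still need atomic operations to handle the case where only one task remains.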
Archive | 2013
Steven K. Reinhardt; Marc S. Orr; Bradford M. Beckmann
Archive | 2015
Shuai Che; Bradford M. Beckmann; Marc S. Orr; Ayse Yilmazer
Archive | 2016
Marc S. Orr; Bradford M. Beckmann; Ayse Yilmazer; Shuai Che; David A. Wood; Mark D. Hill
Archive | 2016
Steven K. Reinhardt; Marc S. Orr; Bradford M. Beckmann; Shuai Che; David A. Wood
Archive | 2015
Yasuko Eckert; Derek R. Hower; Marc S. Orr
Archive | 2015
David A. Wood; Steven K. Reinhardt; Bradford M. Beckmann; Marc S. Orr
Archive | 2015
Shuai Che; Bradford M. Beckmann; Marc S. Orr; Ayse Yilmazer
Archive | 2014
Marc S. Orr; Bradford M. Beckmann; Benedict R. Gaster; Steven K. Reinhardt; David A. Wood