Publication


Featured research published by James B. S. G. Greensky.


International Conference on Parallel Processing | 2010

Starling: Minimizing Communication Overhead in Virtualized Computing Platforms Using Decentralized Affinity-Aware Migration

Jason D. Sonnek; James B. S. G. Greensky; Robert Reutiman; Abhishek Chandra

Virtualization is widely used in large-scale computing environments, such as clouds, data centers, and grids, to provide application portability and facilitate resource multiplexing while retaining application isolation. In many existing virtualized platforms, network bandwidth often becomes the bottleneck resource, causing both high network contention and reduced performance for communication- and data-intensive applications. In this paper, we present a decentralized affinity-aware migration technique that incorporates heterogeneity and dynamism in network topology and job communication patterns to allocate virtual machines on the available physical resources. Our technique monitors network affinity between pairs of VMs and uses a distributed bartering algorithm, coupled with migration, to dynamically adjust VM placement so that communication overhead is minimized. Our experimental results, running the Intel MPI benchmark and a scientific application on a 7-node Xen cluster, show up to a 42% improvement in application runtime over a no-migration baseline, along with up to an 85% reduction in network communication cost. In addition, our technique adjusts to dynamic variations in communication patterns and provides both good performance and low network contention with minimal overhead.


ACM Transactions on Architecture and Code Optimization | 2015

Measuring Microarchitectural Details of Multi- and Many-Core Memory Systems through Microbenchmarking

Zhenman Fang; Sanyam Mehta; Pen Chung Yew; Antonia Zhai; James B. S. G. Greensky; Gautham Beeraka; Binyu Zang

As multicore and many-core architectures evolve, their memory systems are becoming increasingly complex. To bridge the latency and bandwidth gap between the processor and memory, they often use a mix of multilevel private/shared caches, either blocking or nonblocking, connected by a high-speed network-on-chip. They also incorporate hardware and software prefetching and simultaneous multithreading (SMT) to hide memory latency. On such multi- and many-core systems, applying memory optimization schemes through compiler optimizations and performance tuning requires microarchitectural details of the target memory system. Unfortunately, such details are often unavailable from vendors, especially for newly released processors. In this article, we propose a novel microbenchmarking methodology based on short elapsed-time events (SETEs) to obtain comprehensive memory microarchitectural details of multi- and many-core processors. This approach requires detailed analysis of the potential interfering factors that could affect the intended behavior of such memory systems, and we lay out effective guidelines to control and mitigate them. Taking the impact of SMT into consideration, our methodology not only measures traditional cache/memory latency and off-chip bandwidth but also uncovers details of software and hardware prefetching units not attempted in previous studies. Using the newly released Intel Xeon Phi many-core processor (with in-order cores) as an example, we show how a set of microbenchmarks can determine various microarchitectural features of its memory system, many of which are undocumented by vendors. To demonstrate the portability and validate the correctness of the methodology, we use the well-documented Intel Sandy Bridge multicore processor (with out-of-order cores) as another example, where most data are available and can be validated. Moreover, to illustrate the usefulness of the measured data, we carry out a multistage coordinated data prefetching case study on both Xeon Phi and Sandy Bridge and show that, using the measured data, we achieve 1.3X and 1.08X performance speedups, respectively, over the state-of-the-art Intel ICC compiler. We believe these measurements also provide useful insights into memory optimization, analysis, and modeling of such multicore and many-core architectures.


International Conference on Conceptual Structures | 2010

Boosting the performance of computational fluid dynamics codes for interactive supercomputing

Paul R. Woodward; Jagan Jayaraj; Pei-Hung Lin; Pen Chung Yew; Michael R. Knox; James B. S. G. Greensky; Anthony Nowatski; Karl Stoffels

An extreme form of pipelining of the Piecewise-Parabolic Method (PPM) gas dynamics code has been used to dramatically increase its performance on the new generation of multicore CPUs. Exploiting this technique, together with full integration of the several data post-processing and visualization utilities associated with the code, has enabled numerical experiments in computational fluid dynamics to be performed interactively on a new, dedicated system in our lab, with immediate, user-controlled visualization of the resulting flows on the PowerWall display. We describe the code restructuring required to achieve the necessary CPU performance boost, as well as the parallel computing methods and systems used to enable interactive flow simulation. We discuss the requirements for applying these techniques to other codes and briefly describe our plans for tools that will assist programmers in exploiting them. Examples showing the capability of the new system and software are given for applications in turbulence and stellar convection.


International Symposium on VLSI Design, Automation and Test | 2014

Full system simulation framework for integrated CPU/GPU architecture

Po-Han Wang; Gen-Hong Liu; Jen-Chieh Yeh; Tse-Min Chen; Hsu-Yao Huang; Chia-Lin Yang; Shih-Lien Liu; James B. S. G. Greensky

The integrated CPU/GPU architecture brings a performance advantage because the communication cost between the CPU and GPU is reduced, but it also imposes new challenges in processor architecture design, especially in the management of shared memory resources, e.g., the last-level cache and memory bandwidth. A microarchitecture-level simulator is therefore essential to facilitate research in this direction. In this paper, we develop the first cycle-level full-system simulation framework for CPU-GPU integration with detailed memory models. With this framework, we analyze the communication cost between the CPU and GPU for GPU workloads and characterize the memory system when CPU and GPU applications run concurrently.


Parallel Computing | 2006

Interactive volume visualization of fluid flow simulation data

Paul R. Woodward; David H. Porter; James B. S. G. Greensky; Alex J. Larson; Michael R. Knox; James Hanson; Niranjay Ravindran; Tyler Fuchs

Recent development work at the Laboratory for Computational Science & Engineering (LCSE) at the University of Minnesota, aimed at increasing the performance of parallel volume rendering of large fluid dynamics simulation data, is reported. The goal of the work is interactive visual exploration of data sets that are up to two terabytes in size. A key system design feature in accelerating rendering performance from such large data sets is replication of the data set on directly attached parallel disk systems at each rendering node. Adapting this system for interactive steering and visualization of fluid flow simulations as they run on remote supercomputer systems introduces special additional challenges, which are briefly described.


Extreme Scaling Workshop (XSW 2013) | 2013

Scaling the Multifluid PPM Code on Blue Waters and Intel MIC

Paul R. Woodward; Jagan Jayaraj; Pei Hung Lin; Michael R. Knox; Simon D. Hammond; James B. S. G. Greensky; Sarah E. Anderson

Over the course of the last year, we have worked to adapt our multifluid PPM code to run well at scale on the Blue Waters machine at NCSA as well as on networks of Intel Xeon Phi coprocessors. The work on Blue Waters has been carried out in collaboration with Cray, and the work with Intel's MIC coprocessors in collaboration with Intel. Our starting point was a version of the code developed to run well at scale on the Los Alamos Roadrunner machine, so we began with an implementation designed to take advantage of heterogeneous processor systems. In this paper, we discuss scaling issues encountered on Blue Waters as well as with Intel's MIC coprocessors. We present the code structure developed in this work, beginning with its parallel implementation using heterogeneous MPI processes and proceeding to its parallel implementation on a single multi- or many-core CPU. We also present a sampling of results from a simulation on Blue Waters on a 1.18-trillion-cell grid that ran at a sustained rate of 1.5 Pflop/s in 32-bit precision.


International Symposium on Visual Computing | 2008

Ubiquitous Interactive Visualization of 3-D Mantle Convection through Web Applications Using Java

Jonathan C. Mc Lane; Wojciech Czech; David A. Yuen; Michael R. Knox; James B. S. G. Greensky; M. Charley Kameyama; Vincent M. Wheeler; Rahul Panday; Hiroki Senshu

We have designed a new system for real-time interactive visualization of results taken directly from large-scale simulations of 3-D mantle convection and other large-scale simulations. This approach allows for intense visualization sessions lasting a couple of hours, as opposed to storing massive amounts of data in a storage system. Our data sets consist of 3-D data for volume rendering, with sets exceeding 10 million unknowns at each timestep. Large-scale visualization on a display wall of around 13 million pixels has already been accomplished, with extensions to handheld devices such as the OQO and the Nokia N800. We are developing web-based software in Java to extend the use of this system across long distances. The software is aimed at creating an interactive and functional application capable of running in multiple browsers by taking advantage of two AJAX-enabled web frameworks: Echo2 and Google Web Toolkit.


International Review of Economics | 2008

Ubiquitous interactive visualization of 3D mantle convection using a web-portal with Java and Ajax framework

James B. S. G. Greensky; Wojciech Czech; David A. Yuen; Michael R. Knox; Megan Damon; Shi Steve Chen; M. Charley Kameyama


Concurrency and Computation: Practice and Experience | 2010

Ubiquitous interactive visualization of large-scale simulations in geosciences over a Java-based web-portal

Jonathan C. McLane; W. Walter Czech; David A. Yuen; Mike Knox; Shuo M. Wang; James B. S. G. Greensky; Erik Sevre


Archive | 2014

Scaling Multifluid Compressible Fluid Dynamics to 700,000 cores, 1.5 Pflop/s, and a Trillion Grid Cells

Paul R. Woodward; Jagan Jayaraj; Pei-Hung Lin; Michael R. Knox; Sarah E. Anderson; James B. S. G. Greensky

Collaboration


Dive into James B. S. G. Greensky's collaborations.

Top Co-Authors

Wojciech Czech

AGH University of Science and Technology

Mike Knox

University of Minnesota

Pei-Hung Lin

University of Minnesota
