Publications


Featured research published by Robert W. Wisniewski.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

The International Exascale Software Project roadmap

Jack J. Dongarra; Pete Beckman; Terry Moore; Patrick Aerts; Giovanni Aloisio; Jean Claude Andre; David Barkai; Jean Yves Berthou; Taisuke Boku; Bertrand Braunschweig; Franck Cappello; Barbara M. Chapman; Xuebin Chi; Alok N. Choudhary; Sudip S. Dosanjh; Thom H. Dunning; Sandro Fiore; Al Geist; Bill Gropp; Robert J. Harrison; Mark Hereld; Michael A. Heroux; Adolfy Hoisie; Koh Hotta; Zhong Jin; Yutaka Ishikawa; Fred Johnson; Sanjay Kale; R.D. Kenway; David E. Keyes

Over the last 20 years, the open-source community has provided more and more software on which the world’s high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. However, although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost because of the lack of planning, coordination, and key integration of technologies necessary to make them work together smoothly and efficiently, both within individual petascale systems and between different systems. It seems clear that this completely uncoordinated development model will not provide the software needed to support the unprecedented parallelism required for peta/exascale computation on millions of cores, or the flexibility required to exploit new hardware models and features, such as transactional memory, speculative execution, and graphics processing units. This report describes the work of the community to prepare for the challenges of exascale computing, ultimately combining their efforts in a coordinated International Exascale Software Project.


International Symposium on Microarchitecture | 2012

The IBM Blue Gene/Q Compute Chip

Ruud A. Haring; Martin Ohmacht; Thomas W. Fox; Michael Karl Gschwind; David L. Satterfield; Krishnan Sugavanam; Paul W. Coteus; Philip Heidelberger; Matthias A. Blumrich; Robert W. Wisniewski; Alan Gara; George Liang-Tai Chiu; Peter A. Boyle; Norman H. Christ; Changhoan Kim

Blue Gene/Q aims to build a massively parallel high-performance computing system out of power-efficient processor chips, resulting in power-efficient, cost-efficient, and floor-space-efficient systems. Focusing on reliability during design helps with scaling to large systems and lowers the total cost of ownership. This article examines the architecture and design of the Compute chip, which combines processors, memory, and communication functions on a single chip.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Addressing failures in exascale computing

Marc Snir; Robert W. Wisniewski; Jacob A. Abraham; Sarita V. Adve; Saurabh Bagchi; Pavan Balaji; Jim Belak; Pradip Bose; Franck Cappello; Bill Carlson; Andrew A. Chien; Paul W. Coteus; Nathan DeBardeleben; Pedro C. Diniz; Christian Engelmann; Mattan Erez; Saverio Fazzari; Al Geist; Rinku Gupta; Fred Johnson; Sriram Krishnamoorthy; Sven Leyffer; Dean A. Liberty; Subhasish Mitra; Todd S. Munson; Rob Schreiber; Jon Stearley; Eric Van Hensbergen

We present here a report produced by a workshop on ‘Addressing failures in exascale computing’ held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system, discuss existing knowledge on resilience across the various hardware and software layers of an exascale system, and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia, and their interests ranged from theory to implementation. The combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.


European Conference on Computer Systems | 2006

K42: building a complete operating system

Orran Krieger; Marc A. Auslander; Bryan S. Rosenburg; Robert W. Wisniewski; Jimi Xenidis; Dilma Da Silva; Michal Ostrowski; Jonathan Appavoo; Maria A. Butrico; Mark F. Mergen; Amos Waterland; Volkmar Uhlig

K42 is one of the few recent research projects examining operating system design and structure issues in the context of a new whole-system design. K42 is open source and was designed from the ground up to perform well and to be scalable, customizable, and maintainable. The project was begun in 1996 by a team at IBM Research. Over the last nine years there has been a development effort on K42 of between six and twenty researchers and developers across IBM, collaborating universities, and national laboratories. K42 supports the Linux API and ABI, and is able to run unmodified Linux applications and libraries. The approach we took in K42 to achieve scalability and customizability has been successful. The project has produced positive research results, has resulted in contributions to Linux and the Xen hypervisor on Power, and continues to be a rich platform for exploring system software technology. Today, K42 is one of the key exploratory platforms in the DOE's FAST-OS program, is being used as a prototyping vehicle in IBM's PERCS project, and is being used by universities and national labs for exploratory research. In this paper, we provide insight into building an entire system by discussing the motivation and history of K42, describing its fundamental technologies, and presenting an overview of the research directions we have been pursuing.


International Parallel and Distributed Processing Symposium | 2007

Scale-up x Scale-out: A Case Study using Nutch/Lucene

Maged M. Michael; José E. Moreira; Doron Shiloach; Robert W. Wisniewski

Scale-up solutions in the form of large SMPs have represented the mainstream of commercial computing for the past several years. The major server vendors continue to provide increasingly larger and more powerful machines. More recently, scale-out solutions, in the form of clusters of smaller machines, have gained increased acceptance for commercial computing. Scale-out solutions are particularly effective in high-throughput Web-centric applications. In this paper, we investigate the behavior of two competing approaches to parallelism, scale-up and scale-out, in an emerging search application. Our conclusions show that a scale-out strategy can be the key to good performance even on a scale-up machine. Furthermore, scale-out solutions offer better price/performance, although at an increase in management complexity.


International Conference on Supercomputing | 2005

Online performance analysis by statistical sampling of microprocessor performance counters

Reza Azimi; Michael Stumm; Robert W. Wisniewski

Hardware performance counters (HPCs) are increasingly being used to analyze performance and identify the causes of performance bottlenecks. However, HPCs are difficult to use for several reasons. Microprocessors do not provide enough counters to simultaneously monitor the many different types of events needed to form an overall understanding of performance. Moreover, HPCs primarily count low-level micro-architectural events, from which it is difficult to extract the high-level insight required for identifying causes of performance problems. We describe two techniques that help overcome these difficulties, allowing HPCs to be used in dynamic real-time optimizers. First, statistical sampling is used to dynamically multiplex HPCs and make a larger set of logical HPCs available. Using real programs, we show experimentally that this sampling can obtain counts of hardware events that are statistically similar (within 15%) to complete non-sampled counts, thus allowing us to provide a much larger set of logical HPCs. Second, we observe that stall cycles are a primary source of inefficiency, and hence should be major targets for software optimization. Based on this observation, we build a simple model in real time that speculatively attributes each stall cycle to the processor component that likely caused the stall. The information needed to produce this model is obtained using our HPC multiplexing facility to monitor a large number of hardware components simultaneously. Our analysis shows that even on an out-of-order superscalar microprocessor this speculative approach yields a fairly accurate model, with a run-time overhead for collection and computation of under 2%. These results demonstrate that we can effectively analyze the online performance of application and system code running at full speed. The stall analysis shows where performance is being lost on a given processor.
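
To make the multiplexing technique concrete, here is a minimal C sketch, under stated assumptions, of time-slicing a few physical counters across a larger set of logical events and scaling the sampled counts back up. The `on_sampling_tick` interface and the synthetic counts in `main` are hypothetical; the paper's implementation runs in the kernel against real performance-monitoring hardware.

```c
/*
 * Sketch: statistical multiplexing of hardware performance counters.
 * N_PHYSICAL counters are rotated round-robin across N_LOGICAL events;
 * each event's full count is estimated by scaling its sampled count by
 * the fraction of time it was actually scheduled on a counter.
 */
#include <stdint.h>
#include <stdio.h>

#define N_PHYSICAL 4            /* events the hardware can count at once */
#define N_LOGICAL 16            /* events we want to observe */
#define N_GROUPS (N_LOGICAL / N_PHYSICAL)

static uint64_t sampled[N_LOGICAL]; /* counts seen while scheduled */
static uint64_t slices[N_LOGICAL];  /* time slices each event received */
static uint64_t total_slices;

/* Called on each sampling interrupt with what the physical counters
 * recorded for the group of events scheduled during the last slice. */
static void on_sampling_tick(const uint64_t counts[N_PHYSICAL], int group)
{
    for (int i = 0; i < N_PHYSICAL; i++) {
        int ev = group * N_PHYSICAL + i;
        sampled[ev] += counts[i];
        slices[ev]++;
    }
    total_slices++;
}

/* Statistical estimate of the event's count over the whole run. */
static uint64_t estimate(int ev)
{
    return slices[ev] ? sampled[ev] * total_slices / slices[ev] : 0;
}

int main(void)
{
    /* Drive the sampler with synthetic per-slice counts. */
    for (int t = 0; t < 4000; t++) {
        uint64_t counts[N_PHYSICAL];
        for (int i = 0; i < N_PHYSICAL; i++)
            counts[i] = 100 + (uint64_t)i;
        on_sampling_tick(counts, t % N_GROUPS);
    }
    for (int ev = 0; ev < N_LOGICAL; ev++)
        printf("event %2d: estimated %llu\n", ev,
               (unsigned long long)estimate(ev));
    return 0;
}
```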


Virtual Execution Environments | 2007

Libra: a library operating system for a JVM in a virtualized execution environment

Glenn Ammons; Jonathan Appavoo; Maria A. Butrico; Dilma Da Silva; David Grove; Kiyokuni Kawachiya; Orran Krieger; Bryan S. Rosenburg; Eric Van Hensbergen; Robert W. Wisniewski

If the operating system could be specialized for every application, many applications would run faster. For example, Java virtual machines (JVMs) provide their own threading model and memory protection, so general-purpose operating system implementations of these abstractions are redundant. However, traditional means of transforming existing systems into specialized systems are difficult to adopt because they require replacing the entire operating system. This paper describes Libra, an execution environment specialized for IBM's J9 JVM. Libra does not replace the entire operating system. Instead, Libra and J9 form a single statically linked image that runs in a hypervisor partition. Libra provides the services necessary to achieve good performance for the Java workloads of interest but relies on an instance of Linux in another hypervisor partition to provide a networking stack, a filesystem, and other services. The expense of remote calls is offset by the fact that Libra's services can be customized for a particular workload; for example, on the Nutch search engine, we show that two simple customizations improve application throughput by a factor of 2.7.
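
The service split described above (local implementations of performance-critical services, remote forwarding for the rest) can be sketched as a small dispatch layer. This is an illustrative sketch only: `channel_call` is a hypothetical stand-in for the inter-partition transport, and the call set is reduced to three operations.

```c
/*
 * Sketch of a library-OS service split in the spirit of Libra: memory
 * management is handled locally in the application's partition, while
 * filesystem operations are forwarded to a Linux instance running in
 * another hypervisor partition. The transport is a hypothetical stub.
 */
#include <stddef.h>
#include <stdio.h>

enum svc { SVC_BRK, SVC_OPEN, SVC_READ };

/* Stand-in for the inter-partition channel; a real system would marshal
 * the request, signal the hypervisor, and block for the reply. */
static long channel_call(enum svc op, const char *path, size_t len)
{
    (void)op; (void)path; (void)len;
    return -1; /* no service partition in this sketch */
}

static char heap[1 << 20];
static size_t heap_top;

/* Dispatch: fast local path for services specialized to the workload,
 * remote path for everything Linux already does well. */
static long libra_call(enum svc op, const char *path, size_t len)
{
    switch (op) {
    case SVC_BRK: /* local: trivial bump allocator, returns a heap offset */
        if (heap_top + len > sizeof heap)
            return -1;
        heap_top += len;
        return (long)(heap_top - len);
    case SVC_OPEN: /* remote: Linux owns the filesystem */
    case SVC_READ:
        return channel_call(op, path, len);
    }
    return -1;
}

int main(void)
{
    printf("brk:  %ld\n", libra_call(SVC_BRK, NULL, 4096));
    printf("open: %ld\n", libra_call(SVC_OPEN, "/data/index", 0));
    return 0;
}
```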


IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

Experiences with a Lightweight Supercomputer Kernel: Lessons Learned from Blue Gene's CNK

Mark E. Giampapa; Thomas M. Gooding; Todd Inglett; Robert W. Wisniewski

The petascale era has recently been ushered in, and many researchers have already turned their attention to the challenges of exascale computing. To achieve petascale computing, two broad approaches to kernels were taken: a lightweight approach embodied by IBM Blue Gene's CNK, and a more full-featured approach embodied by Cray's CNL. There are strengths and weaknesses to each approach. Examining the current generation can provide insight into what mechanisms may be needed for the exascale generation. The contributions of this paper are the experiences we had with CNK on Blue Gene/P. We demonstrate that it is possible to implement a small lightweight kernel that scales well yet still provides a Linux environment and the functionality desired by HPC programmers. Such an approach provides reproducibility, low noise, high and stable performance, reliability, and ease of effectively exploiting unique hardware features. We describe the strengths and weaknesses of this approach.


International Conference on Parallel Architectures and Compilation Techniques | 2005

Multiple page size modeling and optimization

Calin Cascaval; Evelyn Duesterwald; Peter F. Sweeney; Robert W. Wisniewski

With the growing awareness that individual hardware cores will not continue to deliver the same level of performance improvement, there is a need to develop an integrated approach to performance optimization. In this paper we present a paradigm for continuous program optimization (CPO), whereby automatic agents monitor and optimize application and system performance. The monitoring data is used to analyze and create models of application and system behavior. Using this analysis, we describe how CPO agents can improve the performance of both the application and the underlying system. Using the CPO paradigm, we implemented cooperating page size optimization agents that automatically optimize large-page usage. An offline agent uses vertically integrated performance data to produce a page size benefit analysis for different categories of data structures within an application. We show how an online CPO agent can use the results of the predictive analysis to automatically improve application performance. We validate that the predictions made by the CPO agent reflect actual performance, observing gains of up to 60% across a range of scientific applications, including the SPEC CPU2000 floating-point benchmarks and two large high-performance computing (HPC) applications.
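
As a rough illustration of the kind of decision such an agent automates, the sketch below maps a region with Linux huge pages only when a (hypothetical) profile says TLB stalls are significant. The `region_profile` structure, the 5% threshold, and the use of `MAP_HUGETLB` are assumptions for illustration; the paper's agents work from vertically integrated performance data rather than a fixed cutoff.

```c
/*
 * Sketch: choose a page size per data region based on profiled TLB-stall
 * cost. The threshold and profile numbers are illustrative; huge pages
 * must be reserved on the host for MAP_HUGETLB to succeed.
 */
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

struct region_profile {
    const char *name;
    size_t bytes;
    double tlb_stall_frac; /* fraction of cycles lost to TLB misses */
};

static void *alloc_region(const struct region_profile *r)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    if (r->tlb_stall_frac > 0.05)  /* illustrative benefit threshold */
        flags |= MAP_HUGETLB;
    void *p = mmap(NULL, r->bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED && (flags & MAP_HUGETLB)) {
        /* Fall back to base pages if no huge pages are available. */
        flags &= ~MAP_HUGETLB;
        p = mmap(NULL, r->bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
    }
    return p == MAP_FAILED ? NULL : p;
}

int main(void)
{
    struct region_profile grid = { "grid", 64u << 20, 0.12 };
    void *p = alloc_region(&grid);
    printf("%s: %s\n", grid.name, p ? "mapped" : "mmap failed");
    return 0;
}
```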


ACM Transactions on Computer Systems | 2007

Experience distributing objects in an SMMP OS

Jonathan Appavoo; Dilma Da Silva; Orran Krieger; Marc A. Auslander; Michal Ostrowski; Bryan S. Rosenburg; Amos Waterland; Robert W. Wisniewski; Jimi Xenidis; Michael Stumm; Livio Soares

Designing and implementing system software so that it scales well on shared-memory multiprocessors (SMMPs) has proven to be surprisingly challenging. To improve scalability, most designers to date have focused on concurrency by iteratively eliminating the need for locks and reducing lock contention. However, our experience indicates that locality is just as, if not more, important and that focusing on locality ultimately leads to a more scalable system. In this paper, we describe a methodology and a framework for constructing system software structured for locality, exploiting techniques similar to those used in distributed systems. Specifically, we found two techniques to be effective in improving scalability of SMMP operating systems: (i) an object-oriented structure that minimizes sharing by providing a natural mapping from independent requests to independent code paths and data structures, and (ii) the selective partitioning, distribution, and replication of object implementations in order to improve locality. We describe concrete examples of distributed objects and our experience implementing them. We demonstrate that the distributed implementations improve the scalability of operating-system-intensive parallel workloads.
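
The partitioning-and-replication idea can be seen in miniature in a distributed counter: each processor updates its own cache-line-padded representative, so the common-case update touches only local memory and readers pay the cost of gathering. This fixed-size C sketch shows only the locality principle, not K42's actual clustered-object machinery; the explicit CPU index is a stand-in for a current-processor query.

```c
/*
 * Sketch of a "distributed object": counter state is partitioned into
 * per-processor representatives so updates never contend across CPUs.
 * The real design adds per-processor translation tables and lazy
 * representative creation; this version is fixed-size for clarity.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NCPU 8
#define CACHELINE 64

/* One representative per CPU, padded so reps never share a cache line. */
struct rep {
    atomic_uint_fast64_t count;
    char pad[CACHELINE - sizeof(atomic_uint_fast64_t)];
};

struct dist_counter { struct rep reps[NCPU]; };

/* Fast path: purely local update, no cross-CPU cache traffic. */
static void inc(struct dist_counter *c, int cpu)
{
    atomic_fetch_add_explicit(&c->reps[cpu].count, 1,
                              memory_order_relaxed);
}

/* Slow path: a read gathers all representatives. */
static uint64_t total(struct dist_counter *c)
{
    uint64_t sum = 0;
    for (int i = 0; i < NCPU; i++)
        sum += atomic_load_explicit(&c->reps[i].count,
                                    memory_order_relaxed);
    return sum;
}

int main(void)
{
    static struct dist_counter c;
    for (int cpu = 0; cpu < NCPU; cpu++)
        for (int i = 0; i < 1000; i++)
            inc(&c, cpu);
    printf("total = %llu\n", (unsigned long long)total(&c));
    return 0;
}
```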
