Publication


Featured research published by Michael Kistler.


Dependable Systems and Networks | 2002

Modeling the effect of technology trends on the soft error rate of combinational logic

Premkishore Shivakumar; Michael Kistler; Stephen W. Keckler; Doug Burger; Lorenzo Alvisi

This paper examines the effect of technology scaling and microarchitectural trends on the rate of soft errors in CMOS memory and logic circuits. We describe and validate an end-to-end model that enables us to compute the soft error rates (SER) for existing and future microprocessor-style designs. The model captures the effects of two important masking phenomena, electrical masking and latching-window masking, which inhibit soft errors in combinational logic. We quantify the SER due to high-energy neutrons in SRAM cells, latches, and logic circuits for feature sizes from 600 nm to 50 nm and clock periods from 16 to 6 fan-out-of-4 inverter delays. Our model predicts that the SER per chip of logic circuits will increase nine orders of magnitude from 1992 to 2011 and at that point will be comparable to the SER per chip of unprotected memory elements. Our result emphasizes that computer system designers must address the risks of soft errors in logic circuits for future designs.


IEEE Computer | 2003

Energy management for commercial servers

Charles R. Lefurgy; Karthick Rajamani; Freeman L. Rawson; Wesley M. Felter; Michael Kistler; Tom W. Keller

Servers, high-end multiprocessor systems running commercial workloads, have typically included extensive cooling systems and resided in custom-built rooms for high-power delivery. Recently, as transistor density and demand for computing resources have rapidly increased, even these high-end systems face energy-use constraints. Commercial-server energy management now focuses on conserving power in the memory and microprocessor subsystems. Because their workloads are typically structured as multiple application programs, system-wide approaches are more applicable to multiprocessor environments in commercial servers than techniques that primarily apply to single-application environments, such as those based on compiler optimizations.


International Symposium on Microarchitecture | 2006

Cell Multiprocessor Communication Network: Built for Speed

Michael Kistler; Michael P. Perrone; Fabrizio Petrini

Multicore designs promise various power-performance and area-performance benefits. But inadequate design of the on-chip communication network can deprive applications of these benefits. To illuminate this important point in multicore processor design, the authors analyze the Cell processor's communication network, using a series of benchmarks involving various DMA traffic patterns and synchronization protocols.


International Parallel and Distributed Processing Symposium | 2007

Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications

Vipin Sachdeva; Michael Kistler; Evan Speight; Tzy-Hwa Kathy Tzeng

This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine recently developed at IBM. In particular we focus on two highly popular bioinformatics applications - FASTA and ClustalW. The characteristics of these bioinformatics applications, such as small critical time-consuming code size, regular memory accesses, existing vectorized code and embarrassingly parallel computation, make them uniquely suitable for the Cell processing platform. The price and power advantages afforded by the Cell processor also make it an attractive alternative to general purpose processors. We report preliminary performance results for these applications, and contrast these results with the state-of-the-art hardware.


ACM Transactions on Mathematical Software | 2016

The BLIS Framework: Experiments in Portability

Field G. Van Zee; Tyler M. Smith; Bryan Marker; Tze Meng Low; Robert A. van de Geijn; Francisco D. Igual; Mikhail Smelyanskiy; Xianyi Zhang; Michael Kistler; Vernon Austel; John A. Gunnels; Lee Killough

BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures. The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures. We show, with very little effort, how the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the GotoBLAS), and commercial vendor implementations such as AMD’s ACML, IBM’s ESSL, and Intel’s MKL libraries. Although most of this article focuses on single-core implementation, we also provide compelling results that suggest the framework’s leverage extends to the multithreaded domain.


Parallel Computing | 2008

Exploring the viability of the Cell Broadband Engine for bioinformatics applications

Vipin Sachdeva; Michael Kistler; Evan Speight; Tzy-Hwa Kathy Tzeng

This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine recently developed at IBM. In particular we focus on two highly popular bioinformatics applications - FASTA and ClustalW. The characteristics of these bioinformatics applications, such as small critical time-consuming code size, regular memory accesses, existing vectorized code and embarrassingly parallel computation, make them uniquely suitable for the Cell processing platform. The price and power advantages afforded by the Cell processor also make it an attractive alternative to general purpose processors. We report preliminary performance results for these applications, and contrast these results with the state-of-the-art hardware.


IBM Journal of Research and Development | 2006

Application of full-system simulation in exploratory system design and development

James L. Peterson; Patrick J. Bohrer; Liqun Chen; Elmootazbellah Nabil Elnozahy; Ahmed Gheith; Richard H. Jewell; Michael Kistler; T. R. Maeurer; Sean A. Malone; David B. Murrell; Neena Needel; Karthick Rajamani; Mark Anthony Rinaldi; Richard O. Simpson; Kartik Sudeep; Lixin Zhang

This paper describes the design and application of a full-system simulation environment that has been widely used in the exploration of the IBM PowerPC® processor and system design. The IBM full-system simulator has been developed to meet the needs of hardware and software designers for fast, accurate, execution-driven simulation of complete systems, incorporating parameterized architectural models. This environment enables the development and tuning of production-level operating systems, compilers, and critical software support well in advance of hardware availability, which can significantly shorten the critical path of system development. The ability to develop early versions of software can benefit hardware development by identifying design issues that may affect functionality and performance far earlier in the development cycle, when they are much less costly to correct. In this paper, we describe features of the simulation environment and present examples of its application in the context of the Sony-Toshiba-IBM Cell Broadband Engine™ and IBM PERCS development projects.


IBM Journal of Research and Development | 2009

Programming the Linpack benchmark for Roadrunner

Michael Kistler; John A. Gunnels; Daniel Alan Brokenshire; Brad Benton

We describe the challenges and opportunities we encountered when developing a hybrid version of the Linpack benchmark for the Los Alamos National Laboratory Roadrunner supercomputing system, which combines traditional x86-64 host processors with IBM PowerXCell™ 8i accelerator processors. The challenges included determining the proper division of the host and accelerator roles in the computation, transfer of data between the host and accelerator memory domains, alignment of data for communication and computation, and data format differences between the two processors. We also describe our approach to modeling the performance of the hybrid system and compare our performance estimates to witnessed performance on the system at different scales and levels of memory consumption. Through careful attention to these issues, we have produced a hybrid version of the Linpack benchmark for the Roadrunner system that achieves 77.8% of peak performance on a single compute node and 74.6% of peak performance over the entire system, making this system the first to achieve a Linpack result exceeding one petaflops (10^15 floating-point operations per second).


IBM Journal of Research and Development | 2016

IBM Bluemix Mobile Cloud Services

Ahmed Gheith; Ramakrishnan Rajamony; Patrick J. Bohrer; Kanak B. Agarwal; Michael Kistler; B. L. White Eagle; C. A. Hambridge; John B. Carter; Todd E. Kaplinger

The Mobile Cloud Services offering of IBM Bluemix® is a platform for cloud-based mobile applications, providing data and file storage, application authentication, push notifications, and server-side application logic, all available through easy-to-use client software development kits (SDKs). In this paper, we describe the server-side architecture for the key components of the Mobile Cloud Services. For scalability and fault resilience, components are implemented as stateless services that communicate using a distributed message queue. We adopted a “design for failure” approach to all environmental services, including basic networking support. We developed a robust communications layer that adds timeout and retry logic to all external interactions. We also built a flexible and robust application-monitoring infrastructure to constantly probe the service components, test end-to-end functionality, and report any problems through web monitors and text messages. Finally, we designed and delivered client SDKs for Android®, iOS®, and JavaScript® that enable application developers to quickly create robust mobile applications that utilize IBM Mobile Cloud Services. These architecture and implementation choices have resulted in a robust and scalable cloud-based platform for mobile application developers.


IEEE International Conference on High Performance Computing Data and Analytics | 2009

Programming the Linpack benchmark for the IBM PowerXCell 8i processor

Michael Kistler; John A. Gunnels; Daniel Alan Brokenshire; Brad Benton

In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™ architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs). The SPEs can be used as computational accelerators to augment the main PowerPC processor. The added computational capability of the SPEs results in a peak double precision floating point capability of 108.8 GFLOPS. We explain how we modified the standard open source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels and also explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM memory. Our implementation of Linpack also supports clusters of QS22s, and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.
