Bart Blaner | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Bart Blaner is active.

Explore More

Publication

Featured researches published by Bart Blaner.

Ibm Journal of Research and Development | 2011

IBM POWER7 multicore server processor

Balaram Sinharoy; Ronald Nick Kalla; William J. Starke; Hung Q. Le; R. Cargnoni; J. A. Van Norstrand; B. J. Ronchetti; Jeffrey A. Stuecheli; Jens Leenstra; G. L. Guthrie; D. Q. Nguyen; Bart Blaner; C. F. Marino; E. Retter; Peter Williams

The IBM POWER® processor is the dominant reduced instruction set computing microprocessor in the world today, with a rich history of implementation and innovation over the last 20 years. In this paper, we describe the key features of the POWER7® processor chip. On the chip is an eight-core processor, with each core capable of four-way simultaneous multithreaded operation. Fabricated in IBMs 45-nm silicon-on-insulator (SOI) technology with 11 levels of metal, the chip contains more than one billion transistors. The processor core and caches are significantly enhanced to boost the performance of both single-threaded response-time-oriented, as well as multithreaded, throughput-oriented applications. The memory subsystem contains three levels of on-chip cache, with SOI embedded dynamic random access memory (DRAM) devices used as the last level of cache. A new memory interface using buffered double-data-rate-three DRAM and improvements in reliability, availability, and serviceability are discussed

Ibm Journal of Research and Development | 2015

CAPI: A Coherent Accelerator Processor Interface

Jeffrey A. Stuecheli; Bart Blaner; Charles Ray Johns; Michael S. Siegel

Heterogeneous computing systems combine different types of compute elements that share memory. A specific class of heterogeneous systems discussed in this paper pairs traditional general-purpose processing cores and accelerator units. While this arrangement enables significant gains in application performance, device driver overheads and operating system code path overheads can become prohibitive. The I/O interface of a processor chip is a well-suited attachment point from a system design perspective, in that standard server models can be augmented with application-specific accelerators. However, traditional I/O attachment protocols introduce significant device driver and operating system software latencies. With the Coherent Accelerator Processor Interface (CAPI), we enable attaching an accelerator as a coherent CPU peer over the I/O physical interface. The CPU peer features consist of a homogeneous virtual address space across the CPU and accelerator, and hardware-managed caching of this shared data on the I/O device. This attachment method greatly increases the opportunities for acceleration due to the much shorter software path length required to enable its use compared to a traditional I/O model.

Ibm Journal of Research and Development | 1994

SCISM: a scalable compound instruction set machine

Stamatis Vassiliadis; Bart Blaner; Richard J. Eickemeyer

In this paper we describe a machine organization suitable for RISC and CISC architectures. The proposed organization reduces hardware complexity in parallel instruction fetch and issue logic by minimizing possible increases in cycle time caused by parallel instruction issue decisions in the instruction buffer. Furthermore, it improves instruction-level parallelism by means of special features. The improvements are achieved by analyzing instruction sequences and deciding which instructions will issue and execute in parallel prior to actual instruction fetch and issue, by incorporating preprocessed information for parallel issue and execution of instructions in the cache, by categorizing instructions for parallel issue and execution on the basis of hardware utilization rather than opcode description, by attempting to avoid memory interlocks through the preprocessing mechanism, and by eliminating execution interlocks with specialized hardware. Introduction Improvements in the performance of computer systems relate to circuit-level or technology improvements and to organizational techniques such as pipelining, cache memories, out-of-order execution, multiple functional units, and exploitation of instruction-level parallelism. One increasingly popular approach for exploiting instructionlevel parallelism, i.e., allowing multiple instructions to be issued and executed in one machine cycle, is the so-called superscalar machine organization [1]. A number of such machines with varying degrees of parallelism have recently been described [2, 3]. The increasing popularity of superscalar machine organizations may be attributed to the increased instruction execution rate such systems may offer, concomitant with technology improvements that have made their organizations more feasible.

Ibm Journal of Research and Development | 2015

The cache and memory subsystems of the IBM POWER8 processor

William J. Starke; Jeffrey A. Stuecheli; David Daly; John Steven Dodson; Florian A. Auernhammer; Patricia M. Sagmeister; Guy Lynn Guthrie; Charles F. Marino; Michael S. Siegel; Bart Blaner

In this paper, we describe the IBM POWER8™ cache, interconnect, memory, and input/output subsystems, collectively referred to as the “nest.” This paper focuses on the enhancements made to the nest to achieve balanced and scalable designs, ranging from small 12-core single-socket systems, up to large 16-processor-socket, 192-core enterprise rack servers. A key aspect of the design has been increasing the end-to-end data and coherence bandwidth of the system, now featuring more than twice the bandwidth of the POWER7® processor. The paper describes the new memory-buffer chip, called Centaur, providing up to 128 MB of eDRAM (embedded dynamic random-access memory) buffer cache per processor, along with an improved DRAM (dynamic random-access memory) scheduler with support for prefetch and write optimizations, providing industry-leading memory bandwidth combined with low memory latency. It also describes new coherence-transport enhancements and the transition to directly integrated PCIe® (PCI Express®) support, as well as additions to the cache subsystem to support higher levels of virtualization and scalability including snoop filtering and cache sharing.

ACM Sigarch Computer Architecture News | 1992

On the attributes of the SCISM organization

Stamatis Vassiliadis; Bart Blaner; Richard J. Eickemeyer

In this paper, we describe some of the attributes of the SCISM organization, a multiple instruction-issuing machine, the outcome of five years of research at the IBM Glendale Laboratory, in Endicott, New York. The proposed organization embodies a number of mechanisms, including the analysis of instruction sequences and deciding which instructions will execute in parallel prior to instruction fetch and issue, the incorporation of permanent preprocessing of instructions to be executed in parallel, the categorization of instructions for parallel execution on the basis of hardware utilization rather than opcode description, the avoidance of memory interlocks through the preprocessing mechanism, and the elimination of execution interlocks with specialized hardware. It is shown that by incorporating these mechanisms, a SCISM capable of issuing and executing two instructions per cycle can achieve more than 90% of the theoretical maximum performance of an idealized, dual instruction issue superscalar machine.

Ibm Journal of Research and Development | 2013

IBM POWER7+ processor on-chip accelerators for cryptography and active memory expansion

Bart Blaner; Bulent Abali; Brian Mitchell Bass; Suresh Chari; Ronald Nick Kalla; Steven R. Kunkel; Kenneth A. Lauricella; Ross Boyd Leavens; John J. Reilly; Peter A. Sandon

With the heightened focus on computer security, IBM POWER® server workloads are spending an increasing number of cycles performing cryptographic functions. Active memory expansion (AME), a technology to dynamically increase the effective memory capacity of a system by compressing and decompressing memory pages, is also enjoying increasing deployment in POWER server systems. Together, cryptography and AME consume enough central processing unit (CPU) cycles in a typical installation to warrant adding dedicated hardware accelerators on the processor chip to offload the compute-intensive parts of these functions from the processor cores. IBM POWER7+™ is the first POWER server to include on-chip hardware accelerators for symmetric (shared key) and asymmetric (public key) cryptography and memory compression/decompression for AME. A true random number generator (RNG) is also integrated on-chip. This paper describes the hardware accelerator framework, including location relative to the cores and memory, accelerator invocation, data movement, and error handling. A description of each type of accelerator follows, including details of supported algorithms and the corresponding hardware data flows. Algorithms supported include the Advanced Encryption Standard, Secure Hash Algorithm, and Message Digest 5 algorithm as bulk cryptographic functions; asymmetric cryptographic functions in support of RSA and elliptic curve cryptography; and a novel dictionary-based compression algorithm with high throughput supporting AME. A presentation of accelerator performance is included.

Wescon/98. Conference Proceedings (Cat. No.98CH36265) | 1998

RTL design source management for system-on-a-chip designs

Bart Blaner; Christine King; Paul C. Stabler

Increasingly, system-on-a-chip (SOC) designers are looking to leverage the expertise of design service providers to bring products to market. Design service providers may be engaged at various points in the design process: from design conception and specification to processing a completed technology dependent netlist. Engagement typically occurs at a stage intermediate to these points; completed RTL for some portion of the design may exist but assistance is needed to integrate the RTL with other core IF, or design services may be enlisted to develop some portion of the RTL or new cores to perform specific functions. Proper management of RTL design source is critical to the overall success of a project. This paper describes a methodology used by a design service provider to address this issue.

Ibm Journal of Research and Development | 1994