Publication


Featured research published by Steven A. Przybylski.


International Symposium on Computer Architecture | 1988

Performance tradeoffs in cache design

Steven A. Przybylski; Mark Horowitz; John L. Hennessy

Cache memories have become common across a wide range of computer implementations. To date, most analyses of cache performance have concentrated on time-independent metrics, such as miss rate and traffic ratio. This paper presents a series of simulations that explore the interactions between various organizational decisions and program execution time. We investigate the tradeoffs between cache size and CPU/cache cycle time, between set associativity and cycle time, and between block size and main memory speed. The results indicate that neither cycle time nor cache size dominates the other across the entire design space. For common implementation technologies, performance is maximized when the size is increased to the 32KB to 128KB range with modest penalties to the cycle time. If set associativity impacts the cycle time by more than a few nanoseconds, it increases overall execution time. Since the block size and memory transfer rate combine to affect the cache miss penalty, the optimum block size is substantially smaller than that which minimizes the miss rate. Finally, the interdependence between the optimal cache configuration and the main memory speed necessitates multi-level cache hierarchies for high-performance uniprocessors.
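To make the size-versus-cycle-time tradeoff concrete, here is a minimal sketch of the kind of execution-time comparison the paper performs via simulation. All of the numbers (miss rates, cycle times, miss penalty) are illustrative assumptions, not data from the paper.

```python
# Illustrative sketch of the cache size vs. cycle time tradeoff.
# All parameter values are made-up assumptions, not the paper's data.

def execution_time_ns(instructions, cpi_base, cycle_time_ns,
                      refs_per_instr, miss_rate, miss_penalty_cycles):
    """Total execution time: base cycles plus memory stall cycles,
    all scaled by the CPU/cache cycle time."""
    cycles = instructions * (cpi_base +
                             refs_per_instr * miss_rate * miss_penalty_cycles)
    return cycles * cycle_time_ns

N = 1_000_000  # instruction count (arbitrary)

# A small, fast cache vs. a larger cache that stretches the cycle time.
small = execution_time_ns(N, cpi_base=1.0, cycle_time_ns=10.0,
                          refs_per_instr=1.3, miss_rate=0.10,
                          miss_penalty_cycles=12)
large = execution_time_ns(N, cpi_base=1.0, cycle_time_ns=11.0,
                          refs_per_instr=1.3, miss_rate=0.03,
                          miss_penalty_cycles=12)

print(f"small+fast: {small/1e6:.1f} ms, large+slower: {large/1e6:.1f} ms")
```

Under these made-up parameters the larger, slightly slower cache wins, which is the shape of the paper's conclusion for the 32KB to 128KB range.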


IEEE Journal of Solid-State Circuits | 1992

An analytical access time model for on-chip cache memories

Tomohisa Wada; Suresh Rajan; Steven A. Przybylski

An analytical access time model for on-chip cache memories that shows the dependence of the cache access time on the cache parameters is described. The model includes general cache parameters, such as cache size (C), block size (B), and associativity (A), as well as array configuration parameters that determine the subarray aspect ratio and the number of subarrays. With this model, a large cache design space can be covered, which cannot be done using SPICE circuit simulation alone within a limited time. Using the model, it is shown that for given C, B, and A, optimum array configuration parameters can be chosen to minimize the access time. With the optimum array parameters, the access time is roughly proportional to the log of the cache size, and a larger block size gives a smaller access time, but larger associativity does not, because of the increase in the data-bus capacitances.
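The headline result, that with optimal array parameters the access time grows roughly with the logarithm of the cache size, can be illustrated with a toy calculation. The constants t0 and k below are illustrative assumptions, not the paper's fitted model.

```python
# Toy illustration of the result above: with optimal array parameters,
# access time grows roughly with log(cache size). The constants are
# illustrative assumptions, not the paper's fitted model.
import math

def optimal_access_time_ns(cache_bytes, t0=2.0, k=0.9):
    """Hypothetical model: a fixed overhead t0 plus a term proportional
    to log2 of the cache size (decoder depth, longer word/bit lines)."""
    return t0 + k * math.log2(cache_bytes)

for size_kb in (4, 16, 64, 256):
    t = optimal_access_time_ns(size_kb * 1024)
    print(f"{size_kb:4d} KB -> {t:.1f} ns")
```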


IEEE Journal of Solid-State Circuits | 1987

MIPS-X: a 20-MIPS peak, 32-bit microprocessor with on-chip cache

Mark Horowitz; Paul Chow; Don Stark; Richard T. Simoni; Arturo Salz; Steven A. Przybylski; John L. Hennessy; Glenn Gulak; Anant Agarwal; John M. Acken

MIPS-X is a 32-bit RISC microprocessor implemented in a conservative 2-μm, two-level-metal, n-well CMOS technology. High performance is achieved by using a nonoverlapping two-phase 20-MHz clock and executing one instruction every cycle. To reduce its memory bandwidth requirements, MIPS-X includes a 2-kbyte on-chip instruction cache. The authors provide an overview of MIPS-X, focusing on the techniques used to reduce the complexity of the processor and implement the on-chip instruction cache.


International Symposium on Computer Architecture | 1990

The performance impact of block sizes and fetch strategies

Steven A. Przybylski

This paper explores the interactions between a cache's block size, fetch size, and fetch policy from the perspective of maximizing system-level performance. It has been previously noted that, given a simple fetch strategy, the performance-optimal block size is almost always four or eight words [10]. If there is even a small cycle time penalty associated with either longer blocks or fetches, then the performance-optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of instruction and data caches are all independent design variables, the instruction cache block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of two greater than the data cache fetch size, which in turn should be equal to or double the data cache block size. The simplest fetch strategy, fetching only on a miss and stalling the CPU until the fetch is complete, works well. Complicated fetch strategies do not produce the performance improvements indicated by the accompanying reductions in miss ratios because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated here, the most effective fetch strategy improved performance by between 1.7% and 4.5% over the simplest strategy described above.
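A toy model can show why the performance-optimal block size sits below the miss-ratio-optimal one: the miss penalty grows with block size (latency plus transfer time), so the product of miss rate and miss penalty bottoms out earlier than the miss rate alone. The miss-rate curve and memory timings below are illustrative assumptions, not the paper's trace data.

```python
# Toy model of why the performance-optimal block size is smaller than
# the miss-ratio-optimal one. All constants are illustrative.

MEM_LATENCY = 8      # cycles until the first word arrives (assumed)
WORDS_PER_CYCLE = 1  # memory transfer rate (assumed)

def miss_penalty(block_words):
    """Fetch-on-miss, stall-until-complete: latency plus transfer time."""
    return MEM_LATENCY + block_words / WORDS_PER_CYCLE

def miss_rate(block_words):
    """Toy miss-rate curve: falls with block size (spatial locality),
    then rises again as large blocks evict useful data."""
    return 0.05 / block_words ** 0.5 + 0.0005 * block_words

candidates = [1, 2, 4, 8, 16, 32, 64]
best_perf = min(candidates, key=lambda b: miss_rate(b) * miss_penalty(b))
best_ratio = min(candidates, key=miss_rate)
print(f"performance-optimal block: {best_perf} words; "
      f"miss-ratio-optimal block: {best_ratio} words")
```

With these assumed curves the performance-optimal block is 4 words while the miss-ratio-optimal block is 16, mirroring the qualitative gap the abstract describes.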


International Symposium on Computer Architecture | 1989

Characteristics Of Performance-Optimal Multi-level Cache Hierarchies

Steven A. Przybylski; Mark Horowitz; John L. Hennessy

The increasing speed of new generation processors will exacerbate the already large difference between CPU cycle times and main memory access times. As this difference grows, it will be increasingly difficult to build single-level caches that are both fast enough to match these fast cycle times and large enough to effectively hide the slow main memory access times. One solution to this problem is to use a multi-level cache hierarchy. This paper examines the relationship between cache organization and program execution time for multi-level caches. We show that a first-level cache dramatically reduces the number of references seen by a second-level cache, without having a large effect on the number of second-level cache misses. This reduction in the number of second-level cache hits changes the optimal design point by decreasing the importance of the cycle-time of the second-level cache relative to its size. The lower the first-level cache miss rate, the less important the second-level cycle time becomes. This change in relative importance of cycle time and miss rate makes associativity more attractive and increases the optimal cache size for second-level caches over what they would be for an equivalent single-level cache system.
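The effect described here can be sketched with the standard two-level average memory access time (AMAT) formula; the numbers below are illustrative assumptions, not the paper's measurements.

```python
# Two-level AMAT, illustrating why a low L1 miss rate makes L2 cycle
# time matter less than L2 size. All numbers are illustrative.

def amat(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_time):
    """AMAT = L1 hit time + L1 misses * (L2 hit time + L2 misses * memory)."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_local_miss_rate * mem_time)

# A faster, smaller L2 vs. a slower L2 that is large enough to halve
# its local miss rate.
fast_small_l2 = amat(l1_hit=1.0, l1_miss_rate=0.05,
                     l2_hit=4.0, l2_local_miss_rate=0.40, mem_time=50.0)
slow_large_l2 = amat(l1_hit=1.0, l1_miss_rate=0.05,
                     l2_hit=6.0, l2_local_miss_rate=0.20, mem_time=50.0)
print(f"fast/small L2: {fast_small_l2:.2f} cycles, "
      f"slow/large L2: {slow_large_l2:.2f} cycles")
```

Because the first-level cache filters most references, the second-level cache's extra hit cycles are paid only rarely, while its lower local miss rate pays off on every first-level miss.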


International Symposium on Microarchitecture | 1982

MIPS: A microprocessor architecture

John L. Hennessy; Norman P. Jouppi; Steven A. Przybylski; Christopher Rowen; Thomas R. Gross; Forest Baskett; John Gill

MIPS is a new single-chip VLSI microprocessor. It attempts to achieve high performance through the use of a simplified instruction set, similar to those found in microengines. The processor is a fast pipelined engine without pipeline interlocks. Software solutions to several traditional hardware problems, such as providing pipeline interlocks, are used.
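As a rough illustration of the software-interlock idea, the sketch below pads dependent instructions with NOPs in the compiler rather than stalling in hardware. The one-slot load delay and the instruction encoding are hypothetical simplifications, not the actual MIPS reorganizer.

```python
# Minimal sketch of a "software interlock": instead of hardware
# stalling the pipeline, a compiler pass pads dependent instructions
# with NOPs. The one-slot load delay is an illustrative assumption.

LOAD_DELAY = 1  # a load's result is usable only after this many slots

def insert_nops(program):
    """program: list of (op, dest, srcs) tuples. Inserts NOPs whenever
    an instruction reads a register loaded by the immediately
    preceding load (a simplified hazard rule)."""
    out = []
    for op, dest, srcs in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            if prev_op == "load" and prev_dest in srcs:
                for _ in range(LOAD_DELAY):
                    out.append(("nop", None, ()))
        out.append((op, dest, srcs))
    return out

prog = [("load", "r1", ("r2",)),        # r1 <- mem[r2]
        ("add",  "r3", ("r1", "r4"))]   # uses r1 immediately: hazard
for ins in insert_nops(prog):
    print(ins)
```

A real reorganizer would try to reorder useful instructions into the delay slot before falling back to a NOP; the padding pass above is only the baseline behavior.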


Design of a high performance VLSI processor | 1983

Design of a high performance VLSI processor

John L. Hennessy; Norman P. Jouppi; Steven A. Przybylski; Christopher Rowen; Thomas R. Gross

Current VLSI fabrication technology makes it possible to design a 32-bit CPU on a single chip. However, to achieve high performance from that processor, the architecture and implementation must be carefully designed and tuned. The MIPS processor incorporates some new architectural ideas into a single-chip, nMOS implementation. Processor performance is obtained by the careful integration of the software (e.g., compilers), the architecture, and the hardware implementation. This integrated view also simplifies the design, making it practical to implement the processor at a university.


ACM Transactions on Computer Systems | 1988

Measurement and evaluation of the MIPS architecture and processor

Thomas R. Gross; John L. Hennessy; Steven A. Przybylski; Christopher Rowen

MIPS is a 32-bit processor architecture that has been implemented as an nMOS VLSI chip. The instruction set architecture is RISC-based. Close coupling with compilers and efficient use of the instruction set by compiled programs were goals of the architecture. The MIPS architecture requires that the software implement some constraints in the design that are normally considered part of the hardware implementation. This paper presents experimental results on the effectiveness of this processor as a program host. Using sets of large and small benchmarks, the instruction and operand usage patterns are examined for both optimized and unoptimized code. Several of the architectural and organizational innovations in MIPS, including software pipeline scheduling, multiple-operation instructions, and word-based addressing, are examined in light of this data.
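The kind of measurement described here amounts to tallying opcode frequencies over an instruction trace. A minimal sketch follows; the trace and opcodes are illustrative, not the paper's benchmark data.

```python
# Minimal sketch of instruction-usage measurement: tally opcode
# frequencies from a trace. The trace contents are illustrative.
from collections import Counter

trace = ["lw", "add", "beq", "add", "sw", "add", "jal", "lw"]
usage = Counter(trace)
total = len(trace)
for op, n in usage.most_common():
    print(f"{op:5s} {n/total:6.1%}")
```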


International Solid-State Circuits Conference | 1987

A 32b microprocessor with on-chip 2Kbyte instruction cache

Mark Horowitz; John L. Hennessy; Paul Chow; P.G. Gulak; John M. Acken; Anant Agarwal; Chorng-Yeung Chu; S. McFarling; Steven A. Przybylski; S. Richardson; Arturo Salz; Richard T. Simoni; Don Stark; Peter Steenkiste; Steven W. K. Tjiang; M. Wing

A reduced instruction set computer with a five-stage pipeline, implemented with 150K transistors on an 8mm×8.5mm chip in a 2-μm, two-layer-metal CMOS process, is reported. At an operating frequency of 20MHz, 12-MIPS performance has been achieved.


VLSI Electronics: Microstructure Science | 1986

Chapter 1 - VLSI Processor Design Methodology

John L. Hennessy; Steven A. Przybylski

Integrated circuit (IC) technology has made it possible to produce chips containing hundreds of thousands of transistors. Systems of such complexity are difficult to design: the computer architect faces problems in system partitioning and subgoal specification, subsystem interface specification and verification, and overall system integration. This improvement in IC technology allows the fabrication of processors whose complexity is comparable to the largest mainframe computers designed using off-the-shelf technologies (SSI, MSI, and LSI). The advent of the very large scale integrated (VLSI) processor has significantly changed the way computers are designed and implemented. This chapter discusses the use of VLSI as an implementation medium, focusing on the design of general-purpose microprocessors. In many ways, the architecture and organization of a VLSI processor are similar to the designs used in the CPUs of modern machines implemented with standard parts, but MOS technology imposes new constraints that emphasize the interaction between architecture and implementation. The chapter discusses the issues that arise in determining the suitability of an architecture as a program host, the implications of the architecture for the organization, and guidelines for evaluating an architecture's suitability both for an application environment and for implementation in VLSI.
