Publication
Featured research published by Bret R. Olszewski.
architectural support for programming languages and operating systems | 1994
Ann M. Maynard; Colette Mary Donnelly; Bret R. Olszewski
Experience has shown that many widely used benchmarks are poor predictors of the performance of systems running commercial applications. Research into this anomaly has long been hampered by a lack of address traces from representative multi-user commercial workloads. This paper presents research, using traces of industry-standard commercial benchmarks, that examines the characteristic differences between technical and commercial workloads and illustrates how those differences affect cache performance. Commercial and technical environments differ in their respective branch behavior, operating system activity, I/O, and dispatching characteristics. A wide range of uniprocessor instruction and data cache geometries was studied. The instruction cache results for commercial workloads demonstrate that instruction cache performance can no longer be neglected, because these workloads have much larger code working sets than technical applications. For database workloads, a breakdown of kernel and user behavior reveals that the application component can exhibit behavior similar to the operating system and can therefore experience equally high miss rates. This paper also indicates that "dispatching", or process switching, characteristics must be considered when designing level-two caches. The data presented show that increasing the associativity of second-level caches can reduce miss rates significantly. Overall, the results of this research should help system designers choose a cache configuration that will perform well in commercial markets.
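The associativity effect described in this abstract can be illustrated with a toy model. The sketch below is not the simulator or the traces used in the paper; it is a minimal set-associative LRU cache model, and the function name `miss_rate`, its parameters, and the two-address trace are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def miss_rate(trace, num_lines, ways, line_size=64):
    """Return the miss rate of a set-associative LRU cache on an address trace.

    Illustrative model only: `trace` is a list of byte addresses, `num_lines`
    is the total number of cache lines, and `ways` is the associativity.
    """
    if not trace:
        raise ValueError("trace must not be empty")
    if num_lines % ways != 0:
        raise ValueError("num_lines must be a multiple of ways")

    num_sets = num_lines // ways
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0

    for addr in trace:
        block = addr // line_size
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)        # hit: mark as most recent
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)   # evict least recently used
            cache_set[block] = True

    return misses / len(trace)


# Two blocks that collide in a 4-line direct-mapped cache, accessed alternately.
conflict_trace = [0, 256] * 4
direct_mapped = miss_rate(conflict_trace, num_lines=4, ways=1)
two_way = miss_rate(conflict_trace, num_lines=4, ways=2)
```

With the same capacity, the direct-mapped configuration misses on every access in this trace because the two blocks keep evicting each other, while the two-way configuration misses only on the first touch of each block. Real commercial traces are far larger and less regular, so this shows only the mechanism, not the magnitudes reported in the paper.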
ieee international symposium on workload characterization | 2005
Bill Maron; Thomas W. Chen; Duc Vianney; Bret R. Olszewski; Steve R. Kunkel; Alex E. Mericas
Workload characterization has become an integral part of the design of future servers, since workload characteristics can help developers understand workload requirements and how the underlying architecture can optimize the performance of the intended workload. In this paper, we give an overview of the POWER5 architecture. We also introduce the POWER5 performance monitor facilities and performance events that lead to the construction of a CPI (cycles per instruction) breakdown model. For our study, we characterize four different groups of workloads: commercial, HPC, memory, and scientific. Using the data obtained from the POWER5 performance counters, we break down the CPI stack into a base component, when the processor is completing work, and a stall component, when the processor is not completing instructions. The stall component can be further divided into cycles when the pipeline was empty and cycles when the pipeline was not empty but completion was stalled. With this model, we enumerate the number of processing cycles, i.e., the fraction of the CPI, that a workload spent while progressing through the core resources, and the penalty incurred upon encountering resource usage inhibitors. The results show the CPI breakdown for each workload and identify where each workload spends its processing cycles and the associated CPI cost of accessing the core resources.
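The two-level CPI decomposition described in this abstract (base plus stall, with stall split into pipeline-empty and pipeline-not-empty cycles) can be sketched as simple arithmetic on cycle counts. The field names below are hypothetical and are not actual POWER5 performance monitor event names; the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical cycle counters; not real POWER5 event names."""
    total_cycles: int
    completed_instructions: int
    completion_cycles: int       # cycles in which the processor completed work
    pipeline_empty_cycles: int   # stall cycles during which the pipeline was empty


def cpi_breakdown(sample: CounterSample) -> dict:
    """Split total CPI into base and stall components, per the model above."""
    n = sample.completed_instructions
    if n <= 0:
        raise ValueError("completed_instructions must be positive")

    stall_cycles = sample.total_cycles - sample.completion_cycles
    nonempty_stall_cycles = stall_cycles - sample.pipeline_empty_cycles
    if stall_cycles < 0 or nonempty_stall_cycles < 0:
        raise ValueError("counter values are inconsistent")

    return {
        "total": sample.total_cycles / n,
        "base": sample.completion_cycles / n,
        "stall": stall_cycles / n,
        "stall_pipeline_empty": sample.pipeline_empty_cycles / n,
        "stall_pipeline_not_empty": nonempty_stall_cycles / n,
    }


# Made-up sample: 2000 cycles for 1000 instructions.
sample = CounterSample(
    total_cycles=2000,
    completed_instructions=1000,
    completion_cycles=800,
    pipeline_empty_cycles=300,
)
stack = cpi_breakdown(sample)
```

By construction, the base and stall components sum to the total CPI, and the two stall sub-components sum to the stall component, which is what lets each slice be read as a fraction of the overall CPI.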
international conference on parallel and distributed systems | 2004
Diana Villa; Jaime C. Acosta; Patricia J. Teller; Bret R. Olszewski; Trevor Morgan
Because of the increasing gap between processor frequency and dynamic random access memory (DRAM) speed, the performance of the memory subsystem typically governs that of the system as a whole. This is especially true for symmetric multiprocessor systems (SMPs). Therefore, performance evaluation methodologies that facilitate the analysis and optimization of the memory subsystem are essential. This paper describes such a methodology, a performance evaluation framework, and demonstrates its power, speed, and flexibility in the context of a study of the TPC-C benchmark executed on eight- and 32-processor IBM pSeries 690 (p690) systems. The framework facilitates analysis of sampled performance monitor event traces that are collected in real time. The analyses are used to characterize the locality of reference exhibited by TPC-C data loads at the various levels of the memory hierarchy and to evaluate the efficacy of the design aspects of, and policies associated with, the p690 memory hierarchy with respect to workload demands.
ieee computer society international conference | 1995
Bret R. Olszewski; Jean-Jacques Guillemaud
The first PowerPC-based SMP, jointly developed by IBM and Groupe Bull, had aggressive performance goals for its intended market of commercial applications. This paper describes the hardware and software design processes used in the product development, as well as performance results obtained on the first-generation PowerPC 601-based hardware.
Archive | 2004
Dean Joseph Burdick; Bret R. Olszewski
Archive | 2008
Bret R. Olszewski; Randal C. Swanberg
Archive | 2005
William Joseph Armstrong; Timothy R. Marchini; Naresh Nayar; Bret R. Olszewski; Mysore Sathyanarayana Srinivas
Archive | 2008
Vaijayanthimala K. Anand; Peter J. Heyrman; Bret R. Olszewski
Archive | 2006
Vaijayanthimala K. Anand; Dean Joseph Burdick; Bret R. Olszewski
Archive | 1995
Frank Carl Gover; Frank Eliot Levine; Bret R. Olszewski; Charles Philip Roth; Edward Hugh Welbon; Charles P. Wright