Publication


Featured research published by Alex E. Mericas.


IBM Journal of Research and Development | 2005

Characterization of simultaneous multithreading (SMT) efficiency in POWER5

Harry M. Mathis; Alex E. Mericas; John D. McCalpin; Richard J. Eickemeyer; Steven R. Kunkel

Coarse-grained multithreading, the switching of threads to avoid idle processor time during long-latency events, has been available on IBM systems since 1998. Simultaneous multithreading (SMT), first available on the POWER5™ processor, moves beyond simple thread switching to the maintenance of two thread streams that are issued as continuously as possible to ensure the maximum use of processor resources. Because SMT has the potential of increasing processor efficiency and correspondingly increasing the amount of work done for a given time span, the reader might suppose that SMT would exhibit a performance gain for all workloads. This is true for most workloads, but is not true in some exceptional cases. In SMT mode, the processor resources--register sets, caches, queues, translation buffers, and the system memory nest--must be shared by both threads, and conditions can occur that degrade or even obviate SMT performance improvement. The POWER4™ and POWER5 processors have very powerful performance monitor (PM) toolsets that can help the user to determine what is occurring in workloads that may not be providing expected SMT gains. In this paper, the results of measured differences among workloads having large, medium, small, and even negative SMT performance gains are presented along with an approach to investigating workloads to determine the source of SMT performance gain limits.


IEEE Computer | 2003

Benchmarking Internet servers on superscalar machines

Yue Luo; Juan Rubio; Lizy Kurian John; Pattabi Seshadri; Alex E. Mericas

The authors compared three popular Internet server benchmarks with a suite of CPU-intensive benchmarks to evaluate the impact of front-end and middle-tier servers on modern microprocessor architectures.


IBM Journal of Research and Development | 2011

IBM POWER7 performance modeling, verification, and evaluation

M. Srinivas; Balaram Sinharoy; Richard J. Eickemeyer; Ram Raghavan; Steven R. Kunkel; Tien Chi Chen; W. Maron; D. Flemming; A. Blanchard; P. Seshadri; Jeffrey W. Kellington; Alex E. Mericas; A. E. Petruski; V. R. Indukuru; S. Reyes

In this paper, we describe the key performance enhancements in IBM POWER7® microarchitecture and its memory hierarchy, including performance modeling and verification methodology. We also describe the performance characteristics of server applications, including Standard Performance Evaluation Corporation (SPEC) central processing unit, SAP Sales and Distribution, SPECjbb, online transaction processing workloads, and high-performance computing applications running on POWER7 processor-based systems compared with other systems.


IEEE International Symposium on Workload Characterization | 2005

Workload characterization for the design of future servers

Bill Maron; Thomas W. Chen; Duc Vianney; Bret R. Olszewski; Steve R. Kunkel; Alex E. Mericas

Workload characterization has become an integral part of the design of future servers since their characteristics can guide the developers to understand the workload requirements and how the underlying architecture would optimize the performance of the intended workload. In this paper, we give an overview of the POWER5 architecture. We also introduce the POWER5 performance monitor facilities and performance events that lead to the construction of a CPI (cycles per instruction) breakdown model. For our study, we characterize four different groups of workloads: commercial, HPC, memory, and scientific. Using the data obtained from the POWER5 performance counters, we break down the CPI stack into a base component, when the processor is completing work, and a stall component, when the processor is not completing instructions. The stall component can be further divided into cycles when the pipeline was empty and cycles when the pipeline was not empty but completion was stalled. With this model, we enumerate the number of processing cycles, i.e., a fraction of the CPI, a workload spent while progressing through the core resources and the incurred penalty upon encountering those resource usage inhibitors. The results show the CPI breakdown for each workload and identify where each workload spends its processing cycles and the associated CPI cost when accessing the core resources.


IEEE Hot Chips Symposium | 2014

Performance characteristics of the POWER8 processor

Alex E. Mericas

This article consists of a collection of slides from the author's conference presentation on the special features, system design and architectures, processing capabilities, and targeted markets for IBM's POWER8 family of processor products.


IEEE Hot Chips Symposium | 2008

Power-performance comparative evaluation of alternate microarchitectures

Rick Eickemeyer; Michael Stephen Floyd; John Barry Griswell; Alex E. Mericas; Balaram Sinharoy; Pradip Bose; Soraya Ghiasi; Hendrik F. Hamann; Hans M. Jacobson; Tom W. Keller; Victor Zyuban

This article consists of a collection of slides from the authors' conference presentation. Some of the specific areas/topics discussed include: Power Dissipation and Efficiency Basics; POWER4 vs. POWER5; POWER5 vs. POWER6; Roadrunner and Blue Gene System Efficiency; Conclusion; BACKUP: Looking Ahead: A Few Key Research Issues.


Archive | 2010

System and method for execution based filtering of instructions of a processor to manage dynamic code optimization

Venkat R. Indukuru; Alex E. Mericas; Brian R. Mestan; Ii Park


Archive | 2000

Method system and apparatus for instruction execution tracing with out of order processors

Jason N. Dale; Jim Kahle; D. Logan; Alex E. Mericas; William J. Starke; Philip L. Vitale


Archive | 2006

Method, apparatus, and computer program product in a performance monitor for sampling all performance events generated by a processor

Alex E. Mericas


Archive | 2010

Ineffective prefetch determination and latency optimization

Miles R. Dooley; Venkat R. Indukuru; Alex E. Mericas; Francis Patrick O'Connell

Collaboration


Dive into Alex E. Mericas's collaborations.

Top Co-Authors

Lizy Kurian John

University of Texas at Austin

Pattabi Seshadri

University of Texas at Austin

Aaron Sawdey

University of Rochester

Juan Rubio

University of Texas at Austin

Yue Luo

University of Texas at Austin
