Luiz C. Alves
IBM
Publications
Featured research published by Luiz C. Alves.
IBM Journal of Research and Development | 1999
Michael Mueller; Luiz C. Alves; Wolfgang Fischer; Myron L. Fair; Indravadan N. Modi
The Reliability/Availability/Serviceability (RAS) strategy for S/390® G5 and G6 is to continue the S/390 objective of providing Continuous Reliable Operation (CRO). The RAS strategy is constructed with a set of building blocks which work closely together: error prevention, error detection, error recovery, problem determination, service structure, change management, and RAS measurement and analysis. The interdependency among the building blocks is such that removing or weakening any of them limits the ability of the design to achieve the overall CRO objective. Each building block must be fully implemented and must execute flawlessly within itself and together with the other blocks.
IBM Journal of Research and Development | 2004
Myron L. Fair; Christopher R. Conklin; Scott Barnett Swaney; Patrick J. Meaney; William J. Clarke; Luiz C. Alves; Indravadan N. Modi; Fritz Freier; Wolfgang Fischer; Norman E. Weber
The IBM eServer™ zSeries® Model z990 offers customers significant new opportunity for server growth while preserving and enhancing server availability. The z990 provides vertical growth capability by introducing the concurrent addition of processor/memory books and horizontal growth in channels by the use of extended virtualization technology. In order to continue to support the zSeries legacy for high availability and continuous reliable operation, the z990 delivers significant new features for reliability, availability, and serviceability (RAS). This paper describes these new capabilities, in each case presenting the value of the feature, both in terms of enhancing the self-management capability of the server and its availability.
IBM Journal of Research and Development | 2012
Patrick J. Meaney; Luis A. Lastras-Montano; Vesselina K. Papazova; Eldee Stephens; Judy S. Johnson; Luiz C. Alves; James A. O'Connor; William J. Clarke
The IBM zEnterprise® system introduced a new and innovative redundant array of independent memory (RAIM) subsystem design as a standard feature on all zEnterprise servers. It protects the server from single-channel errors such as sudden control, bus, buffer, and massive dynamic RAM (DRAM) failures, thus achieving the highest System z® memory availability. This system also introduced innovations such as DRAM and channel marking, as well as a novel dynamic cyclic redundancy code channel marking. This paper describes this RAIM subsystem and other reliability, availability, and serviceability features, including automatic channel error recovery; data and clock interface lane calibration, recovery, and repair; intermittent lane sparing; and specialty engines for maintenance, periodic calibration, power, and power-on controls.
IBM Journal of Research and Development | 2008
Jude A. Rivers; Pradip Bose; Prabhakar Kudva; John-David Wellman; Pia N. Sanda; Ethan H. Cannon; Luiz C. Alves
This paper presents an overview of Phaser, a toolset and methodology for modeling the effects of soft errors on the architectural and microarchitectural functionality of a system. The Phaser framework is used to understand the system-level effects of soft-error rates of a microprocessor chip as its design evolves through the phases of preconcept, concept, high-level design, and register-transfer-level design implementation. Phaser represents a strategic research vision that is being proposed as a next-generation toolset for predicting chip-level failure rates and studying reliability-performance tradeoffs during the phased design process. This paper primarily presents Phaser/M1, the early stage of the predictive modeling of behavior.
IBM Journal of Research and Development | 2002
Luiz C. Alves; Myron L. Fair; Patrick J. Meaney; Chin-Long Chen; William J. Clarke; George C. Wellwood; Norman E. Weber; Indravadan N. Modi; Brian K. Tolan; Fritz Freier
The IBM eServer zSeries™ Model 900, or z900, has been designed with major enhancements for hardware reliability, availability, and serviceability (RAS) in support of the zSeries RAS strategy, the eServer self-management technologies, and the z900 design objective of continuous reliable operation. The eServer self-management technologies enable the server to protect itself, to detect and recover from errors, to change and configure itself, and to optimize itself, in the presence of problems and changes, for maximum performance with minimum outside intervention. From the RAS perspective, the longstanding RAS strategy for the IBM S/390® and now the zSeries has provided an excellent foundation for self management. This paper describes the z900 RAS enhancements and how they strengthen the RAS strategy building blocks and provide a basis for autonomic computing.
Information Theory and Applications | 2011
Luis A. Lastras-Montano; Patrick J. Meaney; Eldee Stephens; Barry M. Trager; James A. O'Connor; Luiz C. Alves
In this article we describe a class of error control codes called “diff-MDS” codes that are custom designed for highly resilient computer memory storage. The error scenarios of concern range from simple single-bit errors to memory chip failures and catastrophic memory module failures. Our approach to building codes for this setting relies on the concept of expurgating a parity code that is easy to decode for memory module failures so that a few additional small errors can be handled as well, thus preserving most of the decoding complexity advantages of the original code while extending its original intent. The manner in which we expurgate is carefully crafted so that the strength of the resulting code is comparable to that of a Reed-Solomon code when used for this particular setting. An instance of this class of algorithms has been incorporated in IBM's zEnterprise mainframe offering, setting a new industry standard for memory resiliency.
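As a rough illustration of the starting point the abstract mentions (a parity code that is easy to decode when a whole memory module fails), the sketch below shows plain XOR parity across channels. It is not the diff-MDS construction, which the abstract does not specify; the function names, the byte-string representation of a channel, and the use of `None` to mark a known-failed channel are all assumptions made for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(channels: List[bytes]) -> List[bytes]:
    """Append one parity channel: the XOR of all data channels.

    Assumes every channel holds the same number of bytes.
    """
    parity = reduce(xor_bytes, channels)
    return channels + [parity]


def recover(stripes: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data channels when at most one channel is known to be lost.

    A lost channel is marked with None (hypothetical convention). Because the
    XOR of all channels, parity included, is zero, the missing channel equals
    the XOR of the survivors.
    """
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(missing) > 1:
        raise ValueError("plain XOR parity can rebuild only one lost channel")
    rebuilt = list(stripes)
    if missing:
        survivors = [s for s in stripes if s is not None]
        rebuilt[missing[0]] = reduce(xor_bytes, survivors)
    return rebuilt[:-1]  # drop the parity channel, return data only


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]
    stored = add_parity(data)
    damaged = list(stored)
    damaged[2] = None  # simulate a failed channel at a known position
    print(recover(damaged) == data)
```

Plain parity of this kind only handles a failure whose location is already known and cannot correct any further errors on the surviving channels; the abstract describes the diff-MDS codes as extending such a code so that a few additional small errors can be handled as well.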
IBM Journal of Research and Development | 2009
William J. Clarke; Luiz C. Alves; Timothy J. Dell; Herwig Elfering; Jeffrey P. Kubala; Chung-Ching Lin; Michael Mueller; Klaus Werner
The IBM System z10™ server reliability, availability, and serviceability (RAS) design continues to reduce the sources of server outages through innovative RAS architecture and techniques. The z10™ server introduced functional improvements that challenged the RAS design. Increases were made in the performance of each processor, the total number of processors, the total size of the memory, the amount of cache, the bandwidth of the I/O, the thermal density, and the exposure to soft errors. These changes demanded stronger RAS functions to prevent unscheduled outages. Significant improvements were made to the IBM e-business on demand® functions (concurrent, customer-requested upgrades) that enable customers to better manage capacity without having to take planned outages. The hypervisor simplified configuration changes, such as adding cryptography or channel subsystems to logical partitions, by eliminating the need for preplanning. Single-core checkstopping and single transparent CPU (central processing unit) sparing were added. The RAS functions reduced the number of scheduled outages. Product improvements were complemented by improvements in RAS modeling. This paper describes these RAS improvements and how they provide value to the customer.
IBM Journal of Research and Development | 2015
William J. Clarke; Luiz C. Alves; Kevin W. Kark; R. K. Overton; M. J. Snihur
Each new system family brings changes from its predecessor as technologies shrink, as system structure evolves, and as requirements grow. A remap of an exceptional reliability, availability, and serviceability (RAS) design from generation to generation would soon lead RAS characteristics to be inadequate. In this paper, we provide an introduction to some of the changes in the IBM z13® and discuss the RAS improvements associated with these changes. We discuss the processor drawer, the memory and cache hierarchy, the I/O firmware stack, the power and thermal systems, the power and service control network, and enhancements to sparing.
Archive | 2007
Luis A. Lastras-Montano; James A. O'Connor; Luiz C. Alves; William J. Clarke; Timothy J. Dell; Thomas J. Dewkett; Kevin C. Gower
Archive | 2010
Luiz C. Alves; Kevin C. Gower; Luis A. Lastras-Montano; Patrick J. Meaney; Eldee Stephens