Paul Muench
IBM
Publication
Featured research published by Paul Muench.
IEEE Transactions on Components, Packaging, and Manufacturing Technology: Part B | 1998
Wiren D. Becker; Jim Eckhardt; Roland Frech; George A. Katopis; Erich Klink; Michael F. McAllister; Timothy G. McNamara; Paul Muench; Stephen R. Richter; Howard H. Smith
Complementary metal-oxide-semiconductor (CMOS) microprocessors operating at hundreds of megahertz create significant current deltas due to variation in switching activity from clock cycle to clock cycle. In addition to the more commonly discussed high-frequency voltage variations, a lower-frequency noise component lasting 50-200 ns is also produced, which we refer to as mid-frequency noise. In this paper, we discuss the design of IBM's CMOS S/390 computer for control of mid-frequency noise. This machine has a 10-way multiprocessor on a 127 mm by 127 mm multichip module (MCM) mounted on an FR4 board. The chips on the MCM cause a current step of tens of amps within a few cycles that can be sustained for many cycles. The power distribution and decoupling capacitors must supply that current without disturbing the voltage level at the circuits. The design of the system power distribution, together with the modeling and verification of mid-frequency noise in this system, is presented.
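As a rough first-order illustration of why a sustained current step stresses the decoupling network (this is not the paper's model), the charge Q = ΔI·t drawn from a capacitor bank before the regulator responds droops the rail by ΔV = Q/C, so the bank needs C ≥ ΔI·t/ΔV_max. The function name and the numbers below are hypothetical, chosen only to fall within the "tens of amps" and 50-200 ns ranges quoted in the abstract; ESR, ESL, and regulator dynamics are ignored.

```python
def min_decap_farads(delta_i_amps: float, duration_s: float, max_droop_v: float) -> float:
    """First-order estimate of the capacitance needed so that supplying a
    current step of delta_i_amps for duration_s droops the rail by no more
    than max_droop_v.

    Uses Q = I * t and dV = Q / C; ignores ESR, ESL and regulator response
    (a simplifying assumption, not a full power-distribution model).
    """
    if max_droop_v <= 0:
        raise ValueError("max_droop_v must be positive")
    return delta_i_amps * duration_s / max_droop_v


# Hypothetical example: a 30 A step sustained for 100 ns with a 50 mV droop budget.
c = min_decap_farads(30.0, 100e-9, 0.05)
print(f"required decoupling: {c * 1e6:.1f} uF")
```

Doubling the duration or halving the droop budget doubles the required capacitance, which is why a noise component lasting tens to hundreds of nanoseconds calls for bulk decoupling rather than only on-chip capacitance.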
International Solid-State Circuits Conference | 2014
Zeynep Toprak-Deniz; Michael A. Sperling; John F. Bulzacchelli; Gregory Scott Still; Ryan Kruse; Seongwon Kim; David William Boerstler; Tilman Gloekler; Raphael Robertazzi; Kevin Stawiasz; Timothy Diemoz; George English; David T. Hui; Paul Muench; Joshua Friedrich
Integrated voltage regulator modules (iVRMs) [1] provide a cost-effective path to realizing per-core dynamic voltage and frequency scaling (DVFS), which can be used to optimize the performance of a power-constrained multi-core processor. This paper presents an iVRM system developed for the POWER8™ microprocessor, which functions as a very fast, accurate low-dropout regulator (LDO) with 90.5% peak power efficiency (only 3.1% worse than an ideal LDO). At low output voltages, efficiency is reduced but still sufficient to realize beneficial energy savings with DVFS. Each iVRM features a bypass mode so that some of the cores can be operated at maximum performance with no regulator loss. With the iVRM area including the input decoupling capacitance (DCAP) (but not the output DCAP inherent to the cores), the iVRMs achieve a power density of 34.5 W/mm², which exceeds that of inductor-based or switched-capacitor (SC) converters by at least 3.4× [2].
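The efficiency behavior described in this abstract follows from how a linear regulator works: the pass device carries the full load current, so, ignoring quiescent current, an ideal LDO's efficiency is simply V_out/V_in, which is why efficiency falls at low output voltages. The sketch below assumes that textbook model; the function name and the voltages are hypothetical and are not taken from the POWER8 iVRM design.

```python
def ldo_efficiency(v_in: float, v_out: float, i_out: float, i_quiescent: float = 0.0) -> float:
    """Efficiency P_out / P_in of a linear (low-dropout) regulator.

    The pass device conducts the load current, so the input current is
    i_out plus any quiescent (bias) current drawn by the regulator itself.
    With i_quiescent = 0 this reduces to the ideal value v_out / v_in.
    """
    if not 0 < v_out <= v_in:
        raise ValueError("require 0 < v_out <= v_in")
    p_out = v_out * i_out
    p_in = v_in * (i_out + i_quiescent)
    return p_out / p_in


# Hypothetical operating points: 1.1 V input rail, 10 A load.
for v_out in (1.0, 0.85, 0.7):
    print(f"Vout={v_out:.2f} V  efficiency={ldo_efficiency(1.1, v_out, 10.0):.1%}")
```

In the bypass mode mentioned in the abstract, the regulator is taken out of the path, so this V_out/V_in loss does not apply to those cores.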
IEEE Journal of Solid-State Circuits | 2015
Eric Fluhr; Steve Baumgartner; David William Boerstler; John F. Bulzacchelli; Timothy Diemoz; Daniel M. Dreps; George English; Joshua Friedrich; Anne E. Gattiker; Tilman Gloekler; Christopher J. Gonzalez; Jason D. Hibbeler; Keith A. Jenkins; Yong Kim; Paul Muench; Ryan Nett; Jose Angel Paredes; Juergen Pille; Donald W. Plass; Phillip J. Restle; Raphael Robertazzi; David Shan; David W. Siljenberg; Michael A. Sperling; Kevin Stawiasz; Gregory Scott Still; Zeynep Toprak-Deniz; James D. Warnock; Glen A. Wiedemeier; Victor Zyuban
POWER8™ is a 12-core processor fabricated in IBM's 22 nm SOI technology with core and cache improvements driven by big data applications, providing 2.5× the socket performance of POWER7+™. Core throughput is supported by 7.6 Tb/s of off-chip I/O bandwidth provided by three primary interfaces, including two new variants of Elastic Interface as well as embedded PCI Gen-3. Power efficiency is improved with several techniques. An on-chip controller based on an embedded PowerPC™ 405 processor applies per-core DVFS by adjusting DPLLs and fully integrated voltage regulators. Each voltage regulator is a highly distributed system of digitally controlled microregulators, which achieves a peak power efficiency of 90.5%. A wide-frequency-range resonant clock design is used in 13 clock meshes and demonstrates a minimum power savings of 4%. Power and delay efficiency is achieved through the use of pulsed-clock latches, which require statistical validation to ensure robust yield.
IEEE Conference on Mass Storage Systems and Technologies | 2010
Gong Zhang; Lawrence Chiu; Clem Dickey; Ling Liu; Paul Muench; Sangeetha Seshadri
The significant I/O improvements of solid-state disks (SSDs) over traditional rotational hard disks (HDDs) make integrating SSDs into tiered storage systems an attractive approach to performance enhancement. However, to integrate SSDs into a multi-tiered storage system effectively, automated data migration between SSD and HDD plays a critical role. In many real-world application scenarios, such as banking and supermarket environments, workloads and I/O profiles present interesting characteristics and are also constrained by workload deadlines. Fully exploiting data migration while still meeting migration deadlines is critical to maximizing the performance of an SSD-enabled multi-tiered storage system. In this paper, we present an automated, deadline-aware, lookahead migration scheme to address the data migration challenge. We analyze the factors that may affect the efficiency of lookahead migration and develop a greedy algorithm that adaptively determines the optimal lookahead window size to optimize the effectiveness of lookahead migration, aiming to improve overall system performance and resource utilization while meeting workload deadlines. We compare our lookahead migration approach with the basic migration model and validate the effectiveness and efficiency of our adaptive lookahead migration approach through a trace-driven experimental study.
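The paper's own greedy window-size algorithm is not reproduced here. As a hypothetical sketch of the general idea only, a greedy search can grow a lookahead window one step at a time, stopping as soon as the estimated benefit stops improving or the migrations implied by a larger window could no longer finish before the deadline. The benefit and migration-time models, the function name, and the unit of window size are all assumptions.

```python
from typing import Callable


def choose_lookahead_window(
    benefit: Callable[[int], float],
    migration_time: Callable[[int], float],
    time_to_deadline: float,
    max_window: int,
) -> int:
    """Hypothetical greedy sketch: enlarge the lookahead window one step at a
    time while (a) the migrations it implies still complete before the
    deadline and (b) the estimated benefit keeps increasing.

    benefit(w) and migration_time(w) are caller-supplied models (assumptions);
    returns the chosen window size, 0 meaning "no lookahead".
    """
    window = 0
    while window < max_window:
        candidate = window + 1
        if migration_time(candidate) > time_to_deadline:
            break  # migrating this much data ahead would miss the deadline
        if benefit(candidate) <= benefit(window):
            break  # no further marginal gain from looking further ahead
        window = candidate
    return window


# Hypothetical models: benefit peaks at w = 5; each window step costs 2 time units.
toy_benefit = lambda w: 10 * w - w * w
toy_migration_time = lambda w: 2.0 * w
print(choose_lookahead_window(toy_benefit, toy_migration_time, 8.0, 20))   # deadline-limited
print(choose_lookahead_window(toy_benefit, toy_migration_time, 20.0, 20))  # benefit-limited
```

A greedy stop rule like this is cheap to evaluate online, but it can halt at a local optimum if the benefit curve is not unimodal.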
International Solid-State Circuits Conference | 2004
Joachim Gerhard Clabes; Joshua Friedrich; Mark Sweet; Jack DiLullo; Sam Gat-Shang Chu; Donald W. Plass; J. Dawson; Paul Muench; L. Powell; Michael Stephen Floyd; Balaram Sinharoy; Miranda Lee; Michael Normand Goulet; James Donald Wagoner; Nicole S. Schwartz; Stephen Larry Runyon; Gary E. Gorman; Phillip J. Restle; Ronald Nick Kalla; Joseph McGill; Steve Dodson
POWER5™ is the next generation of IBM's POWER microprocessors. This design sets a new standard of server performance by incorporating simultaneous multithreading (SMT), an enhanced distributed switch and memory subsystem supporting 64-way SMP, and extensive RAS support. First-pass hardware using IBM's 130 nm silicon-on-insulator technology operates above 1.5 GHz at 1.3 V. POWER5's dual-threaded SMT creates up to two virtual processors per core, improving execution unit utilization and masking memory latency. Although a simplistic SMT implementation promised a ~20% performance improvement, resizing critical microarchitectural resources almost doubles the SMT performance benefit in many cases, at a 24% area cost. Implementing these microarchitectural enhancements posed challenges in meeting the chip's frequency, area, power, and thermal targets.
Symposium on VLSI Circuits | 2010
Jose A. Tierno; Alexander V. Rylyakov; Daniel J. Friedman; Ann Chen; Anthony E. Ciesla; Timothy Diemoz; George English; David T. Hui; Keith A. Jenkins; Paul Muench; Gaurav Rao; George William Smith; Michael A. Sperling; Kevin Stawiasz
A per-core clock generator for the eight-core POWER7™ processor is implemented with a digital PLL. This frequency generator is capable of smooth, controlled frequency slewing, minimizing the impact of di/dt. Frequency can be dynamically adjusted while the clock is running, and without skipping any cycles, thus enabling aggressive power management techniques.
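To illustrate what controlled frequency slewing means, the sketch below moves a clock from its current setting to a target in bounded increments rather than in one jump, which limits how quickly the load current changes. The step size, the MHz values, and the function name are hypothetical; the abstract does not describe the actual POWER7 DPLL control logic.

```python
from typing import List


def slew_frequency(current_mhz: float, target_mhz: float, max_step_mhz: float) -> List[float]:
    """Return the sequence of frequency settings from current_mhz to
    target_mhz, changing by at most max_step_mhz per update.

    Bounding each step bounds the resulting change in load current (di/dt);
    the clock is assumed to keep running at every intermediate setting.
    """
    if max_step_mhz <= 0:
        raise ValueError("max_step_mhz must be positive")
    settings: List[float] = []
    f = current_mhz
    while f != target_mhz:
        delta = target_mhz - f
        if abs(delta) <= max_step_mhz:
            f = target_mhz  # final partial step lands exactly on the target
        else:
            f += max_step_mhz if delta > 0 else -max_step_mhz
        settings.append(f)
    return settings


# Hypothetical example: slew down from 4000 MHz to 3500 MHz in 100 MHz steps.
print(slew_frequency(4000.0, 3500.0, 100.0))
```

Smaller steps spread the current change over more updates, trading a slower transition for a gentler di/dt.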
IBM Journal of Research and Development | 2014
Sangeetha Seshadri; Paul Muench; Lawrence Chiu; Ioannis Koltsidas; Nikolas Ioannou; Robert Haas; Yang Liu; Mei Mei; Stephen L. Blinick
A software defined storage environment is one in which logical storage resources and services are completely abstracted from physical storage systems. Therefore, not only can storage resources cross physical boundaries, but they can also be defined by software and provisioned automatically, for instance, by the applications that consume them. In this paper, we present a novel software defined cooperative caching (SDCC) framework that operates at the block layer and manages the placement of data in different tiers and caches that span multiple servers and storage systems in an integrated and coherent fashion. A programming interface complements the core framework by letting applications control data organization across the storage, thereby allowing the block storage infrastructure to be software defined. The SDCC framework allows applications to actively influence the data layout while also benefiting from the system-wide knowledge and resource management capabilities of the storage system. We present an experimental study conducted using real workloads, and the results demonstrate the performance benefits gained with SDCC, as well as the potential for consolidating multiple different workloads that share the same storage server.
IEEE International Conference on Services Computing | 2007
Sangeetha Seshadri; Ling Liu; Brian F. Cooper; Lawrence Chiu; Karan Gupta; Paul Muench
Today, organizations and business enterprises of all sizes need to deal with unprecedented amounts of digital information, creating challenging demands for mass storage and on-demand storage services. Current clustered scale-out storage systems use symmetric active replication-based clustering middleware to provide continuous availability and high throughput. Such architectures provide significant gains in the cost, scalability, and performance of mass storage and storage services. However, a fundamental limitation of such an architecture is its vulnerability to application-induced massive dependent failures of the clustering middleware. In this paper, we propose hierarchical middleware architectures that improve availability and reliability in scale-out storage systems while continuing to deliver the cost and performance advantages and a single system image (SSI). Hierarchical middleware architectures organize critical cluster management services into an overlay network that provides application fault isolation and eliminates symmetric clustering middleware as a single point of failure. We present an in-depth evaluation of hierarchical middleware architectures based on an industry-strength storage system. Our results show that hierarchical architectures can significantly improve the availability and reliability of scale-out storage clusters.
International Solid-State Circuits Conference | 2017
Michael Stephen Floyd; Phillip J. Restle; Michael A. Sperling; Pawel Owczarczyk; Eric Fluhr; Joshua Friedrich; Paul Muench; Timothy Diemoz; Pierce Chuang; Christos Vezyrtzis
Increasing transistor counts in modern processors can create instantaneous changes in current, driving nanosecond-speed supply voltage (VDD) droops that require extra guardband for correct product operation. The POWER9 processor uses an adaptive clock strategy to reduce timing margin needed during power supply droop events by embedding analog voltage-droop monitors (VDMs) that direct a digital phase-locked loop (DPLL) to immediately reduce clock frequency in response.
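A simplified, hypothetical model of the adaptive clock behavior described in this abstract: on each sample, if the droop monitor reports VDD below a threshold, the clock drops immediately to a reduced frequency. The abstract does not say how or how fast the frequency recovers, so the gradual per-sample ramp back to nominal below is an assumption, as are all names and values.

```python
from typing import List


def adaptive_clock(
    vdd_samples: List[float],
    v_threshold: float,
    f_nominal: float,
    f_reduced: float,
    recovery_step: float,
) -> List[float]:
    """Hypothetical per-sample model of a droop-triggered adaptive clock.

    If a voltage-droop-monitor sample is below v_threshold, frequency drops
    immediately to f_reduced. Otherwise it ramps back toward f_nominal by
    recovery_step per sample (the recovery policy is an assumption).
    Returns the frequency chosen at each sample.
    """
    f = f_nominal
    trace: List[float] = []
    for v in vdd_samples:
        if v < v_threshold:
            f = f_reduced  # immediate response to a detected droop
        else:
            f = min(f_nominal, f + recovery_step)  # assumed gradual recovery
        trace.append(f)
    return trace


# Hypothetical trace: a single droop sample at 0.92 V with a 0.95 V threshold.
print(adaptive_clock([1.0, 1.0, 0.92, 0.95, 1.0, 1.0, 1.0], 0.95, 4000.0, 3600.0, 200.0))
```

Cutting frequency only while a droop is present lets the timing guardband cover the reduced-frequency case instead of the worst-case droop at full frequency, which is the margin reduction the abstract refers to.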
International Conference on Distributed Computing Systems | 2017
Sangeetha Seshadri; Paul Muench; Lawrence Chiu
Data is the new natural resource of this century. As data volumes grow and applications aimed at monetizing the data continue to evolve, data processing platforms are expected to meet new scale, performance, reliability, and data retention requirements. At the same time, storage hardware continues to improve in performance and price-performance. In this paper, we present TOKVS (Trillion Operation Key-Value Store), a NoSQL storage engine that redefines the storage software stack to meet the requirements of next-generation applications on next-generation hardware.