Marios C. Papaefthymiou

Publication

Featured research published by Marios C. Papaefthymiou.

high performance computer architecture | 2012

Computational sprinting

Arun Raghavan; Yixin Luo; Anuj Chandawalla; Marios C. Papaefthymiou; Kevin P. Pipe; Thomas F. Wenisch; Milo M. K. Martin

Although transistor density continues to increase, voltage scaling has stalled and thus power density is increasing each technology generation. Particularly in mobile devices, which have limited cooling options, these trends lead to a utilization wall in which sustained chip performance is limited primarily by power rather than area. However, many mobile applications do not demand sustained performance; rather they comprise short bursts of computation in response to sporadic user activity. To improve responsiveness for such applications, this paper explores activating otherwise powered-down cores for sub-second bursts of intense parallel computation. The approach exploits the concept of computational sprinting, in which a chip temporarily exceeds its sustainable thermal power budget to provide instantaneous throughput, after which the chip must return to nominal operation to cool down. To demonstrate the feasibility of this approach, we analyze the thermal and electrical characteristics of a smart-phone-like system that nominally operates a single core (~1W peak), but can sprint with up to 16 cores for hundreds of milliseconds. We describe a thermal design that incorporates phase-change materials to provide thermal capacitance to enable such sprints. We analyze image recognition kernels to show that parallel sprinting has the potential to achieve the task response time of a 16W chip within the thermal constraints of a 1W mobile platform.
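The sprint budget described in this abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's thermal model: it assumes a lumped thermal buffer (for example, the latent heat of a phase-change material) that absorbs all power above the sustainable level, and the 7.5 J capacity used in the usage note is a hypothetical figure chosen only for illustration.

```python
def sprint_duration_s(heat_capacity_j, sprint_power_w, sustainable_power_w):
    """Estimate how long a sprint can last before the thermal buffer is full.

    Assumption (not from the paper): a single lumped buffer absorbs every
    joule dissipated above the sustainable power level.
    """
    excess_power_w = sprint_power_w - sustainable_power_w
    if excess_power_w <= 0:
        # At or below the sustainable budget the chip can run indefinitely.
        return float("inf")
    return heat_capacity_j / excess_power_w
```

Under these assumptions, `sprint_duration_s(7.5, 16.0, 1.0)` gives half a second for a 16 W sprint on a 1 W platform, which is on the order of the "hundreds of milliseconds" the abstract mentions.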

international conference on computer aided design | 1994

Precomputation-based sequential logic optimization for low power

Mazhar Alidina; José C. Monteiro; Srinivas Devadas; Abhijit Ghosh; Marios C. Papaefthymiou

We address the problem of optimizing logic-level sequential circuits for low power. We present a powerful sequential logic optimization method that is based on selectively precomputing the output logic values of the circuit one clock cycle before they are required, and using the precomputed values to reduce internal switching activity in the succeeding clock cycle. We present two different precomputation architectures which exploit this observation. We present an automatic method of synthesizing precomputational logic so as to achieve maximal reductions in power dissipation. We present experimental results on various sequential circuits. Up to 75% reductions in average switching activity and power dissipation are possible with marginal increases in circuit area and delay.
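One way to picture precomputation is with a magnitude comparator: when the most significant bits of the two operands differ, the output is already determined, so the registers holding the remaining bits need not be loaded. The sketch below is an illustrative model under that assumption, not the synthesis method of the paper; the function names are hypothetical.

```python
from itertools import product

def compare_with_precomputation(a, b, width):
    """Return (a > b, low_bits_loaded) for two unsigned width-bit operands."""
    msb = width - 1
    a_msb, b_msb = (a >> msb) & 1, (b >> msb) & 1
    if a_msb != b_msb:
        # The MSBs alone decide the result; the low-order registers stay disabled.
        return a_msb > b_msb, False
    # MSBs agree, so the full comparison (and a register load) is required.
    return a > b, True

def load_fraction(width):
    """Fraction of all input pairs for which the low-order registers are loaded."""
    pairs = list(product(range(1 << width), repeat=2))
    loads = sum(compare_with_precomputation(a, b, width)[1] for a, b in pairs)
    return loads / len(pairs)
```

For uniformly distributed inputs the low-order registers are loaded for only half of the input pairs, which is the kind of switching-activity reduction the abstract refers to; actual savings depend on the circuit and its input statistics.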

Journal of the ACM | 1997

Optimizing two-phase, level-clocked circuitry

Alexander T. Ishii; Charles E. Leiserson; Marios C. Papaefthymiou

We investigate two strategies for reducing the clock period of a two-phase, level-clocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster level-clocked one. We model a two-phase circuit as a graph G = (V, E) whose vertex set V is a collection of combinational logic blocks, and whose edge set E is a set of interconnections. Each interconnection passes through zero or more latches, where each latch is clocked by one of two periodic, nonoverlapping waveforms, or phases. We give efficient polynomial-time algorithms for problems involving the timing verification and optimization of two-phase circuitry. Included are algorithms for

- verifying proper timing: O(VE) time;
- minimizing the clock period by clock tuning: O(VE) time;
- retiming to achieve a given clock period when the phases are symmetric: O(VE + V lg V) time;
- retiming to achieve a given clock period when either the duty cycle (high time) of one phase or the ratio of the phases’ duty cycles is fixed: O(V) time.

We give fully polynomial-time approximation schemes for clock period minimization, within any given relative error ε > 0, by

- retiming and tuning when the duty cycles of the two phases are required to be equal: O((VE + V lg V) lg(V/ε)) time;
- retiming and tuning when either the duty cycle of one phase is fixed or the ratio of the phases’ duty cycles is fixed: O(V lg(V/ε)) time;
- simultaneous retiming and clock tuning with no conditions on the duty cycles of the two phases: O(V (1/ε) lg(1/ε) + (VE + V lg V) lg(V/ε)) time.

The first two of these approximation algorithms can be used to obtain the optimum clock period in the special case where all propagation delays are integers. We generalize most of the results for two-phase clocking schemes to simple multiphase clocking disciplines, including ones with overlapping phases. Typically, the algorithms to verify and optimize the timing of k-phase circuitry are at most a factor of k slower than the corresponding algorithms for two-phase circuitry. Our algorithms have been implemented in TIM, a timing package for two-phase, level-clocked circuitry developed at MIT.

This research was supported in part by the Defense Advanced Research Projects Agency under Grant N00014-91-J-1698. Authors’ present addresses: A. T. Ishii, NEC USA C&C Research Laboratories, Princeton, NJ 08540; C. E. Leiserson, Massachusetts Institute of Technology, Laboratory for Computer Science, Cambridge, MA 02139; M. C. Papaefthymiou, Advanced Computer Architecture Laboratory, Room 2218 EECS Building, Ann Arbor, MI 48109-2122. Journal of the ACM, Vol. 44, No. 1, January 1997, pp. 148–199.

acm symposium on parallel algorithms and architectures | 1991

Understanding retiming through maximum average-weight cycles

Marios C. Papaefthymiou

IEEE Transactions on Circuits and Systems II: Express Briefs | 2007

A Multi-Mode Power Gating Structure for Low-Voltage Deep-Submicron CMOS ICs

Suhwan Kim; Stephen V. Kosonocky; Daniel R. Knebel; Kevin Stawiasz; Marios C. Papaefthymiou

A synchronous circuit built of functional elements and registers is a simple implementation of the semisystolic model of computation that can be used to design parallel algorithms. Retiming is a well-known technique that transforms a given circuit into a faster circuit by relocating its registers. We give tight bounds on the minimum clock period that can be achieved by retiming a synchronous circuit. These bounds are expressed in terms of the maximum delay-to-register ratio of the cycles in the circuit graph and the maximum propagation delay d_max of the circuit components. Our bounds do not depend on the size of the circuit, and they are of theoretical as well as practical interest. They characterize exactly the minimum clock period that can be achieved by retiming a unit-delay circuit, and they lead to more efficient algorithms for several important problems related to retiming. Specifically, we give an O(V^(1/2) E lg V) algorithm for minimum clock-period retiming of unit-delay circuitry. For non-unit-delay circuitry, we describe an O(VE lg d_max) algorithm for minimum clock-period retiming. We also describe an O(V^(1/2) E lg^2(V d_max)) algorithm for retiming with clock period that does not exceed the minimum by more than d_max - 1. Finally, we give an O(E lg d_max) algorithm for minimum clock-period pipelining of combinational circuitry.
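The maximum delay-to-register ratio named in this abstract can be computed directly for a toy circuit. The sketch below is a brute-force illustration that enumerates simple cycles, not one of the efficient algorithms the abstract describes; the graph encoding (integer vertex delays, edges labelled with register counts) and the function name are assumptions made for the example.

```python
from fractions import Fraction

def max_delay_to_register_ratio(delays, edges):
    """Largest (total delay / total registers) over all simple cycles.

    delays: {vertex: integer propagation delay}
    edges:  list of (source, target, register_count) tuples
    Returns None if the graph has no cycle. Exponential time; toy graphs only.
    """
    adjacency = {v: [] for v in delays}
    for u, v, registers in edges:
        adjacency[u].append((v, registers))

    best = None

    def dfs(start, vertex, delay_sum, register_sum, visited):
        nonlocal best
        for nxt, registers in adjacency[vertex]:
            if nxt == start:
                total_registers = register_sum + registers
                if total_registers == 0:
                    # A synchronous circuit needs a register on every cycle.
                    raise ValueError("cycle without registers")
                ratio = Fraction(delay_sum, total_registers)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited:
                dfs(start, nxt, delay_sum + delays[nxt],
                    register_sum + registers, visited | {nxt})

    for start in delays:
        dfs(start, start, delays[start], 0, {start})
    return best
```

Because retiming preserves the number of registers on every cycle, a cycle with total delay D and W registers cannot be clocked faster than D/W, so the value returned here is a lower bound on the achievable clock period. For example, with delays `{"a": 3, "b": 2, "c": 4}` and edges `[("a", "b", 1), ("b", "c", 0), ("c", "a", 1), ("b", "a", 1)]`, the cycle through all three vertices has ratio 9/2 and dominates.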

international solid state circuits conference | 2007

Energy-Efficient GHz-Class Charge-Recovery Logic

Visvesh S. Sathe; Juang Ying Chueh; Marios C. Papaefthymiou

Most existing power gating structures provide only one power-saving mode. We propose a novel power gating structure that supports both a cutoff mode and an intermediate power-saving and data-retaining mode. Experiments with test structures fabricated in 0.13-μm CMOS bulk technology show that our power gating structure yields an expanded design space with more power-performance tradeoff alternatives.

IEEE Transactions on Computers | 2005

Charge-recovery computing on silicon

Suhwan Kim; Conrad H. Ziesler; Marios C. Papaefthymiou

In this paper, we present Boost Logic, a charge-recovery circuit family that can operate efficiently at clock frequencies in excess of 1 GHz. To achieve high energy efficiency, Boost Logic relies on a combination of aggressive voltage scaling, gate overdrive, and charge-recovery techniques. In post-layout simulations of 16-bit multipliers with a 0.13-μm CMOS process at 1 GHz, a Boost Logic implementation achieves 5 times higher energy efficiency than its minimum-energy pipelined, voltage-scaled, static CMOS counterpart at the expense of 3 times longer latency. In a fully integrated test chip implemented using a 0.13-μm bulk silicon process and on-chip inductors, chains of Boost Logic gates operate at clock frequencies up to 1.3 GHz with a 1.5-V supply. When resonating at 850 MHz with a 1.2-V supply, the Boost Logic test chip achieves 60% charge-recovery

international symposium on low power electronics and design | 1998

True single-phase energy-recovering logic for low-power, high-speed VLSI

Suhwan Kim; Marios C. Papaefthymiou

Three decades ago, theoretical physicists suggested that the controlled recovery of charges could result in electronic circuitry whose power dissipation approaches thermodynamic limits, growing at a significantly slower pace than the fCV^2 rate for CMOS switching power. Early engineering research in this field, which became generally known as adiabatic computing, focused on the asymptotic energetics of computation, exploring VLSI designs that use reversible logic and adiabatic switching to preserve information and achieve nearly zero power dissipation as operating frequencies approach zero. Recent advances in CMOS VLSI design have taken us to real working chips that rely on controlled charge recovery to operate at substantially lower power dissipation levels than their conventional counterparts. Although their origins can be traced back to the early adiabatic circuits, these charge-recovering systems approach energy recycling from a more practical angle, shedding reversibility to achieve operating frequencies in the hundreds of MHz with relatively low overhead. Among other charge-recovery designs, researchers have demonstrated microcontrollers, standard-cell ASICs, SRAMs, LCD panel drivers, I/O drivers, and multi-GHz clock networks. In this paper, we present an overview of the field and focus on two chip designs that highlight some of the promising charge-recovering techniques in practice.

international conference on asic | 1999

Low power parallel multiplier design for DSP applications through coefficient optimization

Sangjin Hong; Suhwan Kim; Marios C. Papaefthymiou; Wayne E. Stark

In dynamic logic families that rely on energy recovery to achieve low energy dissipation, the flow of data through cascaded gates is controlled using multi-phase clocks. Consequently, these families require multiple clock generators and can exhibit increased energy consumption on their clock distribution networks. Moreover, they are not attractive for high-speed design due to clock skew management problems. In this paper, we present TSEL, the first energy-recovering logic family that operates with a single-phase clocking scheme. TSEL outperforms previous energy-recovering logic families in terms of energy efficiency and operating speed. In HSPICE simulations with a standard 0.5 μm technology from MOSIS, pipelined carry-lookahead adders in TSEL function correctly for operating frequencies exceeding 280 MHz. For operating frequencies above 80 MHz, they dissipate considerably less energy per operation than alternative implementations of the same adder architecture in other energy-recovering logic families. In comparison with their CMOS counterparts, the TSEL adders dissipate about half as much energy at 280 MHz. Our results indicate that TSEL is an excellent candidate for high-speed and low-power VLSI system design.

international symposium on low power electronics and design | 2001

A resonant clock generator for single-phase adiabatic systems

C.H. Ziesler; Suhwan Kim; Marios C. Papaefthymiou

Digital Signal Processing (DSP) often involves multiplications with a set of coefficients. This paper presents a novel multiplier design methodology for performing these coefficient multiplications with very low power dissipation. Given bounds on the throughput and the quantization error, our approach scales the original coefficients to enable the partitioning of each multiplication into a collection of smaller multiplications with shorter critical paths. Significant energy savings are achieved by performing these multiplications in parallel with a scaled supply voltage. Dissipation is further reduced by disabling the multiplier rows that do not affect the multiplication's outcome. We have used our methodology to design a low-power parallel multiplier for the Fast Fourier Transform. Simulation results show that our approach can result in significant power savings over conventional multipliers.
