Krishna V. Palem | Researchain

Publication

Featured research published by Krishna V. Palem.

IEEE Computer | 1997

Seeking solutions in configurable computing

William H. Mangione-Smith; Brad L. Hutchings; David L. Andrews; André DeHon; Carl Ebeling; Reiner W. Hartenstein; Oskar Mencer; John Morris; Krishna V. Palem; Viktor K. Prasanna; Henk A. E. Spaanenburg

Configurable computing offers the potential of producing powerful new computing systems. Will current research overcome the dearth of commercial applicability to make such systems a reality? Unfortunately, no system to date has yet proven attractive or competitive enough to establish a commercial presence. We believe that ample opportunity exists for work in a broad range of areas. In particular, the configurable computing community should focus on refining the emerging architectures, producing more effective software/hardware APIs, better tools for application development that incorporate the models of hardware reconfiguration, and effective benchmarking strategies.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2004

Trimaran: an infrastructure for research in instruction-level parallelism

Lakshmi N. Chakrapani; John C. Gyllenhaal; Wen-mei W. Hwu; Scott A. Mahlke; Krishna V. Palem; Rodric M. Rabbah

Trimaran is an integrated compilation and performance monitoring infrastructure. The architecture space that Trimaran covers is characterized by HPL-PD, a parameterized processor architecture supporting novel features such as predication, control and data speculation, and compiler-controlled management of the memory hierarchy. Trimaran also includes a full suite of analysis and optimization modules, as well as a graph-based intermediate language. Optimization and analysis modules can be easily added, deleted, or bypassed, thus facilitating compiler optimization research. Similarly, computer architecture research can be conducted by varying the HPL-PD machine via the machine description language HMDES. Trimaran also provides a detailed simulation environment and a flexible performance monitoring environment that automatically tracks the machine as it is varied.

Compilers, Architecture, and Synthesis for Embedded Systems | 2006

Probabilistic arithmetic and energy efficient embedded signal processing

Jason George; Bo Marr; Bilge E. S. Akgul; Krishna V. Palem

Probabilistic arithmetic, where the ith output bit of addition and multiplication is correct with a probability p_i, is shown to be a vehicle for realizing extremely energy-efficient embedded computing. Specifically, probabilistic adders and multipliers, realized using elements such as gates that are in turn probabilistic, are shown to form a natural basis for primitives in the digital signal processing (DSP) domain. In this paper, we show that probabilistic arithmetic can be used to compute the FFT in an extremely energy-efficient manner, yielding energy savings of over 5.6X in the context of the widely used synthetic aperture radar (SAR) application [1]. Our results are derived using novel probabilistic CMOS (PCMOS) technology, characterized and applied in the past to realize ultra-efficient architectures for probabilistic applications [2, 3, 4]. When applied to the DSP domain, the resulting error in the output of a probabilistic arithmetic primitive, such as an adder, manifests as degradation in the signal-to-noise ratio (SNR) of the SAR image that is reconstructed through the FFT algorithm. In return for this degradation that is enabled by our probabilistic arithmetic primitives -- degradation visually indistinguishable from an image reconstructed using conventional deterministic approaches -- significant energy savings and performance gains are shown to be possible per unit of SNR degradation. These savings stem from a novel method of voltage scaling, which we refer to as biased voltage scaling (or BIVOS), that is the major technical innovation on which our probabilistic designs are based.
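The definition in this abstract (output bit i is correct with probability p_i) can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions, not the paper's PCMOS circuit model: the function name `probabilistic_add`, the independent per-bit flips, and the example probability profile are all hypothetical.

```python
import random


def probabilistic_add(a, b, p, rng):
    """Add a and b, keeping output bit i correct with probability p[i].

    Assumption: each output bit fails independently, and a failure simply
    flips that bit of the exact sum. len(p) sets the output width.
    """
    width = len(p)
    exact = (a + b) & ((1 << width) - 1)  # exact sum, truncated to width bits
    result = exact
    for i in range(width):
        # Bit i is wrong with probability 1 - p[i].
        if rng.random() >= p[i]:
            result ^= 1 << i
    return result


if __name__ == "__main__":
    rng = random.Random(2006)  # fixed seed so runs are repeatable
    # Hypothetical profile: low-order bits are less reliable than high-order bits.
    p = [0.80, 0.85, 0.90, 0.95, 0.99, 1.0, 1.0, 1.0]
    trials = 10_000
    total_error = sum(abs(probabilistic_add(37, 58, p, rng) - 95) for _ in range(trials))
    print("mean absolute error:", total_error / trials)
```

With a profile like the one above, errors are confined to the low-order bits, so the mean absolute error stays small relative to the sum; this is the kind of accuracy-for-energy trade the abstract describes, though the energy side is not modeled here.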

Design, Automation, and Test in Europe | 2011

Energy parsimonious circuit design through probabilistic pruning

Avinash Lingamneni; Christian Enz; Jean-Luc Nagel; Krishna V. Palem; Christian Piguet

Inexact circuits, or circuits in which the accuracy of the output can be traded for energy or delay savings, have been receiving increasing attention of late due to invariable inaccuracies in designs as Moore's law approaches the low nanometer range, and a concomitant growing desire for ultra-low-energy systems. In this paper, we present a novel design-level technique called probabilistic pruning to realize inexact circuits. Unlike previous techniques in the literature, which relied mostly on some form of scaling of operational parameters such as the supply voltage (Vdd) to achieve energy and accuracy tradeoffs, our technique uses pruning of portions of circuits having a lower probability of being active as the basis for performing architectural modifications, resulting in significant savings in energy, delay, and area. Our approach yields more savings when compared to any of the conventional voltage scaling schemes, for similar error values. Extensive simulations using this pruning technique in a novel logic-synthesis-based CAD framework on various architectures of 64-bit adders demonstrate that normalized gains as great as 2×–7.5× in the Energy-Delay-Area product can be obtained, with a relative error percentage as low as 10^-6% up to 10%, when compared to corresponding conventionally correct designs.
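The pruning idea in this abstract can be sketched as a greedy selection. The code below is a toy illustration, not the CAD framework the abstract refers to: the node tuples, the `significance` weight, the error model (expected error = activity probability × significance), and the function name `prune` are all assumptions made for the example.

```python
def prune(nodes, error_budget):
    """Greedy sketch: remove the nodes that are least likely to matter first.

    nodes: list of (name, activity_probability, significance) tuples.
    Assumption: removing a node costs activity_probability * significance
    in expected error, and costs simply add up.
    Returns the names of pruned nodes and the expected error spent.
    """
    ranked = sorted(nodes, key=lambda n: n[1] * n[2])
    pruned, spent = [], 0.0
    for name, prob, weight in ranked:
        cost = prob * weight
        if spent + cost > error_budget:
            break  # every remaining node costs at least as much
        pruned.append(name)
        spent += cost
    return pruned, spent


if __name__ == "__main__":
    # Hypothetical circuit nodes: (name, activity probability, significance).
    circuit = [("g0", 0.01, 1), ("g1", 0.50, 2), ("g2", 0.02, 4)]
    print(prune(circuit, error_budget=0.1))
```

Raising `error_budget` lets more nodes be removed, which mirrors the trade-off the abstract reports between relative error and savings in energy, delay, and area.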

Symposium on the Theory of Computing | 1991

Combining tentative and definite executions for very fast dependable parallel computing

Zvi M. Kedem; Krishna V. Palem; Arvind Raghunathan; Paul G. Spirakis

We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the notions of definite and tentative algorithms for executing a single parallel step of an ideal parallel machine on the unreliable machine. A definite algorithm is one that guarantees a correct execution of a step, while a tentative algorithm is one that is "highly likely" to produce a correct execution of a step on the unreliable machine. We show that any definite execution of one step requires Ω(log n) time on an n-processor unreliable machine, even if all the processors functioned perfectly. This implies an Ω(log n) slowdown for executing any non-trivial program on the unreliable machine, provided only definite executions are used. We get around this overhead by combining tentative and definite execution schemes appropriately, to derive correct and efficient robust executions for arbitrary PRAM programs, with expected amortized slowdown of only O(1) for a variety of reasonable failure models. We achieve this by using a tentative algorithm to execute each of the program's steps, while using a definite algorithm to audit the execution at selected points. If the audit does not certify the execution as correct, then the execution is rolled back to a previous audit point and restarted from there. In contrast to this work, all previous results required a slowdown of Ω(log n), since they used definite algorithms only.

This research was partially supported by the National Science Foundation under grant number CCR88-6949 and by the EEC ESPRIT Basic Research Actions Project ALCOM (No 3075). Author addresses as given in the paper: Ecole des Hautes Etudes en Informatique, Université René Descartes, 45, rue des Saints-Pères, 75006 Paris, France (current address), and Department of Computer Science, New York University, 251 Mercer St., New York, NY 10012-1185, USA (permanent address), with research conducted while visiting the IBM Research Division at the T. J. Watson Research Center and the Institute for Advanced Computer Studies at the University of Maryland; IBM Research Division, T. J. Watson Research Center, P. O. Box 704, Yorktown Heights, NY 10598, USA; Computer Science Division, University of California, Davis, CA 95616, USA, with part of the research conducted while visiting New York University; Computer Technology Institute, Patras University, P. O. Box 1122, 26110 Patras, Greece, with research conducted while visiting New York University. © 1991 ACM 089791-397-3/91/0004/0381

ACM Transactions on Design Automation of Electronic Systems | 2007

Probabilistic system-on-a-chip architectures

Lakshmi N. Chakrapani; Bilge E. S. Akgul; Krishna V. Palem

ACM Transactions on Embedded Computing Systems | 2013

Ten Years of Building Broken Chips: The Physics and Engineering of Inexact Computing

Krishna V. Palem; Avinash Lingamneni

Compilers, Architecture, and Synthesis for Embedded Systems | 2003

Energy aware algorithm design via probabilistic computing: from algorithms and models to Moore's law and novel (semiconductor) devices

Krishna V. Palem

Symposium on the Theory of Computing | 1994

Non-standard stringology: algorithms and complexity

S. Muthukrishnan; Krishna V. Palem

ACM Transactions on Embedded Computing Systems | 2003

Data remapping for design space optimization of embedded memory systems

Rodric M. Rabbah; Krishna V. Palem

In this article, we present a novel linear time algorithm for data remapping that is (i) lightweight; (ii) fully automated; and (iii) applicable in the context of pointer-centric programming languages with dynamic memory allocation support. All previous work in this area lacks one or more of these features. We proceed to demonstrate a novel application of this algorithm as a key step in optimizing the design of an embedded memory system. Specifically, we show that by virtue of locality enhancements via data remapping, we may reduce the memory subsystem needs of an application by 50%, and hence concomitantly reduce the associated costs in terms of size, power, and dollar-investment (61%). Such a reduction overcomes key hurdles in designing high-performance embedded computing solutions. Namely, memory subsystems are very desirable from a performance standpoint, but their costs have often limited their use in embedded systems. Thus, our innovative approach offers the intriguing possibility of compilers playing a significant role in exploring and optimizing the design space of a memory subsystem for an embedded design. To this end, and in order to properly leverage the improvements afforded by a compiler optimization, we identify a range of measures for quantifying the cost-impact of popular notions of locality, prefetching, regularity of memory access, and others. The proposed methodology will become increasingly important, especially as the needs for application-specific embedded architectures become prevalent. In addition, we demonstrate the wide applicability of data remapping using several existing microprocessors, such as the Pentium and UltraSparc. Namely, we show that remapping can achieve a performance improvement of 20% on average. Similarly, for a parametric research HPL-PD microprocessor, which characterizes the new Itanium machines, we achieve a performance improvement of 28% on average. All of our results are achieved using applications from the DIS, Olden and SPEC2000 suites of integer and floating point benchmarks.
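The locality argument behind data remapping can be made concrete with a small address model. The sketch below does not reproduce the article's algorithm; it only compares how many cache lines a traversal of one field touches before and after that field's values are placed contiguously. The record size, field size, 64-byte line size, and all function names are assumptions chosen for the example.

```python
def cache_lines_touched(addresses, line_size=64):
    """Count distinct cache lines covered by a sequence of byte addresses."""
    return len({addr // line_size for addr in addresses})


def record_layout_addresses(n_records, record_size, field_offset):
    """Address of one field in each record when whole records are stored back to back."""
    return [i * record_size + field_offset for i in range(n_records)]


def remapped_addresses(n_records, field_size):
    """Address of the same field after its values are packed contiguously."""
    return [i * field_size for i in range(n_records)]


if __name__ == "__main__":
    n = 1024
    before = cache_lines_touched(record_layout_addresses(n, record_size=32, field_offset=0))
    after = cache_lines_touched(remapped_addresses(n, field_size=4))
    print("cache lines touched before remapping:", before)
    print("cache lines touched after remapping:", after)
```

In this model the packed layout touches fewer lines for the same traversal, which is one way a smaller memory subsystem could serve the same access pattern. The actual savings reported in the abstract come from the authors' measurements, not from this model.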
