Ram K. Krishnamurthy | Researchain

Publication

Featured research published by Ram K. Krishnamurthy.

IEEE Journal of Solid-state Circuits | 2002

A sub-130-nm conditional keeper technique

Atila Alvandpour; Ram K. Krishnamurthy; Krishnamurthy Soumyanath; Shekhar Borkar

Increasing leakage currents combined with reduced noise margins significantly degrade the robustness of wide dynamic circuits. In this paper, we describe two conditional keeper topologies for improving the robustness of sub-130-nm wide dynamic circuits. They are applicable in normal mode of operation as well as during burn-in test. A large fraction of the keepers is activated conditionally, allowing the use of strong keepers with leaky precharged circuits without significant impact on performance of the circuits. Compared to conventional techniques, up to 28% higher performance has been observed for wide dynamic gates in a 130-nm technology. In addition, the proposed burn-in keeper results in 64% active area reduction.

Design Automation Conference | 2012

Near-threshold voltage (NTV) design: opportunities and challenges

Himanshu Kaul; Mark A. Anders; Steven K. Hsu; Amit Agarwal; Ram K. Krishnamurthy; Shekhar Borkar

Moore's Law will continue to provide an abundance of transistors for integration, limited only by energy consumption. Near-threshold voltage (NTV) operation has the potential to improve energy efficiency by an order of magnitude. We discuss design techniques necessary for reliable operation over a wide range of supply voltages, from nominal down to the subthreshold region. A system designed for NTV can dynamically select modes of operation, from high performance, to high energy efficiency, to the lowest power.

Archive | 2006

Reconfigurable Computing: Architectures, Tools and Applications

Andreas Koch; Ram K. Krishnamurthy; John McAllister; Roger F. Woods; Tarek A. El-Ghazawi

Clustering a large number of data points is a computationally demanding task that often needs to be accelerated in order to be useful in practice. This work focuses on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, one of the state-of-the-art clustering algorithms, and targets its acceleration using an FPGA device. The paper presents a novel, optimised and scalable architecture that takes advantage of the internal memory structure of modern FPGAs in order to deliver a high-performance clustering system. Results show that the developed system can obtain average speed-ups of 32x in real-world tests and 202x in synthetic tests when compared to state-of-the-art software counterparts.
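For readers unfamiliar with DBSCAN, the following is a minimal software sketch of the textbook algorithm in Python. It is not the FPGA architecture from the paper; the function names (`region_query`, `dbscan`) and the brute-force neighbor search are assumptions made purely for illustration.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The repeated neighborhood queries dominate the run time of this sketch, which is one reason the algorithm is a common target for hardware acceleration.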

Symposium on Computer Arithmetic | 2005

An improved unified scalable radix-2 Montgomery multiplier

David Money Harris; Ram K. Krishnamurthy; Mark A. Anders; Sanu K. Mathew; Steven K. Hsu

This paper describes an improved version of the Tenca-Koc unified scalable radix-2 Montgomery multiplier with half the latency for small and moderate precision operands and half the queue memory requirement. Like the Tenca-Koc multiplier, this design is reconfigurable to accept any input precision in either GF(p) or GF(2^n) up to the size of the on-chip memory. An FPGA implementation can perform 1024-bit modular exponentiation in 16 ms using 5598 4-input lookup tables, making it the fastest unified scalable design yet reported.
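As background, the basic bit-serial radix-2 Montgomery multiplication over GF(p) can be sketched in Python as follows. This is the plain textbook recurrence, not the word-scalable Tenca-Koc architecture or the improved design described in the paper, and it does not cover the GF(2^n) mode; the function name `mont_mul` and its parameters are assumptions for illustration.

```python
def mont_mul(x, y, m, n):
    """Return x * y * 2^(-n) mod m for an odd modulus m < 2^n.

    Processes one bit of x per iteration (radix 2). Requires 0 <= x, y < m.
    """
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    s = 0
    for i in range(n):
        if (x >> i) & 1:
            s += y      # add the partial product for bit i of x
        if s & 1:
            s += m      # make s even without changing its value mod m
        s >>= 1         # exact division by 2
    if s >= m:
        s -= m          # single final correction, since s < 2m here
    return s


result = mont_mul(7, 15, 23, 5)
```

Each iteration needs only an addition, a conditional addition, and a shift, which is why the radix-2 form maps naturally onto simple hardware.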

IEEE Journal of Solid-state Circuits | 2003

A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core

Sanu K. Mathew; Mark A. Anders; Ram K. Krishnamurthy; Shekhar Borkar

This paper describes a 32-bit Address Generation Unit (AGU) designed for 4 GHz operation in 1.2 V, 130 nm technology. The AGU utilizes a 152 ps dual-Vt sparse-tree adder core to achieve 20% delay reduction, 80% lower interconnect density and a low (1%) active energy leakage component. The semidynamic implementation enables an average energy profile similar to static CMOS, with good sub-130 nm scaling trend.

International Solid-State Circuits Conference | 2008

A 320mV 56μW 411GOPS/Watt Ultra-Low Voltage Motion Estimation Accelerator in 65nm CMOS

Himanshu Kaul; Mark A. Anders; Sanu K. Mathew; Steven K. Hsu; Amit Agarwal; Ram K. Krishnamurthy; Shekhar Borkar

Motion estimation for compressing inter-frame redundancies is the most performance and power-critical operation in video encoding applications, where a wide range of throughput and power constraints are required to handle a variety of video resolution, frame rate and application specifications. A motion estimation engine targeted for special-purpose on-die acceleration of sum of absolute difference (SAD) computation in real-time video encoding workloads on power-constrained mobile microprocessors is fabricated in 65nm CMOS.
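To illustrate the sum of absolute difference (SAD) metric mentioned above, here is a minimal Python sketch of SAD-based block matching with an exhaustive search. It is a software illustration only and says nothing about how the fabricated accelerator is organized; the helper names (`sad`, `extract`, `full_search`) and the full-search strategy are assumptions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def extract(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(cur_block, ref_frame, top, left, search):
    """Try every offset within +/-search and return (best_sad, dy, dx)."""
    size = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur_block, extract(ref_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Hypothetical 6x6 reference frame with a unique value at every pixel.
ref = [[y * 6 + x for x in range(6)] for y in range(6)]
cur = extract(ref, 2, 3, 2)            # block that actually sits at (2, 3)
match = full_search(cur, ref, 1, 1, 2)  # search around position (1, 1)
```

Here `match` holds the lowest SAD found and the motion vector `(dy, dx)` that produced it.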

International Solid-State Circuits Conference | 2004

A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS

Sanu K. Mathew; Mark A. Anders; Brad Bloechel; Trang Nguyen; Ram K. Krishnamurthy; Shekhar Borkar

This paper describes a single-cycle 64-bit integer execution ALU fabricated in 90-nm dual-Vt CMOS technology, operating at 4 GHz in the 64-bit mode with a 32-bit mode frequency of 7 GHz (measured at 1.3 V, 25 °C). The lower- and upper-order 32-bit domains operate on separate off-chip supply voltages, enabling conditional turn-on/off of the 64-bit ALU mode operation and efficient power-performance optimization. High-speed single-rail dynamic circuit techniques and a sparse-tree semi-dynamic adder architecture enable a dense layout occupying 280 × 260 μm² while simultaneously achieving: (i) low carry-merge fan-outs and inter-stage wiring complexity; (ii) low active leakage and dynamic power consumption; (iii) high DC noise robustness with maximum low-Vt usage; (iv) single-rail dynamic-compatible ALU write-back bus; (v) simple 2Φ 50% duty-cycle timing plan with seamless time-borrowing across phases; (vi) scalable 64-bit ALU performance up to 7 GHz measured at 2.1 V, 25 °C; and (vii) scalable 32-bit ALU performance up to 9 GHz measured at 1.68 V, 25 °C.

IEEE Transactions on Very Large Scale Integration Systems | 2006

A process variation compensating technique with an on-die leakage current sensor for nanometer scale dynamic circuits

Chris H. Kim; Kaushik Roy; Steven K. Hsu; Ram K. Krishnamurthy; Shekhar Borkar

This paper describes a process compensating dynamic (PCD) circuit technique for maintaining the performance benefit of dynamic circuits and reducing the variation in delay and robustness. A variable strength keeper that is optimally programmed based on the die leakage, enables 10% faster performance, 35% reduction in delay variation, and 5× reduction in the number of robustness failing dies, compared to conventional designs. A new leakage current sensor design is also presented that can detect leakage variation and generate the keeper control signals for the PCD technique. Results based on measured leakage data show 1.9-10.2× higher signal-to-noise ratio (SNR) and reduced sensitivity to supply and p-n skew variations compared to prior leakage sensor designs.

IEEE Journal of Solid-state Circuits | 2002

A 130-nm 6-GHz 256 × 32 bit leakage-tolerant register file

Ram K. Krishnamurthy; Atila Alvandpour; Ganesh Balamurugan; Naresh R. Shanbhag; Krishnamurthy Soumyanath; Shekhar Borkar

Describes a 256-word × 32-bit 4-read, 4-write ported register file for 6-GHz operation in 1.2-V 130-nm technology. The local bitline uses a pseudostatic technique for aggressive bitline active leakage reduction/tolerance to enable 16 bitcells/bitline, low-Vt usage, and 50% keeper downsizing. Gate-source underdrive of -Vcc on read-select transistors is established without additional supply/bias voltages or gate-oxide overstress. 8% faster read performance and 36% higher dc noise robustness is achieved compared to dual-Vt bitline scheme optimized for high performance. Device-level measurements in the 130-nm technology show 703× bitline active leakage reduction, enabling continued Vt scaling and robust bitline scalability beyond 130-nm generation. Sustained performance and robustness benefit of the pseudostatic technique against conventional dynamic bitline with keeper-upsizing is also presented.

International Solid-State Circuits Conference | 2014

16.2 A 0.19pJ/b PVT-variation-tolerant hybrid physically unclonable function circuit for 100% stable secure key generation in 22nm CMOS

Sanu K. Mathew; Sudhir K. Satpathy; Mark A. Anders; Himanshu Kaul; Steven K. Hsu; Amit Agarwal; Gregory K. Chen; Rachael J. Parker; Ram K. Krishnamurthy; Vivek De

Physically unclonable function (PUF) circuits are low-cost cryptographic primitives used for generation of unique, stable and secure keys or chip IDs for device authentication and data security in high-performance microprocessors [1][2][3][7]. The volatile nature of PUFs provides a high level of security and tamper resistance against invasive probing attacks compared to conventional fuse-based key storage technologies [4]. A process-voltage-temperature (PVT) variation-tolerant all-digital PUF array targeted for on-die generation of 100% stable, device-specific, high-entropy keys is fabricated in 22nm tri-gate high-κ metal-gate CMOS technology [5], featuring: i) a hybrid delay/cross-coupled PUF circuit where interaction of 16 minimum-sized, variation-impacted transistors determines resolution dynamics, ii) a temporal majority voting (TMV) circuit to stabilize occasionally unstable bits, resulting in 53% reduction in instability, iii) burn-in hardening to reinforce manufacturing-time PUF bias, resulting in 22% reduction in bit-errors, iv) soft dark bits for run-time identification and sequestration of highly unstable bits during field operation, resulting in 78% lower bit-errors, v) 19× separation between inter- and intra-PUF Hamming distance, enabling die-specific keys, vi) autocorrelation factor ≈ 0 and entropy = 0.9997, while passing NIST randomness tests, vii) high tolerance to voltage and temperature variation with 82% reduction in average Hamming distance using a 100-cycle dark bit window, viii) in-situ PUF hardening by leveraging directed NBTI aging to improve stability during field operation, and ix) ultra-low energy consumption of 0.19pJ/b with compact bitcell layout of 4.66μm² (Fig. 16.2.7a).
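Two of the ideas named in this abstract, temporal majority voting and Hamming distance, can be illustrated with a short Python sketch. This is a generic software model under assumed inputs (lists of 0/1 bits from repeated reads); the function names and the three-read example are hypothetical, and the abstract does not specify how the on-die TMV circuit is built.

```python
def majority_vote(reads):
    """Per-bit majority over an odd number of equal-length bit lists."""
    if len(reads) % 2 == 0:
        raise ValueError("use an odd number of reads to avoid ties")
    threshold = len(reads) // 2
    # zip(*reads) groups together the values seen for each bit position
    return [1 if sum(column) > threshold else 0 for column in zip(*reads)]


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical example: three noisy reads of the same 4-bit PUF response.
reads = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
stable = majority_vote(reads)
errors_in_second_read = hamming_distance(stable, reads[1])
```

In this framing, intra-PUF Hamming distance compares repeated reads of one device (ideally near zero), while inter-PUF Hamming distance compares responses from different devices (ideally near half the bits).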