Network


Kanad Ghose's latest external collaborations, at the country level.

Hotspot


Research topics in which Kanad Ghose is active.

Publication


Featured research published by Kanad Ghose.


international symposium on low power electronics and design | 1997

Analytical energy dissipation models for low-power caches

Milind B. Kamble; Kanad Ghose

We present detailed analytical models for estimating the energy dissipation in conventional caches as well as low-energy cache architectures. The analytical models use run-time statistics such as hit/miss counts and the fraction of read/write requests, and assume stochastic distributions for signal values. These models are validated by comparing the power estimated using these models against the power estimated using a detailed simulator called CAPE (Cache Power Estimator). The analytical models for conventional caches are found to be accurate to within 2% error. However, these analytical models over-predict the dissipations of low-power caches by as much as 30%. The inaccuracies can be attributed to correlated signal values and locality of reference, both of which are exploited in making some cache organizations energy efficient.
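
A minimal sketch (not the paper's CAPE tool or its exact equations) of how an analytical model of this kind can combine run-time hit/miss statistics with per-event energy terms; the function name and all energy parameters below are illustrative assumptions.

```python
# Illustrative analytical cache energy model driven by run-time access
# statistics. The per-event energy terms are made-up placeholders, not the
# paper's CAPE parameters.

def cache_energy(hits, misses, read_frac,
                 e_read_hit=0.8e-9, e_write_hit=1.0e-9, e_miss=5.0e-9):
    """Estimate total cache energy (joules) from run-time statistics.

    hits, misses  -- counts gathered from simulation or performance counters
    read_frac     -- fraction of hits that are reads
    e_*           -- assumed per-access energy terms (J), placeholders only
    """
    reads = hits * read_frac
    writes = hits * (1.0 - read_frac)
    return reads * e_read_hit + writes * e_write_hit + misses * e_miss


if __name__ == "__main__":
    # Example: 1M accesses, 95% hit rate, 70% reads.
    print(f"{cache_energy(hits=950_000, misses=50_000, read_frac=0.7):.6e} J")
```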


international symposium on low power electronics and design | 1999

Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation

Kanad Ghose; Milind B. Kamble

Modern microprocessors employ one or two levels of on-chip caches to bridge the burgeoning speed disparities between the processor and RAM. These SRAM caches are a major source of power dissipation. We investigate architectural techniques, which do not compromise the processor cycle time, for reducing the power dissipation within the on-chip cache hierarchy in superscalar microprocessors. We use a detailed register-level simulator of a superscalar microprocessor that simulates the execution of the SPEC benchmarks, together with SPICE measurements for the actual layout of a 0.5-micron, 4-metal-layer cache optimized for a 300 MHz clock. We show that a combination of subbanking, multiple line buffers and bit-line segmentation can reduce the on-chip cache power dissipation by as much as 75% in a technology-independent manner.
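
A toy calculation, under assumed array dimensions and unit bit-line energies, of why subbanking reduces access energy: only the subbank holding the referenced word drives its bit-lines. This is not the paper's 0.5-micron layout model.

```python
# Toy energy comparison illustrating the idea behind subbanking: only the
# subbank containing the referenced word is activated, so bit-line energy
# scales roughly with 1/num_subbanks. Numbers are illustrative assumptions,
# not figures from the paper's layouts.

def access_energy(lines, bits_per_line, e_bitline=1.0, num_subbanks=1):
    """Energy (arbitrary units) to drive the bit-lines for one access."""
    active_bits = bits_per_line / num_subbanks   # only one subbank is driven
    return active_bits * lines * e_bitline       # each bit-line spans all lines

monolithic = access_energy(lines=256, bits_per_line=256)
subbanked = access_energy(lines=256, bits_per_line=256, num_subbanks=8)
print(f"bit-line energy saving with 8 subbanks: {1 - subbanked / monolithic:.0%}")
```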


IEEE Transactions on Parallel and Distributed Systems | 1995

Hierarchical cubic networks

Kanad Ghose; Kiran Raghavendra Desai

We introduce a new interconnection network for large-scale distributed memory multiprocessors called the hierarchical cubic network (HCN). We establish that the number of routing steps needed by several data parallel applications running on an HCN-based system and on a hypercube-based system is about the same. Further, hypercube connections can be emulated on the HCN in constant time. Simulations of uniform and localized traffic patterns reveal that the normalized average internode distances in an HCN are better than in a comparable hypercube. Additionally, the HCN has about three-fourths the diameter of a comparable hypercube, although it uses about half as many links per node, a fact that has positive ramifications for the implementation of HCN-connected systems.
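
For reference, a small sketch of dimension-order routing in a plain hypercube, the baseline against which the abstract compares HCN routing steps: the hop count between two nodes equals the Hamming distance of their labels. This sketch does not implement HCN routing itself.

```python
# Dimension-order routing in a hypercube: node labels are bit strings, each
# hop flips one differing bit, so the hop count is the Hamming distance.
# This is the hypercube baseline only, not the HCN routing algorithm.

def hypercube_route(src: int, dst: int):
    """Return the hop-by-hop path from src to dst in a hypercube."""
    path, cur, diff = [src], src, src ^ dst
    bit = 0
    while diff:
        if diff & 1:
            cur ^= (1 << bit)      # flip one differing dimension
            path.append(cur)
        diff >>= 1
        bit += 1
    return path

# Example in a 4-dimensional (16-node) hypercube: 0b0101 -> 0b1010 takes
# 4 hops, the Hamming distance between the two labels.
print(hypercube_route(0b0101, 0b1010))
```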


international conference on vlsi design | 1997

Energy-efficiency of VLSI caches: a comparative study

Milind B. Kamble; Kanad Ghose

We investigate the use of organizational alternatives that lead to more energy-efficient caches for contemporary microprocessors. Dissipative transitions are likely to be highly correlated and skewed in caches, precluding the use of simplistic hit/miss-ratio-based power dissipation models for accurate power estimation. We use a detailed register-level simulator for a typical pipelined CPU and its multi-level caches, and simulate the execution of the SPECint92 benchmarks to glean accurate transition counts. A detailed dissipation model for CMOS caches is introduced for estimating the energy dissipation based on electrical parameters of a typical circuit implementation and the transition counts collected by simulation. A block buffering scheme is presented that allows cache energy requirements to be reduced without increasing access latencies. We report results for a system with an off-chip L2 cache. We conclude that block buffering combined with sub-banking is very effective in reducing energy dissipation in the caches and in the off-chip I/O pad drivers.
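
A toy model, with an assumed structure and made-up per-access energies, of the block buffering idea: an access that hits the most recently used block is served from a small buffer without activating the full cache arrays.

```python
# Toy model of block buffering: the most recently read cache block is kept in
# a small buffer, and a subsequent access to the same block is served from the
# buffer without activating the full data/tag arrays. Structure and energy
# values are illustrative assumptions, not the paper's design.

class BlockBufferedCache:
    def __init__(self, block_bytes=32, e_buffer=0.1, e_array=1.0):
        self.block_bytes = block_bytes
        self.buffered_block = None      # address of the block held in the buffer
        self.e_buffer, self.e_array = e_buffer, e_array
        self.energy = 0.0

    def access(self, addr):
        block = addr // self.block_bytes
        if block == self.buffered_block:        # buffer hit: cheap access
            self.energy += self.e_buffer
        else:                                   # full array access + refill
            self.energy += self.e_array
            self.buffered_block = block

cache = BlockBufferedCache()
for addr in (0, 4, 8, 64, 68, 0):               # spatial locality helps
    cache.access(addr)
print(f"energy: {cache.energy:.1f} units")      # 3 array + 3 buffer accesses
```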


international conference on computer design | 2004

Increasing processor performance through early register release

Oguz Ergin; Deniz Balkan; Dmitry Ponomarev; Kanad Ghose

Modern superscalar microprocessors need sizable register files to support the large number of in-flight instructions required to exploit ILP. An alternative to building large register files is to use a smaller number of registers but manage them more effectively. More efficient management of registers can also result in higher performance if reducing the register file size is not the goal. Traditional register file management mechanisms deallocate a physical register only when the next instruction with the same destination architectural register commits. We propose two complementary techniques for deallocating the register immediately after the instruction producing the register's value commits, without waiting for the commitment of the next instruction with the same destination. Our design relies on the use of a checkpointed register file (CRF), where a local shadow copy of each bitcell is used to temporarily save the early-deallocated register values should they be needed to recover from branch mispredictions or to reconstruct the precise state after exceptions or interrupts. The proposed techniques outperform previously proposed schemes for early deallocation of registers. For register-constrained datapath configurations, our techniques result in performance increases of up to 35%, with an average increase of 23.3% across the SPEC2000 benchmarks.
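
A heavily simplified sketch of the two deallocation policies contrasted above, with made-up structures standing in for the rename logic and the CRF shadow copies; it only illustrates when a physical register returns to the free pool under each policy.

```python
# Simplified sketch of register deallocation timing. Conventionally, the
# physical register previously mapped to architectural register R is freed
# only when the next instruction writing R commits; early release frees the
# producer's own physical register at its commit, relying on a shadow copy
# (standing in here for the checkpointed bitcells) for recovery. These
# structures are illustrative, not the CRF hardware design.

def conventional_release(writes):
    """writes: committed destination architectural registers, in program order.
    Returns, per commit, the physical register freed (or None)."""
    prev_preg = {}                          # arch reg -> previous physical reg
    freed = []
    for i, arch in enumerate(writes):
        freed.append(prev_preg.get(arch))   # free the *previous* mapping of arch
        prev_preg[arch] = f"P{i}"           # this commit installs P_i for arch
    return freed

def early_release(writes):
    """Early release: the producer's own physical register is freed at its
    commit; its value survives in the shadow copy until it is safe to drop."""
    return [f"P{i}" for i, _ in enumerate(writes)]

writes = ["r1", "r2", "r1", "r3"]
print(conventional_release(writes))   # [None, None, 'P0', None]
print(early_release(writes))          # ['P0', 'P1', 'P2', 'P3']
```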


international symposium on low power electronics and design | 2008

Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors

Avadh Patel; Kanad Ghose

We present a snoop filtering mechanism for multicore microprocessors that implement coherent caches using the MESI protocol. A relatively small filter structure at each core maintains coarse-grain sharing information about regions within a page to filter out snoops. On a broadcast, the sharing status of all regions within the page is collected proactively, and up to 90% of unnecessary snoops are eliminated. The energy savings resulting from snoop filtering in our scheme average about 30% across the benchmarks studied, for both a quad-core design in 65 nm CMOS and an 8-core design in 45 nm CMOS.
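
A minimal sketch of region-granularity snoop filtering under assumed page and region sizes; the way sharing status is learned and stored here is an illustration, not the paper's exact filter organization.

```python
# Minimal sketch of coarse-grain snoop filtering: each core tracks, per region
# of a page, whether any other core may hold a copy. A miss to a region known
# to be private skips the snoop broadcast. Region size, page size, and how the
# sharing status is gathered are illustrative assumptions.

PAGE_SIZE = 4096
REGION_SIZE = 256                      # assumed region granularity

class SnoopFilter:
    def __init__(self):
        # (page number, region index) -> True if another core may cache the region
        self.shared = {}

    def record_sharing(self, addr, shared_by_others):
        self.shared[self._key(addr)] = shared_by_others

    def must_snoop(self, addr):
        # Snoop unless we positively know the region is private to this core.
        return self.shared.get(self._key(addr), True)

    def _key(self, addr):
        return (addr // PAGE_SIZE, (addr % PAGE_SIZE) // REGION_SIZE)

f = SnoopFilter()
f.record_sharing(0x1000, shared_by_others=False)   # region learned to be private
print(f.must_snoop(0x1010))   # False: snoop broadcast can be filtered out
print(f.must_snoop(0x2000))   # True: unknown region, snoop to be safe
```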


conference on high performance computing supercomputing | 1989

The HCN: a versatile interconnection network based on cubes

Kanad Ghose; Kiran Raghavendra Desai

This paper introduces a family of interconnection networks for loosely-coupled multiprocessors called Hierarchical Cubic Networks (HCNs). HCNs use the well-known hypercube network as their basic building block. Using a considerably lower number of links per node, HCNs realize lower network diameters than the hypercube. The performance of several well-known applications on a hypothetical system employing the HCN is identical to their performance on a hypercube. HCNs thus enjoy the same advantages as a hypercube, albeit with considerably simpler interconnections.


IEEE Transactions on Very Large Scale Integration Systems | 2003

Energy-efficient issue queue design

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose; Peter M. Kogge

The out-of-order issue queue (IQ) used in modern superscalar processors is a considerable source of energy dissipation. We consider design alternatives that result in significant reductions in the power dissipation of the IQ (by as much as 75%) through the use of comparators that dissipate energy mainly on a tag match, 0-B encoding of operands to indicate the presence of bytes that are all zeros, and bitline segmentation. Our results are validated by the execution of the SPEC 95 benchmarks on a true hardware-level, cycle-by-cycle simulator for a superscalar processor and SPICE measurements for actual layouts of the IQ in a 0.18-μm CMOS process.
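
A small sketch of the zero-byte (0-B) encoding idea: a per-byte mask records which operand bytes are nonzero so that all-zero bytes need not be stored or driven. The encoding shown is an illustrative software analogue, not the circuit-level scheme evaluated in the paper.

```python
# Software analogue of zero-byte (0-B) encoding for operand values: a small
# mask records which bytes of the operand are nonzero, so all-zero bytes need
# not be stored or driven on the datapath. Encoding details are illustrative.

def zero_byte_encode(value: int, width_bytes: int = 8):
    """Return (mask, payload): bit i of mask is set if byte i is nonzero."""
    mask, payload = 0, []
    for i in range(width_bytes):
        byte = (value >> (8 * i)) & 0xFF
        if byte:
            mask |= 1 << i
            payload.append(byte)
    return mask, payload

def zero_byte_decode(mask, payload, width_bytes: int = 8):
    value, it = 0, iter(payload)
    for i in range(width_bytes):
        if mask & (1 << i):
            value |= next(it) << (8 * i)
    return value

v = 0x0001_F400                               # small operand: most bytes are zero
mask, payload = zero_byte_encode(v)
assert zero_byte_decode(mask, payload) == v
print(f"bytes stored: {len(payload)} of 8")   # only the nonzero bytes
```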


european conference on computer systems | 2007

hFS: a hybrid file system prototype for improving small file and metadata performance

Zhihui Zhang; Kanad Ghose

Two oft-cited file systems, the Fast File System (FFS) and the Log-Structured File System (LFS), adopt sharply different update strategies: update-in-place and update-out-of-place. This paper introduces the design and implementation of a hybrid file system called hFS, which combines the strengths of FFS and LFS while avoiding their weaknesses. This is accomplished by distributing file system data into two partitions based on their size and type. In hFS, data blocks of large regular files are stored in a data partition arranged in an FFS-like fashion, while metadata and small files are stored in a separate log partition organized in the spirit of LFS but without incurring any cleaning overhead. This segregation makes it possible to use more appropriate layouts for different data than would otherwise be possible. In particular, hFS has the ability to perform clustered I/O on all kinds of data, including small files, metadata, and large files. We have implemented a prototype of hFS on FreeBSD and have compared its performance against three file systems: FFS with Soft Updates, a port of NetBSD's LFS, and our lightweight journaling file system called yFS. Results on a number of benchmarks show that hFS has excellent small file and metadata performance. For example, hFS beats FFS with Soft Updates by 53% to 63% on the PostMark benchmark.
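
A toy illustration of the placement policy described above, routing metadata and small files to the log partition and large-file data to the FFS-like data partition; the size threshold and names are assumptions, not hFS's actual tunables.

```python
# Toy illustration of the hFS-style placement policy: metadata and small
# regular files go to the log partition, while data blocks of large regular
# files go to the FFS-like data partition. The threshold and labels are
# illustrative assumptions.

SMALL_FILE_THRESHOLD = 64 * 1024        # assumed cutoff for "small" files

def choose_partition(kind: str, size_bytes: int = 0) -> str:
    """kind is 'metadata' or 'regular'; returns the target partition."""
    if kind == "metadata" or size_bytes <= SMALL_FILE_THRESHOLD:
        return "log partition (LFS-style, no cleaner)"
    return "data partition (FFS-like layout)"

print(choose_partition("metadata"))                  # inode/dir updates -> log
print(choose_partition("regular", 4 * 1024))         # small file -> log
print(choose_partition("regular", 10 * 1024 * 1024)) # large file data -> data
```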


international symposium on low power electronics and design | 2001

Energy-efficient instruction dispatch buffer design for superscalar processors

Gurhan Kucuk; Kanad Ghose; Dmitry V. Ponomarev; Peter M. Kogge

The instruction dispatch buffer (DB, also known as an issue queue) used in modern superscalar processors is a considerable source of energy dissipation. We consider design alternatives that result in significant reductions in the power dissipation of the DB (by as much as 60%) through the use of: (a) fast comparators that dissipate energy mainly on a tag match, (b) zero-byte encoding of operands to indicate the presence of bytes with all zeros, and (c) bitline segmentation. Our results are validated by the execution of the SPEC 95 benchmarks on a true hardware-level, cycle-by-cycle simulator for a superscalar processor and SPICE measurements for actual layouts of the DB and its variants in a 0.5-micron CMOS process.

Collaboration


Kanad Ghose's principal co-authors and their affiliations.

Top Co-Authors

Oguz Ergin
TOBB University of Economics and Technology

Dereje Agonafer
University of Texas at Arlington

Peter M. Kogge
University of Notre Dame