Publication


Featured research published by Krishnan K. Kailas.


International Conference on Computer Design | 1998

An eight-issue tree-VLIW processor for dynamic binary translation

Kemal Ebcioglu; Jason E. Fritts; Stephen V. Kosonocky; Michael Karl Gschwind; Erik R. Altman; Krishnan K. Kailas; Terry Bright

Presented is an 8-issue tree-VLIW processor designed for efficient support of dynamic binary translation. This processor confronts two primary problems faced by VLIW architectures: binary compatibility and branch performance. Binary compatibility with existing architectures is achieved through dynamic binary translation, which translates and schedules PowerPC instructions to take advantage of the available instruction-level parallelism. Efficient branch performance is achieved through tree instructions that support multi-way path and branch selection within a single VLIW instruction. The processor architecture is described, along with design details of the branch unit, pipeline, register file and memory hierarchy for a 0.25-micron standard-cell design. Performance simulations show that the simplicity of a VLIW architecture allows a wide-issue processor to operate at high frequencies.
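The idea of multi-way path selection inside one tree instruction can be illustrated with a small software model. This is only a sketch under stated assumptions, not the processor's actual encoding or hardware: the `Leaf` and `Branch` classes, the `select_path` function, the operation strings, and the labels are all hypothetical names introduced here. Each internal node tests one condition bit, operations attached to nodes on the selected path are collected, and the leaf that is reached supplies the next target.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Leaf:
    target: str                                # label of the next instruction (hypothetical)
    ops: list = field(default_factory=list)    # operations committed if this leaf is reached


@dataclass
class Branch:
    cond_bit: int                              # index of the condition bit tested at this node
    taken: "Node"
    not_taken: "Node"
    ops: list = field(default_factory=list)    # operations committed if this node is reached


Node = Union[Leaf, Branch]


def select_path(root, cond_reg):
    """Walk one tree instruction and return (committed ops, next target)."""
    ops = []
    node = root
    while isinstance(node, Branch):
        ops.extend(node.ops)
        node = node.taken if cond_reg[node.cond_bit] else node.not_taken
    ops.extend(node.ops)
    return ops, node.target


# A three-way branch expressed as a single tree instruction.
tree = Branch(
    cond_bit=0,
    ops=["r1 = r2 + r3"],
    taken=Leaf(target="L_then", ops=["r4 = load [r1]"]),
    not_taken=Branch(
        cond_bit=1,
        taken=Leaf(target="L_a", ops=["r5 = r1 - 1"]),
        not_taken=Leaf(target="L_b"),
    ),
)

ops, target = select_path(tree, cond_reg=[False, True])
```

In this toy model, only the operations on the selected path are returned; operations on the other paths of the tree are discarded, which is the property that lets several branches be resolved in one instruction.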


IBM Journal of Research and Development | 2003

An innovative low-power high-performance programmable signal processor for digital communications

Jaime H. Moreno; Victor Zyuban; Uzi Shvadron; Fredy D. Neeser; Jeff H. Derby; Malcolm Scott Ware; Krishnan K. Kailas; Ayal Zaks; Amir Geva; Shay Ben-David; Sameh W. Asaad; Thomas W. Fox; Daniel Littrell; Marina Biberstein; Dorit Naishlos; Hillery C. Hunter

We describe an innovative, low-power, high-performance, programmable signal processor (DSP) for digital communications. The architecture of this processor is characterized by its explicit design for low-power implementations, its innovative ability to jointly exploit instruction-level parallelism and data-level parallelism to achieve high performance, its suitability as a target for an optimizing high-level language compiler, and its explicit replacement of hardware resources by compile-time practices. We describe the methodology used in the development of the processor, highlighting the techniques deployed to enable application/architecture/compiler/implementation co-development, and the optimization approach and metric used for power-performance evaluation and tradeoff analysis. We summarize the salient features of the architecture, provide a brief description of the hardware organization, and discuss the compiler techniques used to exercise these features. We also summarize the simulation environment and associated software development tools. Coding examples from two representative kernels in the digital communications domain are also provided. The resulting methodology, architecture, and compiler represent an advance of the state of the art in the area of low-power, domain-specific microprocessors.


European Conference on Parallel Processing | 2002

A Register File Architecture and Compilation Scheme for Clustered ILP Processors

Krishnan K. Kailas; Manoj Franklin; Kemal Ebcioglu

In Clustered Instruction-level Parallel (ILP) processors, the function units are partitioned and resources such as register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster communication operations into a single broadcast operation using a new sendb instruction. Our scheme makes use of a small Caching Register Buffer (CRB) attached to the traditional partitioned local register file, which is used to store copies of remote registers. We present an efficient code generation algorithm to schedule sendb operations on-the-fly. Detailed experimental results show that a windowed CRB with just 4 entries provides the same performance as that of a partitioned register file with infinite non-architected register space for keeping remote registers.


Formal Methods in Computer-Aided Design | 2009

Formal verification of correctness and performance of random priority-based arbiters

Krishnan K. Kailas; Viresh Paruthi; Brian Chan Monwai

Arbiters play a critical role in the performance of electronic systems. In this paper, we describe a novel method to formally verify correctness and performance of random priority-based arbiters. We define a property of random number sequences, called Complete Random Sequence (CRS), to characterize bounded fairness properties of random number generators and random priority-based arbiters. We propose a three-step verification method utilizing the notion of CRS to establish deadlock-free operation of the arbiters, and to accurately quantify the request-to-grant delays. The proposed verification method may additionally be leveraged to tune systems composed of random priority-based arbiters and pseudo-random number generators, such as linear feedback shift registers (LFSRs), for optimal performance. We have successfully applied the approach to verify a host of cache arbiters and interconnection network controllers of commercial microprocessors.
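To make the notion of bounded fairness concrete, the sketch below enumerates one period of a small LFSR and measures the worst-case wait before each requester receives top priority. It is an illustration under assumptions and not the paper's verification method or its CRS definition: the 4-bit LFSR with polynomial x^4 + x^3 + 1, the choice of four requesters, the mapping of state to priority by `state % 4`, and the function names are all made up for this example.

```python
from itertools import islice


def lfsr4_states(seed=0b0001):
    """Yield successive states of a 4-bit LFSR with taps for x^4 + x^3 + 1."""
    state = seed & 0xF
    if state == 0:
        raise ValueError("seed must be nonzero")
    while True:
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1   # XOR of the two tap bits
        state = ((state << 1) | feedback) & 0xF


def worst_case_delay(requester, priorities):
    """Longest wait, over every start cycle, until `requester` has top priority."""
    period = len(priorities)
    worst = 0
    for start in range(period):
        delay = 0
        while priorities[(start + delay) % period] != requester:
            delay += 1
            if delay >= period:
                raise ValueError("requester never receives top priority")
        worst = max(worst, delay)
    return worst


NUM_REQUESTERS = 4   # assumed for illustration
PERIOD = 15          # a maximal-length 4-bit LFSR visits all 15 nonzero states

states = list(islice(lfsr4_states(), PERIOD))
priorities = [s % NUM_REQUESTERS for s in states]    # top-priority requester per cycle
delays = [worst_case_delay(r, priorities) for r in range(NUM_REQUESTERS)]
```

Because the priority sequence repeats with the LFSR period, exhaustively checking one period is enough to bound the delay for this toy model. A requester whose index never appears in `priorities` would raise an error, which corresponds to a starvation scenario.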


Information Hiding | 2005

Data hiding in compiled program binaries for enhancing computer system performance

Ashwin Swaminathan; Yinian Mao; Min Wu; Krishnan K. Kailas

Information hiding has been studied in many security applications such as authentication, copyright management and digital forensics. In this work, we introduce a new application where successful information hiding in compiled program binaries could bring system-wide performance improvements. Our goal is to enhance computer system performance by providing additional information to the processor, without changing the instruction set architecture. We first analyze the statistics of typical programs to demonstrate the feasibility of hiding data in them. We then propose several techniques to hide a large amount of data in the operand fields with very low computation and storage requirements during the extraction process. The data embedding is made reversible to recover the original instructions and to ensure the correct execution of the computer program. Our experiments on the SPEC CPU2000 benchmark programs show that up to 110K bits of information can be embedded in large programs with as little as 3K bits of additional run-time memory in the form of a simple look-up table.


Archive | 2006

Method and apparatus for register renaming using multiple physical register files and avoiding associative search

William E. Burky; Krishnan K. Kailas; Balaram Sinharoy


Archive | 2007

Method and System for Tracking Instruction Dependency in an Out-of-Order Processor

William E. Burky; Krishnan K. Kailas


Archive | 2006

Method and apparatus for dynamic priority-based cache replacement

Krishnan K. Kailas; Rajiv Alazhath Ravindran; Zehra Sura


Archive | 2006

Method and apparatus for fast synchronization and out-of-order execution of instructions in a meta-program based computing system

Krishnan K. Kailas


High-Performance Computer Architecture | 2014

3D stacking of high-performance processors

Philip G. Emma; Alper Buyuktosunoglu; Michael B. Healy; Krishnan K. Kailas; Valentin Puente; Roy Yu; Allan M. Hartstein; Pradip Bose; Jaime H. Moreno
