Chidamber R. Kulkarni

Publication

Featured research published by Chidamber R. Kulkarni.

design automation conference | 2004

Mapping a domain specific language to a platform FPGA

Chidamber R. Kulkarni; Gordon J. Brebner; Graham Schelle

A domain specific language (DSL) enables designers to rapidly specify and implement systems for a particular domain, yielding designs that are easy to understand, reason about, re-use and maintain. However, there is usually a significant overhead in the infrastructure required to map such a DSL on to a programmable logic device. In this paper, we present a mapping of an existing DSL for the networking domain on to a platform FPGA by embedding the DSL into an existing language infrastructure. In particular, we will show that, using a few basic concepts, we are able to achieve a successful mapping of the DSL on to a platform FPGA and create a re-usable structure that also makes it easy to extend the DSL. Finally, we will present some results of mapping the DSL on to a platform FPGA and comment on the resulting overhead.

field-programmable custom computing machines | 2007

Configurable Transactional Memory

Christoforos Kachris; Chidamber R. Kulkarni

Programming efficiency of heterogeneous concurrent systems is limited by the use of lock-based synchronization mechanisms. Transactional memories can greatly improve the programming efficiency of such systems. In field-programmable computing machines, a conventional fixed transactional memory becomes an inefficient use of the silicon. We propose configurable transactional memory (CTM) as a mechanism to implement application specific synchronization that utilizes the field-programmability of such devices to match the requirements of an application. The proposed configurable transactional memory is targeted at embedded applications and is area efficient compared to conventional schemes that are implemented with cache-coherent protocols. In particular, the CTM is designed to be incorporated into the compilation and synthesis paths of high-level languages, or into the system creation process using tools such as Xilinx EDK. We study the impact of deploying a CTM in a packet metering and statistics application and two micro-benchmarks as compared to a lock-based synchronization scheme. We have implemented this application in a Xilinx Virtex4 device and found that the CTM was 0-73% better than a fine-grained lock-based scheme.

field programmable gate arrays | 2012

A lean FPGA soft processor built using a DSP block

Hui Yan Cheah; Suhaib A. Fahmy; Douglas L. Maskell; Chidamber R. Kulkarni

As Field Programmable Gate Arrays (FPGAs) have advanced, the capabilities and variety of embedded resources have increased. In the last decade, signal processing has become one of the main driving applications for FPGA adoption, so FPGA vendors tailored their architectures to such applications. The resulting embedded digital signal processing (DSP) blocks have now advanced to the point of supporting a wide range of operations. In this paper, we explore how these DSP blocks can be applied to general computation. We show that the DSP48E1 blocks in Xilinx Virtex-6 devices support a wide range of standard processor instructions which can be designed into the core of a basic processor with minimal additional logic usage.

application-specific systems, architectures, and processors | 2004

Hyper-programmable architectures for adaptable networked systems

Gordon J. Brebner; Philip B. James-Roxby; Eric Keller; Chidamber R. Kulkarni

We explain how modern programmable logic devices have capabilities that are well suited for them to assume a central role in the implementation of networked systems, now and in the future. To date, such devices have featured largely in ASIC substitution roles within networked systems; this usage has been highly successful, allowing faster times to market and reduced engineering costs. We argue that there are many additional opportunities for productively using these devices. The requirement is exposure of their high inherent computational concurrency matched by concurrent memory accessibility, their rich on-chip interconnectivity and their complete programmability, at a higher level of abstraction that matches the implementation needs of networked systems. We discuss specific examples supporting this view, and present a highly flexible soft platform architecture at an appropriate level of abstraction from physical devices. This may be viewed as a particularly configurable and programmable type of network processor, offering scope both for innovative networked system implementation and for new directions in networking research. In particular, it is aimed at facilitating scalable solutions, matching differently resourced programmable logic devices to differing performance and sophistication requirements of networked systems, from cheap consumer appliances to high-end network switching.

Journal of Systems Architecture | 2011

Transactional memories for multi-processor FPGA platforms

Christoforos Kachris; Chidamber R. Kulkarni

Programming efficiency of heterogeneous concurrent systems is limited by the use of lock-based synchronization mechanisms. Transactional memories can greatly improve the programming efficiency of such systems. In field-programmable computing machines, a conventional fixed transactional memory becomes an inefficient use of the silicon. We propose configurable transactional memory (CTM) as a mechanism to implement application specific synchronization that utilizes the field-programmability of such devices to match the requirements of an application. The proposed configurable transactional memory is targeted at embedded applications and is area efficient compared to conventional schemes that are implemented with cache-coherent protocols. In particular, the CTM is designed to be incorporated into the compilation and synthesis paths of high-level languages, or into the system creation process using tools such as Xilinx EDK. The proposed system supports an OpenMP-based programming paradigm for the efficient use of transactional memories. In addition, the conflict detection scheme can be configured to work in either lazy or eager mode, depending on the application requirements. We study the impact of deploying a CTM using both micro-benchmarks and real applications as compared to a lock-based synchronization scheme. We have implemented the proposed scheme in a Xilinx Virtex4 device and found that the CTM can provide higher programming efficiency, lower energy consumption and higher speedup than a fine-grained lock-based scheme.
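
The difference between lazy and eager conflict detection mentioned in this abstract can be illustrated with a small software model. The sketch below is not the CTM hardware design from the paper; it is a single-threaded Python simulation in which transactions are interleaved by hand. All names (`TransactionalMemory`, `Transaction`, `Conflict`) are hypothetical, and it assumes that eager mode claims an address on first write while lazy mode only validates observed versions at commit time.

```python
class Conflict(Exception):
    """Raised when a transaction must abort and be retried."""


class TransactionalMemory:
    """Toy transactional memory with a configurable conflict detection mode."""

    def __init__(self, mode="lazy"):
        if mode not in ("lazy", "eager"):
            raise ValueError("mode must be 'lazy' or 'eager'")
        self.mode = mode
        self.values = {}    # address -> committed value
        self.versions = {}  # address -> number of commits to that address
        self.owners = {}    # address -> owning transaction (eager mode only)

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, tm):
        self.tm = tm
        self.reads = {}   # address -> version observed at first read
        self.writes = {}  # address -> buffered new value

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        self.reads.setdefault(addr, self.tm.versions.get(addr, 0))
        return self.tm.values.get(addr, 0)

    def write(self, addr, value):
        if self.tm.mode == "eager":
            # Eager: claim the address now and fail fast if it is taken.
            owner = self.tm.owners.setdefault(addr, self)
            if owner is not self:
                self.abort()
                raise Conflict(f"{addr!r} is owned by another transaction")
        self.writes[addr] = value

    def commit(self):
        # Both modes validate reads at commit; in lazy mode this is the
        # only point at which a conflict can be detected.
        for addr, seen in self.reads.items():
            if self.tm.versions.get(addr, 0) != seen:
                self.abort()
                raise Conflict(f"{addr!r} changed since it was read")
        for addr, value in self.writes.items():
            self.tm.values[addr] = value
            self.tm.versions[addr] = self.tm.versions.get(addr, 0) + 1
        self._release()

    def abort(self):
        self.reads.clear()
        self.writes.clear()
        self._release()

    def _release(self):
        for addr in [a for a, t in self.tm.owners.items() if t is self]:
            del self.tm.owners[addr]
```

With two transactions incrementing the same counter, the lazy configuration lets both run to completion and rejects the second one at `commit()`, whereas the eager configuration rejects the second one at its `write()` call. Which behaviour wastes less work depends on how often transactions collide, which is consistent with the abstract's point that the mode is chosen per application.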

field-programmable logic and applications | 2006

Micro-Coded Datapaths: Populating the Space Between Finite State Machine and Processor

Chidamber R. Kulkarni; Gordon J. Brebner

Domain-specific design flows can enable an efficient path to implementation, as well as making the design process intuitive and the designs reusable. When targeting FPGAs, there are few techniques in high level synthesis that enable thorough exploration of the inherent flexibility of the FPGA fabric as an implementation medium. In this paper, we propose a new methodology, based on micro-coded datapaths, that enables design space exploration of processing engine architectures implemented in programmable logic that range from a fixed finite state machine to a soft processor. As a use case, these processing engines can be embedded within programmable logic threads that are used to carry out network packet processing. We demonstrate the application of this methodology on a network address translation application, and show that micro-coded datapaths indeed enable both human designers and automated tools to explore the design space in a structured way, thus exploiting the full potential of the FPGA technology.
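
As a rough illustration of the general idea of a micro-coded datapath, the sketch below models a table of control words driving a small register file and ALU. It is a generic textbook-style model, not the architecture or tool flow from the paper; `MicroOp`, `run`, `HALT`, and the example micro-program are all hypothetical names introduced here. A micro-program with no branches behaves like a fixed state sequence, while adding branches and more operations moves the engine toward a small processor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

HALT = -1  # hypothetical sentinel micro-address that stops the engine

# ALU operations selectable by a control word.
ALU = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "pass": lambda a, b: a,
}


@dataclass(frozen=True)
class MicroOp:
    """One control word: ALU select, operand/result registers, sequencing."""
    alu: str
    src_a: str
    src_b: str
    dest: str
    next_pc: int
    branch_if_zero: Optional[int] = None  # taken when the ALU result is 0


def run(microcode: List[MicroOp], regs: Dict[str, int],
        max_steps: int = 1000) -> Dict[str, int]:
    """Execute the micro-program on a copy of regs and return the result."""
    regs = dict(regs)
    pc = 0
    for _ in range(max_steps):
        if pc == HALT:
            return regs
        op = microcode[pc]
        result = ALU[op.alu](regs[op.src_a], regs[op.src_b])
        regs[op.dest] = result
        if op.branch_if_zero is not None and result == 0:
            pc = op.branch_if_zero
        else:
            pc = op.next_pc
    raise RuntimeError("micro-program did not halt within max_steps")


# Example micro-program: multiply x by count using repeated addition.
MULTIPLY = [
    MicroOp("add", "acc", "x", "acc", next_pc=1),
    MicroOp("sub", "count", "one", "count", next_pc=0, branch_if_zero=HALT),
]

final = run(MULTIPLY, {"acc": 0, "x": 5, "count": 3, "one": 1})
print(final["acc"])
```

In this model the design space is explored by changing the control-word format: removing `branch_if_zero` and fixing `next_pc` to a linear sequence gives the finite-state-machine end of the range, while widening the ALU table and register set approaches the soft-processor end.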

design, automation, and test in europe | 2006

Memory centric thread synchronization on platform FPGAs

Chidamber R. Kulkarni; Gordon J. Brebner

Concurrent programs are difficult to write, reason about, re-use, and maintain. In particular, for system-level descriptions that use a shared memory abstraction for thread or process synchronization, the current practice involves manual scheduling of processes, introduction of guard conditions, and clocking tricks, to enforce memory dependencies. This process is tedious, time consuming, and error-prone. At the same time, the need for a concurrent programming model is becoming ever more essential to bridge the productivity gap that is widening with every manufacturing process generation. In this paper, we present two novel techniques to automatically enforce memory dependencies in platform FPGAs using on-chip memories, starting from a system-level description. Both techniques utilize static analysis to generate circuits for enforcing these dependencies. This paper investigates these two techniques for their generality, overhead in implementation, and usefulness or otherwise for different application requirements.

field programmable gate arrays | 2006

Building a flexible and scalable DRAM interface for networking applications on FPGAs

Jike Chong; Chidamber R. Kulkarni; Gordon J. Brebner

A fundamental challenge to successful deployment of DRAMs is the availability of a flexible and scalable DRAM interface. This is exacerbated by the application specific nature of the logic-side DRAM interface. This paper presents a study that attempts to overcome this challenge for the networking application domain. We quantify the various challenges and present techniques that were implemented to build a flexible and scalable interface to an existing multi-port memory controller for DDR DRAM using an FPGA. We demonstrate the deployment of this new interface in two example applications. We present two novel techniques that enable us to reduce the latency of DRAM related memory accesses and improve throughput. We believe our techniques enable harnessing maximum throughput from existing memory controllers with the least possible latency.

Archive | 2005