Publications


Featured research published by Nikil D. Dutt.


Design, Automation, and Test in Europe | 1999

EXPRESSION: a language for architecture exploration through compiler/simulator retargetability

Ashok Halambi; Peter Grun; Vijay Ganesh; Asheesh Khare; Nikil D. Dutt; Alexandru Nicolau

We describe EXPRESSION, a language supporting architectural design space exploration for embedded systems-on-chip (SOC) and automatic generation of a retargetable compiler/simulator toolkit. Key features of our language-driven design methodology include: a mixed behavioral/structural representation supporting a natural specification of the architecture; explicit specification of the memory subsystem, allowing novel memory organizations and hierarchies; clean syntax and ease of modification supporting architectural exploration; a single specification supporting consistency and completeness checking of the architecture; and efficient specification of architectural resource constraints allowing extraction of detailed reservation tables for compiler scheduling. We illustrate key features of EXPRESSION through simple examples and demonstrate its efficacy in supporting exploration and automatic software toolkit generation for an embedded SOC codesign flow.
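The reservation tables mentioned above can drive a compiler scheduler's structural-hazard checks. A minimal sketch, assuming a hypothetical three-operation machine (the resource names and tables below are invented for illustration, not taken from EXPRESSION, and data dependences are ignored):

```python
# Hypothetical reservation tables: cycle offset -> resources occupied.
RESERVATION = {
    "mul": {0: {"mul_unit"}, 1: {"mul_unit"}},   # 2-cycle multiplier
    "add": {0: {"alu"}},
    "load": {0: {"mem_port"}, 1: {"mem_port"}},  # 2-cycle memory port
}

def conflicts(schedule, op, start):
    """True if issuing `op` at cycle `start` collides with booked resources."""
    for offset, resources in RESERVATION[op].items():
        if schedule.get(start + offset, set()) & resources:
            return True
    return False

def schedule_ops(ops):
    """Greedy list scheduling on resources only: place each op at the
    earliest conflict-free cycle (real schedulers also honor dependences)."""
    schedule = {}   # cycle -> set of busy resources
    placement = {}  # op index -> issue cycle
    for i, op in enumerate(ops):
        cycle = 0
        while conflicts(schedule, op, cycle):
            cycle += 1
        for offset, resources in RESERVATION[op].items():
            schedule.setdefault(cycle + offset, set()).update(resources)
        placement[i] = cycle
    return placement

print(schedule_ops(["mul", "mul", "add", "load", "load"]))
# The second mul waits for the multiplier; the second load for the port.
```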


ACM Transactions on Design Automation of Electronic Systems | 2001

Data and memory optimization techniques for embedded systems

Preeti Ranjan Panda; Francky Catthoor; Nikil D. Dutt; Koen Danckaert; Erik Brockmeyer; Chidamber Kulkarni; A Vandercappelle; Per Gunnar Kjeldsberg

We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem, and impact one or more out of three important cost metrics: area, performance, and power dissipation of the resulting implementation. We first examine architecture-independent optimizations in the form of code transformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We end with memory addressing related issues.
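A concrete instance of the architecture-independent code transformations the survey covers is loop tiling (blocking), which restructures a traversal so each block of data is reused while cache-resident. The transpose below is an illustrative sketch; the tile size is arbitrary, and real choices depend on cache geometry:

```python
def transpose_naive(a, n):
    """Straightforward transpose: the column-wise writes to `out`
    stride through memory and thrash the data cache for large n."""
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[j][i] = a[i][j]
    return out

def transpose_tiled(a, n, tile=4):
    """Same result, but each tile of `a` and `out` stays cache-resident
    while it is being touched, cutting capacity misses for large n."""
    out = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j][i] = a[i][j]
    return out
```

In plain Python the payoff is not visible, but the same restructuring applied to compiled loop nests is a standard cache-locality optimization.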


International Conference on VLSI Design | 2003

SPARK: a high-level synthesis framework for applying parallelizing compiler transformations

Sumit Gupta; Nikil D. Dutt; Rajesh K. Gupta; Alexandru Nicolau

This paper presents a modular and extensible high-level synthesis research system, called SPARK, that takes a behavioral description in ANSI-C as input and produces synthesizable register-transfer level VHDL. SPARK uses parallelizing compiler technology, developed previously, to enhance instruction-level parallelism and re-instruments it for high-level synthesis by incorporating ideas of mutual exclusivity of operations, resource sharing and hardware cost models. In this paper, we present the design flow through the SPARK system, a set of transformations that include speculative code motions and dynamic transformations and show how these transformations and other optimizing synthesis and compiler techniques are employed by a scheduling heuristic. Experiments are performed on two moderately complex industrial applications, namely MPEG-1 and the GIMP image processing tool. The results show that the various code transformations lead to up to 70% improvements in performance without any increase in the overall area and critical path of the final synthesized design.


European Design and Test Conference | 1997

Efficient utilization of scratch-pad memory in embedded processor applications

Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau

Efficient utilization of on-chip memory space is extremely important in modern embedded system applications based on microprocessor cores. In addition to a data cache that interfaces with slower off-chip memory, a fast on-chip SRAM, called Scratch-Pad memory, is often used in several applications. We present a technique for efficiently exploiting on-chip Scratch-Pad memory by partitioning the application's scalar and array variables into off-chip DRAM and on-chip Scratch-Pad SRAM, with the goal of minimizing the total execution time of embedded applications. Our experiments on code kernels from typical applications show that our technique results in significant performance improvements.


ACM Transactions on Design Automation of Electronic Systems | 2000

On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems

Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau

Efficient utilization of on-chip memory space is extremely important in modern embedded system applications based on processor cores. In addition to a data cache that interfaces with slower off-chip memory, a fast on-chip SRAM, called Scratch-Pad memory, is often used in several applications, so that critical data can be stored there with a guaranteed fast access time. We present a technique for efficiently exploiting on-chip Scratch-Pad memory by partitioning the application's scalar and array variables into off-chip DRAM and on-chip Scratch-Pad SRAM, with the goal of minimizing the total execution time of embedded applications. We also present extensions of our proposed memory assignment strategy to handle context switching between multiple programs, as well as a generalized memory hierarchy. Our experiments on code kernels from typical applications show that our technique results in significant performance improvements.
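The partitioning problem can be sketched with a simple access-density heuristic: keep the most frequently accessed bytes in scratch-pad SRAM, spill the rest to DRAM. This is only an illustrative knapsack-style greedy (the paper's actual formulation also models cache interference and variable lifetimes), and the variable names and counts below are invented:

```python
def partition(variables, spm_capacity):
    """variables: list of (name, size_bytes, access_count).
    Greedy by access density (accesses per byte): densest first into
    the scratch-pad until it is full; everything else goes to DRAM."""
    ranked = sorted(variables, key=lambda v: v[2] / v[1], reverse=True)
    spm, dram, used = [], [], 0
    for name, size, accesses in ranked:
        if used + size <= spm_capacity:
            spm.append(name)
            used += size
        else:
            dram.append(name)
    return spm, dram

# Hypothetical workload: small hot tables vs. one large image buffer.
spm, dram = partition(
    [("coeffs", 64, 5000), ("hist", 256, 4000), ("image", 4096, 6000)],
    spm_capacity=512)
print(spm, dram)  # small, hot variables win the scratch-pad
```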


International Symposium on Microarchitecture | 1992

Partitioned register files for VLIWs: a preliminary analysis of tradeoffs

Andrea Capitanio; Nikil D. Dutt; Alexandru Nicolau

An ideal VLIW architecture requires a large multiport register file that is currently not realizable in practice. We analyze a Limited Connectivity VLIW architecture as a realizable alternative that limits the number of ports. We present a fine-grain code partitioning method for this model. The partitioning scheme was applied to a number of standard benchmarks by varying the number of ports on a register file, the number of partitions, and the communication bandwidth between partitions. We present these results, along with a preliminary analysis of architectural tradeoffs in the actual implementation of these Limited Connectivity VLIWs.
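The cost being traded off can be made concrete: once operations are assigned to register-file partitions, every dataflow edge that crosses partitions requires an explicit copy over the inter-partition interconnect. A toy sketch with an invented dataflow graph and assignment:

```python
def inter_partition_moves(edges, assignment):
    """edges: (producer, consumer) dataflow pairs; assignment: op -> cluster.
    Each edge whose endpoints sit in different register-file partitions
    costs one copy over the limited inter-partition bandwidth."""
    return sum(1 for p, c in edges if assignment[p] != assignment[c])

# Hypothetical 4-operation dataflow graph.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")]
two_way = {"a": 0, "b": 1, "c": 0, "d": 0}
print(inter_partition_moves(edges, two_way))  # b->c and b->d cross: 2 copies
```

A partitioner searches over assignments to minimize this count subject to the per-partition port limits.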


Design, Automation, and Test in Europe | 2002

Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints

Ana Azevedo; Ilya Issenin; Radu Cornea; Rajesh K. Gupta; Nikil D. Dutt; Alexander V. Veidenbaum; Alexandru Nicolau

Dynamic voltage scaling (DVS) is a known effective mechanism for reducing CPU energy consumption without significant performance degradation. While a lot of work has been done on inter-task scheduling algorithms to implement DVS under operating system control, new research challenges exist in intra-task DVS techniques under software and compiler control. In this paper we introduce a novel intra-task DVS technique under compiler control using program checkpoints. Checkpoints are generated at compile time and indicate places in the code where the processor speed and voltage should be re-calculated. Checkpoints also carry user-defined time constraints. Our technique handles multiple intra-task performance deadlines and modulates power consumption according to a run-time power budget. We experimented with two heuristics for adjusting the clock frequency and voltage. For the particular benchmark studied, one heuristic yielded 63% more energy savings than the other. With the best of the heuristics we designed, our technique resulted in 82% energy savings over the execution of the program without employing DVS.
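The checkpoint recalculation can be sketched as follows, assuming a hypothetical table of frequency/voltage operating points (the paper's heuristics are more sophisticated): at each checkpoint, pick the slowest point that still covers the remaining worst-case cycles before the deadline.

```python
# Hypothetical (frequency_MHz, voltage_V) operating points, ascending.
OP_POINTS = [(100, 0.9), (200, 1.0), (400, 1.2), (600, 1.4)]

def speed_at_checkpoint(remaining_cycles, time_left_us):
    """Return the lowest operating point whose frequency finishes the
    remaining worst-case cycles within the time left (cycles/us == MHz).
    Lower frequency permits lower voltage, and energy scales with V^2."""
    required_mhz = remaining_cycles / time_left_us
    for freq, volt in OP_POINTS:
        if freq >= required_mhz:
            return freq, volt
    return OP_POINTS[-1]  # deadline at risk: run at the top point

# Example: 30,000 worst-case cycles left, 200 us to the deadline
# -> 150 MHz needed, so the 200 MHz / 1.0 V point suffices.
print(speed_at_checkpoint(30_000, 200))
```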


Neural Networks | 2009

2009 Special Issue: A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors

Jayram Moorkanikara Nageswaran; Nikil D. Dutt; Jeffrey L. Krichmar; Alex Nicolau; Alexander V. Veidenbaum

Neural network simulators that take into account the spiking behavior of neurons are useful for studying brain mechanisms and for various neural engineering applications. Spiking Neural Network (SNN) simulators have been traditionally simulated on large-scale clusters, super-computers, or on dedicated hardware architectures. Alternatively, Compute Unified Device Architecture (CUDA) Graphics Processing Units (GPUs) can provide a low-cost, programmable, and high-performance computing platform for simulation of SNNs. In this paper we demonstrate an efficient, biologically realistic, large-scale SNN simulator that runs on a single GPU. The SNN model includes Izhikevich spiking neurons, detailed models of synaptic plasticity and variable axonal delay. We allow user-defined configuration of the GPU-SNN model by means of a high-level programming interface written in C++ but similar to the PyNN programming interface specification. PyNN is a common programming interface developed by the neuronal simulation community to allow a single script to run on various simulators. The GPU implementation (on NVIDIA GTX-280 with 1 GB of memory) is up to 26 times faster than a CPU version for the simulation of 100K neurons with 50 Million synaptic connections, firing at an average rate of 7 Hz. For simulation of 10 Million synaptic connections and 100K neurons, the GPU SNN model is only 1.5 times slower than real-time. Further, we present a collection of new techniques related to parallelism extraction, mapping of irregular communication, and network representation for effective simulation of SNNs on GPUs. The fidelity of the simulation results was validated on CPU simulations using firing rate, synaptic weight distribution, and inter-spike interval analysis. Our simulator is publicly available to the modeling community so that researchers will have easy access to large-scale SNN simulations.
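The Izhikevich model underlying the simulator reduces each neuron to two coupled equations: v' = 0.04v² + 5v + 140 − u + I and u' = a(bv − u), with a reset when v crosses threshold. A minimal CPU-side sketch with the standard regular-spiking parameters and 1 ms Euler steps (the paper's GPU kernels are far more elaborate):

```python
def izhikevich(I, steps, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Count spikes of one Izhikevich neuron driven by constant input I.
    v: membrane potential (mV); u: recovery variable; dt = 1 ms (Euler)."""
    v, u = -65.0, b * -65.0
    spikes = 0
    for _ in range(steps):
        if v >= 30.0:            # spike: reset membrane, bump recovery
            v, u = c, u + d
            spikes += 1
        v += 0.04 * v * v + 5 * v + 140 - u + I
        u += a * (b * v - u)
    return spikes

# With a sustained suprathreshold input the neuron fires tonically;
# with no input it settles to rest and stays silent.
print(izhikevich(I=10.0, steps=1000), izhikevich(I=0.0, steps=1000))
```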


Design Automation Conference | 2003

Instruction set compiled simulation: a technique for fast and flexible instruction set simulation

Mehrdad Reshadi; Prabhat Mishra; Nikil D. Dutt

Instruction set simulators are critical tools for the exploration and validation of new programmable architectures. Due to increasing complexity of the architectures and time-to-market pressure, performance is the most important feature of an instruction-set simulator. Interpretive simulators are flexible but slow, whereas compiled simulators deliver speed at the cost of flexibility. This paper presents a novel technique for generation of fast instruction-set simulators that combines the benefits of both compiled and interpretive simulation. We achieve fast instruction-accurate simulation through two mechanisms. First, we move the time-consuming decoding process from run-time to compile time while maintaining the flexibility of the interpretive simulation. Second, we use a novel instruction abstraction technique to generate aggressively optimized decoded instructions that further improves simulation performance. Our instruction set compiled simulation (IS-CS) technique delivers up to 40% performance improvement over the best known published result that has the flexibility of the interpretive simulation. We illustrate the applicability of the IS-CS technique using the ARM7 embedded processor.
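The core idea of moving decode out of the execute loop can be sketched with a toy, invented three-instruction ISA (this illustrates the decode-once principle, not IS-CS's actual instruction abstraction): each instruction is decoded a single time into a bound handler, so the hot loop only dispatches.

```python
def decode(program):
    """One-time decode: (opcode, operands) tuples -> bound handlers.
    An interpretive simulator would redo this work on every execution."""
    def make(op, args):
        if op == "li":    # load immediate: r[args[0]] = args[1]
            return lambda regs: regs.__setitem__(args[0], args[1])
        if op == "add":   # r[args[0]] = r[args[1]] + r[args[2]]
            return lambda regs: regs.__setitem__(
                args[0], regs[args[1]] + regs[args[2]])
        if op == "muli":  # r[args[0]] = r[args[1]] * args[2]
            return lambda regs: regs.__setitem__(
                args[0], regs[args[1]] * args[2])
        raise ValueError(op)
    return [make(op, args) for op, args in program]

def run(decoded, nregs=4):
    regs = [0] * nregs
    for handler in decoded:   # hot loop: no decoding, just dispatch
        handler(regs)
    return regs

prog = [("li", (0, 6)), ("li", (1, 7)), ("add", (2, 0, 1)), ("muli", (3, 2, 2))]
print(run(decode(prog)))  # → [6, 7, 13, 26]
```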


Design Automation Conference | 2004

Extending the transaction level modeling approach for fast communication architecture exploration

Sudeep Pasricha; Nikil D. Dutt; Mohamed Ben-Romdhane

System-on-chip (SoC) designs are increasingly becoming more complex. Efficient on-chip communication architectures are critical for achieving desired performance in these systems. System designers typically use Bus Cycle Accurate (BCA) models written in high level languages such as C/C++ to explore the communication design space. These models capture all of the bus signals and strictly maintain cycle accuracy, which is useful for reliable performance exploration but results in slow simulation speeds for complex designs, even when they are modeled using high level languages. Recently there have been several efforts to use the Transaction Level Modeling (TLM) paradigm for improving simulation performance of BCA models. However, these BCA models capture a lot of details that can be eliminated when exploring communication architectures.
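The speed/accuracy tradeoff can be illustrated with a minimal transaction-level bus sketch (invented class and latencies, not the paper's models): a whole burst becomes one function call whose cycle cost is merely accounted for, instead of bus signals being toggled every simulated cycle as in a BCA model.

```python
class TLMBus:
    """Toy transaction-level bus: transfers are atomic function calls,
    and timing is an estimate accumulated per transaction."""
    ARB_CYCLES = 2       # hypothetical arbitration overhead per transaction
    BEAT_CYCLES = 1      # one data beat per cycle once granted

    def __init__(self, memory):
        self.memory = memory
        self.cycles = 0  # running performance estimate

    def burst_read(self, addr, nbeats):
        """One call models the whole burst; no per-cycle signal activity,
        which is what makes TLM simulation fast."""
        self.cycles += self.ARB_CYCLES + nbeats * self.BEAT_CYCLES
        return self.memory[addr:addr + nbeats]

bus = TLMBus(memory=list(range(64)))
data = bus.burst_read(addr=8, nbeats=4)
print(data, bus.cycles)  # → [8, 9, 10, 11] 6
```

A BCA model of the same burst would drive address, data, and control signals on every one of those six cycles, which is precisely the detail that can be dropped during early communication-architecture exploration.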

Collaboration


Dive into Nikil D. Dutt's collaborations.

Top Co-Authors

Sudeep Pasricha
Colorado State University

Preeti Ranjan Panda
Indian Institute of Technology Delhi

Alex Nicolau
University of California