Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Alexandru Nicolau is active.

Publication


Featured research published by Alexandru Nicolau.


European Design and Test Conference | 1997

Efficient utilization of scratch-pad memory in embedded processor applications

Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau

Efficient utilization of on-chip memory space is extremely important in modern embedded system applications based on microprocessor cores. In addition to a data cache that interfaces with slower off-chip memory, a fast on-chip SRAM, called Scratch-Pad memory, is often used in several applications. We present a technique for efficiently exploiting on-chip Scratch-Pad memory by partitioning the application's scalar and array variables into off-chip DRAM and on-chip Scratch-Pad SRAM, with the goal of minimizing the total execution time of embedded applications. Our experiments on code kernels from typical applications show that our technique results in significant performance improvements.
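To illustrate the flavor of such a partitioning, here is a minimal sketch (not the paper's actual algorithm) that greedily places the variables with the highest accesses-per-byte into a fixed-capacity scratch-pad SRAM and leaves the rest in DRAM; the variable names, sizes, and access counts are hypothetical.

```python
def partition_variables(variables, sram_capacity):
    """variables: list of (name, size_bytes, access_count).
    Returns (sram_names, dram_names), placing the densest-accessed
    variables into the scratch-pad first."""
    # Rank by access density (accesses per byte), densest first.
    ranked = sorted(variables, key=lambda v: v[2] / v[1], reverse=True)
    sram, dram, used = [], [], 0
    for name, size, _ in ranked:
        if used + size <= sram_capacity:
            sram.append(name)
            used += size
        else:
            dram.append(name)
    return sram, dram

# A small, hot histogram array fits in a 1 KB scratch-pad; a large
# image buffer stays in DRAM despite its high absolute access count.
sram, dram = partition_variables(
    [("hist", 256, 10000), ("image", 65536, 70000)], sram_capacity=1024)
```

The real problem also has to weigh cache behavior of the DRAM-resident arrays; this sketch only captures the capacity/density trade-off.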


International Symposium on Systems Synthesis | 1996

Memory organization for improved data cache performance in embedded processors

Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau

Code generation for embedded processors creates opportunities for several performance optimizations not applicable to traditional compilers. We present techniques for improving data cache performance by organizing variables declared in embedded code into memory, using specific parameters of the data cache. Our approach clusters variables to minimize compulsory cache misses, and solves the memory assignment problem to minimize conflict cache misses. Our experiments demonstrate significant improvement in data cache performance (an average 46% improvement in hit ratios) from applying our memory organization technique to code kernels from DSP and other domains on the LSI Logic CW4001 embedded processor.
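A toy model of the conflict misses this memory assignment tries to avoid: in a direct-mapped cache, two variables whose addresses map to the same set evict each other when accessed alternately. The cache parameters below are hypothetical, chosen only to make the collision visible.

```python
def cache_set(addr, line_size, num_sets):
    """Set that a byte address maps to in a direct-mapped cache."""
    return (addr // line_size) % num_sets

def conflicts(addr_a, addr_b, line_size, num_sets):
    """True if two addresses map to the same cache set and would
    thrash when accessed in alternation."""
    return (cache_set(addr_a, line_size, num_sets)
            == cache_set(addr_b, line_size, num_sets))

# With 16-byte lines and 64 sets (a 1 KB cache), two addresses exactly
# one cache size (1024 bytes) apart collide; adjacent addresses do not.
```

A memory assignment that keeps simultaneously live variables out of conflicting positions eliminates exactly these thrashing patterns.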


International Symposium on Systems Synthesis | 1995

Optimal register assignment to loops for embedded code generation

David J. Kolson; Alexandru Nicolau; Nikil D. Dutt; Ken Kennedy

One of the challenging tasks in code generation for embedded systems is register assignment. When more live variables than registers exist, some variables are necessarily accessed from data memory. Because loops are typically executed many times and are often time-critical, good register assignment in loops is exceedingly important, since accessing data memory can degrade performance. The issue of finding an optimal register assignment to loops, one which minimizes the number of spills between registers and memory, has been open for some time. In this paper, we address this issue and present an optimal, but exponential, algorithm which assigns registers to loop bodies such that the resulting spill code is minimal. We also show that a heuristic modification performs as well as the exponential approach on typical loops from scientific code.
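As a simple point of comparison for the spill-minimization objective, the classic Belady/MIN eviction policy (keep the variable whose next use is furthest away) can be used to count memory accesses for a straight-line reference sequence. This is only an illustrative baseline, not the paper's optimal loop algorithm, which must account for values live across the loop back-edge.

```python
def count_loads(refs, num_regs):
    """Count loads from memory when at most num_regs variables can be
    register-resident, evicting the variable whose next use is
    furthest in the future (Belady's MIN policy)."""
    in_regs, loads = set(), 0
    for i, v in enumerate(refs):
        if v in in_regs:
            continue            # already in a register: no memory access
        loads += 1
        if len(in_regs) < num_regs:
            in_regs.add(v)
            continue
        def next_use(x):
            for j in range(i + 1, len(refs)):
                if refs[j] == x:
                    return j
            return float('inf')  # never used again: ideal victim
        in_regs.remove(max(in_regs, key=next_use))
        in_regs.add(v)
    return loads

# With 2 registers the sequence a b c a b forces 4 loads;
# with 3 registers only the 3 compulsory loads remain.
```

For a loop, the same sequence repeats, which is what makes the assignment at the loop boundary the hard (and, per the paper, exponential) part.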


Languages and Compilers for Parallel Computing | 1994

Mutation Scheduling: A Unified Approach to Compiling for Fine-Grain Parallelism

Steven Novack; Alexandru Nicolau

Trade-offs between code selection, register allocation, and instruction scheduling are inherently interdependent, especially when compiling for fine-grain parallel architectures. However, the conventional approach to compiling for such machines arbitrarily separates these phases so that decisions made during any one phase place unnecessary constraints on the remaining phases. Mutation Scheduling attempts to solve this problem by combining code selection, register allocation, and instruction scheduling into a unified framework in which trade-offs between the functional, register, and memory bandwidth resources of the target architecture are made “on the fly” in response to changing resource constraints and availability.
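The "on the fly" code selection can be pictured with a toy example: several semantically equivalent implementations (mutations) of the same value, each demanding a different functional unit, with the scheduler picking whichever one a free unit can execute this cycle. The mutation table and unit names below are hypothetical, not drawn from the paper.

```python
# Hypothetical alternative implementations of 2*x, each tied to the
# functional unit it occupies.
MUTATIONS = {
    "2*x": [("multiplier", "mul x, 2"),
            ("shifter",    "shl x, 1"),
            ("adder",      "add x, x")],
}

def select_mutation(expr, free_units):
    """Pick the first listed mutation whose functional unit is free
    this cycle; return None to defer the operation if none is."""
    for unit, instr in MUTATIONS[expr]:
        if unit in free_units:
            return instr
    return None
```

The point of the unified framework is that this choice is revisited as scheduling proceeds, rather than being frozen by an earlier code-selection phase.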


Code Generation for Embedded Processors | 2002

A Unified Code Generation Approach Using Mutation Scheduling

Steven Novack; Alexandru Nicolau; Nikil D. Dutt

Code generation for ASIPs requires tradeoffs between code selection, register allocation, and instruction scheduling in order to achieve high-quality code. Conventional approaches order these phases and apply them separately to simplify the code generation task. Consequently, decisions made during any one phase may unnecessarily constrain the remaining phases, resulting in the elimination of potentially better alternatives. Mutation Scheduling solves this problem by combining code selection, register allocation, and instruction scheduling into a unified framework in which trade-offs between the functional, register, interconnect and memory bandwidth resources of the target architecture are made “on the fly” in response to changing resource constraints and availability.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1996

Elimination of redundant memory traffic in high-level synthesis

David J. Kolson; Alexandru Nicolau; Nikil D. Dutt

This paper presents a new transformation for the scheduling of memory-access operations in high-level synthesis. This transformation is suited to memory-intensive applications with synthesized designs containing a secondary store accessed by explicit instructions. Such memory-intensive behaviors are commonly observed in video compression, image convolution, hydrodynamics and mechatronics. Our transformation removes load and store instructions which become redundant or unnecessary during the transformation of loops. The advantage of this reduction is the decrease of secondary memory bandwidth demands. This technique is implemented in our Percolation-Based Scheduler which we used to conduct experiments on a suite of memory-intensive benchmarks. Our results demonstrate a significant reduction in the number of memory operations and an increase in performance on these benchmarks.
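One ingredient of such a transformation can be sketched as redundancy elimination over a trace of memory operations: a load is dropped when the value at its address is already available from an earlier load or store with no intervening kill. The sketch below ignores aliasing entirely (every address is assumed distinct), which the real scheduler cannot do.

```python
def eliminate_redundant_loads(ops):
    """ops: list of ('load'|'store', addr). Drop loads whose value is
    already register-resident from an earlier access to the same
    address. Assumes no aliasing between distinct addresses."""
    available = set()   # addresses whose current value is in a register
    kept = []
    for op, addr in ops:
        if op == 'load' and addr in available:
            continue    # redundant: reuse the register copy
        available.add(addr)   # load brings the value in; store knows it
        kept.append((op, addr))
    return kept

# In an unrolled stencil, iteration i loads a[i], a[i+1] and iteration
# i+1 loads a[i+1], a[i+2]; the repeated load of a[i+1] is removed.
trace = [('load', 0), ('load', 1), ('load', 1), ('load', 2)]
```

Removing these operations is what reduces the secondary-memory bandwidth demand the abstract refers to.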


IEEE International Conference on High Performance Computing, Data and Analytics | 1997

Achieving Multi-level Parallelization

Carrie J. Brownhill; Alexandru Nicolau; Steven Novack; Constantine D. Polychronopoulos

Many modern machine architectures feature parallel processing at both the fine-grain and coarse-grain levels. In order to efficiently utilize these multiple levels, a parallelizing compiler must orchestrate the interactions of fine-grain and coarse-grain transformations. The goal of the PROMIS compiler project is to develop a multi-source, multi-target parallelizing compiler in which the front-end and back-end are integrated via a single unified intermediate representation. In this paper, we examine the appropriateness of the Hierarchical Task Graph as that representation.


Lecture Notes in Computer Science | 1997

Improving Cache Performance Through Tiling and Data Alignment

Preeti Ranjan Panda; Hiroshi Nakamura; Nikil D. Dutt; Alexandru Nicolau

We address the problem of improving the data cache performance of numerical applications — specifically, those with blocked (or tiled) loops. We present DAT, a data alignment technique utilizing array padding, to improve program performance by minimizing cache conflict misses. We describe algorithms for selecting tile sizes that maximize data cache utilization, and for computing pad sizes that eliminate self-interference conflicts in the chosen tile. We also present a generalization of the technique to handle applications with several tiled arrays. Our experimental results, comparing our technique with previously published approaches on machines with different cache configurations, show consistently good performance on several benchmark programs for a variety of problem sizes.
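The self-interference problem can be made concrete with a brute-force check: in a direct-mapped cache, rows of a tile whose cache footprints overlap evict each other, and padding the array's row stride shifts the rows apart. This word-granularity model and its search are a hypothetical sketch, not DAT's closed-form pad computation.

```python
def self_interference(row_stride, tile_rows, tile_cols, cache_size):
    """True if any two rows of a tile_rows x tile_cols tile overlap in
    a direct-mapped cache of cache_size words (word-granularity model)."""
    spans = [set((r * row_stride + c) % cache_size
                 for c in range(tile_cols))
             for r in range(tile_rows)]
    return any(spans[i] & spans[j]
               for i in range(tile_rows)
               for j in range(i + 1, tile_rows))

def min_pad(row_stride, tile_rows, tile_cols, cache_size, max_pad=64):
    """Smallest pad (extra words per row) removing self-interference."""
    for pad in range(max_pad + 1):
        if not self_interference(row_stride + pad, tile_rows,
                                 tile_cols, cache_size):
            return pad
    return None

# Pathological case: a 64-word row stride in a 64-word cache maps every
# row of a 4x4 tile onto the same words; padding each row by 4 words
# spreads the rows into disjoint cache regions.
```

DAT computes such pads analytically rather than by search, but the feasibility condition being satisfied is the same.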


Languages and Compilers for Parallel Computing | 1995

A Simple Mechanism for Improving the Accuracy and Efficiency of Instruction-Level Disambiguation

Steven Novack; Joseph Hummel; Alexandru Nicolau

Compilers typically treat source-level and instruction-level issues as independent phases of compilation. As a result, much of the information available to source-level transformation systems — about the semantics of the high-level language and its implementation, and in some cases about the algorithm itself — is “lost in the translation” and becomes unavailable to instruction-level transformation systems. While this separation of concerns reduces the complexity of the compiler and facilitates porting it to different source/target combinations, it also forces the costly recomputation at the instruction level of much of the information that was already available at the higher level, and in many cases introduces spurious dependencies that cannot be eliminated by instruction-level analysis alone.


Languages and Compilers for Parallel Computing | 1993

VISTA: The Visual Interface for Scheduling Transformations and Analysis

Steven Novack; Alexandru Nicolau

VISTA is a visually oriented, interactive environment for parallelizing sequential programs at the instruction level for execution on fine-grain architectures. Fully automatic parallelization techniques often perform well, but may not be able to achieve the strict performance and code size requirements needed for some critical applications. In such cases, manual manipulation by an expert user can often provide enough improvements in the parallelization process to meet the requirements of the application. Using VISTA, an expert user fine-tunes the parallelization process by providing rules and directives to the system in response to graphical and numeric feedback provided by the system.

Collaboration


Dive into Alexandru Nicolau's collaborations.

Top Co-Authors

Nikil D. Dutt (University of California)
Preeti Ranjan Panda (Indian Institute of Technology Delhi)
Steven Novack (University of California)
Ashok Halambi (University of California)
Joseph Hummel (University of California)