U. Nagaraj Shenoy | Researchain

Publication

Featured research published by U. Nagaraj Shenoy.

ACM Transactions on Programming Languages and Systems | 1999

A global communication optimization technique based on data-flow analysis and linear algebra

Mahmut T. Kandemir; Prithviraj Banerjee; Alok N. Choudhary; J. Ramanujam; U. Nagaraj Shenoy

Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this article, we present a technique to improve communication that considers data access patterns of the entire program. Our approach is based on a combination of traditional data-flow analysis and a linear algebra framework, and it works on structured programs with conditional statements and nested loops but without arbitrary goto statements. The distinctive features of the solution are the accuracy in keeping communication set information, support for general alignments and distributions including block-cyclic distributions, and the ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing, and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (an average of 32% reduction), the volume of the data communicated (an average of 37% reduction), and the execution time (an average of 26% reduction).

International Conference on Supercomputing | 1998

A hyperplane based approach for optimizing spatial locality in loop nests

Mahmut T. Kandemir; Alok N. Choudhary; U. Nagaraj Shenoy; Prithviraj Banerjee; J. Ramanujam

This paper presents a data layout optimization technique based on the theory of hyperplanes from linear algebra. Given a program, our framework automatically determines the optimal layouts that can be expressed by hyperplanes for each array that is referenced. We discuss the cases where data transformations are preferable to loop transformations and show that under specific conditions a loop nest can be optimized for perfect spatial locality by using data transformations. We divide the problem of optimizing data layout into two independent subproblems: (1) determining optimal layouts, and (2) determining data transformation matrices to implement optimal layouts. By postponing the determination of the transformation matrix to the last stage, our method can be adapted to compilers with different default layouts. Our results on eight programs on an SGI Origin 2000 distributed-shared-memory multiprocessor show that the layout optimizations are effective in optimizing spatial locality.

Design, Automation, and Test in Europe | 2000

A system-level synthesis algorithm with guaranteed solution quality

U. Nagaraj Shenoy; Prithviraj Banerjee; Alok N. Choudhary

Recently, a number of heuristic-based system-level synthesis algorithms have been proposed. Though these algorithms quickly generate good solutions, how close these solutions are to optimal is a question that is difficult to answer. While current exact techniques produce optimal results, they fail to produce them in reasonable time. This paper presents a synthesis algorithm that produces solutions of guaranteed quality (optimal in most cases or within a known bound) with practical synthesis times (a few seconds to minutes). It takes a unified look (the lack of which is one of the main sources of sub-optimality in the heuristic techniques) at different aspects of system synthesis such as pipelining, selection, allocation, scheduling and FPGA reconfiguration. Our technique can handle both time-constrained and resource-constrained synthesis problems. We present results of our algorithm implemented as part of the Match project at Northwestern University.

European Conference on Parallel Processing | 1998

Enhancing Spatial Locality via Data Layout Optimizations

Mahmut T. Kandemir; Alok N. Choudhary; J. Ramanujam; U. Nagaraj Shenoy; Prithviraj Banerjee

This paper aims to improve locality of references by suitably choosing array layouts. We use a new definition of spatial reuse vectors that takes into account memory layout of arrays. This capability creates two opportunities. First, it allows us to develop an array restructuring framework based on a combination of hyperplane theory and reuse vectors. Second, it allows us to observe the effect of different array layout optimizations on spatial reuse vectors. Since the iteration-space based locality optimizations also change the spatial reuse vectors, our approach allows us to compare the iteration-space based and data-space based approaches in terms of their effects on spatial reuse vectors. We illustrate the effectiveness of our technique using an example from the BLAS library on the SGI Origin distributed shared-memory machine.

ACM Transactions on Design Automation of Electronic Systems | 2001

An Algorithm for Synthesis of Large Time-Constrained Heterogeneous Adaptive Systems

U. Nagaraj Shenoy; Alok N. Choudhary; Prithviraj Banerjee

Large time-constrained applications are highly computer-intensive and are often implemented as a complex organization of pipelined data parallel tasks on a pool of embedded processors, DSP processors, and FPGAs. The large number of design alternatives available at each task level, the application as a whole, and the special needs of the reconfigurable devices (such as the FPGA) make the manual synthesis of such systems very tedious. The automatic synthesis algorithm in this paper combines exact (MILP-based) and heuristic techniques to solve this problem, which basically involves (1) propagation of timing constraints; (2) pipelining the loops to meet throughput requirements; (3) resource selection and scheduling, keeping the processing requirements and the timing constraints in view; (4) scheduling the resources across the tasks to ensure maximum utilization; and (5) hiding the reconfiguration delays of the FPGAs. While the use of MILP techniques helps in getting high-quality results, combining them with heuristics ensures acceptable synthesis times, striking a good balance between quality of results and synthesis time. Our experimental evaluation of the algorithm shows an average 40% reduction in resource cost (compared to manual synthesis) with synthesis times from minutes to as low as a few seconds in some cases.

International Conference on Supercomputing | 1998

An efficient uniform run-time scheme for mixed regular-irregular applications

Dhruva R. Chakrabarti; U. Nagaraj Shenoy; Alok N. Choudhary; Prithviraj Banerjee

Almost all applications containing indirect array addressing (irregular accesses) have a substantial number of direct array accesses (regular accesses) too. A conspicuous percentage of these direct array accesses usually require interprocessor communication for the applications to run on a distributed memory multicomputer. This study highlights how lack of a uniform representation and lack of a uniform scheme to generate communication structures and parallel code for regular and irregular accesses in a mixed regular-irregular application prevent sophisticated optimizations. Furthermore, we also show that code generated for regular accesses using compile-time schemes is not always compatible with code generated for irregular accesses using run-time schemes. In our opinion, existing schemes handling mixed regular-irregular applications either incur unnecessary preprocessing costs or fail to perform the best communication optimization. This study presents a uniform scheme to handle both regular and irregular accesses in a mixed regular-irregular application. While this allows for sophisticated communication optimizations such as message coalescing and message aggregation to be made across regular and irregular accesses, the preprocessing costs incurred are likely to be minimal. Experimental comparisons for various benchmarks on a 16-processor IBM SP-2 show that our scheme is feasible and better than existing schemes.

International Conference on Parallel Processing | 2000

Match virtual machine: an adaptive runtime system to execute MATLAB in parallel

Malay Haldar; Anshuman Nayak; Abhay Kanhere; Pramod G. Joisha; U. Nagaraj Shenoy; Alok N. Choudhary; Prithviraj Banerjee

MATLAB is one of the most popular languages for desktop numerical computations as well as for signal and image processing applications. Applying parallel processing techniques to improve performance of MATLAB codes has been the goal of many recent works. Most current frameworks require the user to specify parallelism and/or information regarding type/shape of the variables, thereby sacrificing the user friendliness which is one of the most popular MATLAB features. Other systems work on a restricted subset of MATLAB, thereby limiting the class of applications MATLAB can support. We present a runtime system capable of executing MATLAB code in parallel without any user intervention. The runtime system performs automatic parallelization and type/shape inference of the code at runtime. A unique feature of the runtime system is its capability to automatically adapt to changes in the underlying architecture, making it particularly useful for systems where predicting performance statically is difficult. We present experimental results obtained for the runtime system running on an SGI Origin2000 shared-memory multiprocessor.

Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing | 1998

A generalized framework for global communication optimization

Mahmut T. Kandemir; Prithviraj Banerjee; Alok N. Choudhary; J. Ramanujam; U. Nagaraj Shenoy

In distributed-memory message-passing architectures, reducing communication cost is extremely important. We present a technique to optimize communication globally. Our approach is based on a combination of linear algebra and dataflow analysis, and can take arbitrary control flow into account. The distinctive features of the algorithm are its accuracy in keeping communication set information and its support for general alignments and distributions including block-cyclic distributions. The method is being implemented in the PARADIGM compiler. The preliminary results show that the technique is effective in reducing both the number and the volume of communication.

International Conference on Parallel Processing | 1998

Minimizing data and synchronization costs in one-way communication

Mahmut T. Kandemir; U. Nagaraj Shenoy; Prithviraj Banerjee; J. Ramanujam; Alok N. Choudhary

In contrast to the conventional send/receive model, the one-way communication model using Put and Synch allows the decoupling of message transmission from synchronization. This opens up new opportunities not only to further optimize communication but also to reduce synchronization overhead. We present a general technique which uses a global dataflow framework to optimize communication and synchronization in the context of the one-way communication model. Our approach works with the most general data alignments and distributions in languages like HPF, and is more powerful than other current solutions for eliminating redundant synchronization messages. Preliminary results on several scientific benchmarks demonstrate that our approach is successful in minimizing the number of data and synchronization messages.

International Conference on APL | 2000

Handling context-sensitive syntactic issues in the design of a front-end for a MATLAB compiler

Pramod G. Joisha; Abhay Kanhere; Prithviraj Banerjee; U. Nagaraj Shenoy; Alok N. Choudhary

In recent times, the MATLAB language has emerged as a popular alternative for programming in diverse application domains such as signal processing and meteorology. The language has a powerful array syntax with a large set of pre-defined operators and functions that operate on arrays or array sections, making it an ideal candidate for applications involving substantial array-based processing. Yet, for all the programming convenience that the language offers, designing a parser and scanner capable of mimicking the language's syntax has proven to be an acutely difficult task. The language has many context-sensitive constructions, and though numerous front-end implementations of MATLAB and MATLAB-like languages exist, not much has been discussed regarding the efficient compile-time parsing of such languages or how their syntax impacts the parsing process. In this paper, we present the design and implementation of a compiler front-end for the MATLAB language. We discuss in detail both the indigenously designed grammar responsible for syntax analysis as well as the lexical specification that complements the grammar. In the course of our attempts to emulate MATLAB's syntax, we were able to unravel certain key issues relating to its syntax, such as the complications arising in parsing command-form function invocations within a compile-time environment, the context-sensitive interpretation of the single quote character, and the translation of white space within matrices into element separators. The front-end effects a conversion of the original source to an intermediate form in which statements are represented as abstract syntax trees and the flow of control between statements by a control-flow graph. All subsequent compiler passes work on this intermediate representation. The front-end was designed and implemented as part of the MATCH project, which addresses the translation of a MATLAB program by a compiler onto a heterogeneous target consisting of embedded and commercial-off-the-shelf processors.
