Shail Aditya | Researchain


Publication

Featured research published by Shail Aditya.

IEEE Computer | 2002

PICO: automatically designing custom computers

Vinod Kathail; Shail Aditya; Robert Schreiber; B. Ramakrishna Rau; Darren C. Cronquist; Mukund Sivaraman

The paper discusses the PICO (program in, chip out) project, a long-range HP Labs research effort that aims to automate the design of optimized, application-specific computing systems, thus enabling the rapid and cost-effective design of custom chips when no adequately specialized, off-the-shelf design is available. PICO research takes a systematic approach to the hierarchical design of complex systems and advances technologies for automatically designing custom nonprogrammable accelerators and VLIW processors. While skeptics often assume that automated design must emulate human designers who invent new solutions to problems, PICO's approach is to automatically pick the most suitable designs from a well-engineered space of designs. Such automation of embedded computer design promises an era of yet more growth in the number and variety of innovative smart products by lowering the barriers of design time, designer availability, and design cost.

signal processing systems | 2002

PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators

Robert Schreiber; Shail Aditya; Scott A. Mahlke; Vinod Kathail; B. Ramakrishna Rau; Darren C. Cronquist; Mukund Sivaraman

The PICO-NPA system automatically synthesizes nonprogrammable accelerators (NPAs) to be used as co-processors for functions expressed as loop nests in C. The NPAs it generates consist of a synchronous array of one or more customized processor datapaths, their controller, local memory, and interfaces. The user, or a design space exploration tool that is a part of the full PICO system, identifies within the application a loop nest to be implemented as an NPA, and indicates the performance required of the NPA by specifying the number of processors and the number of machine cycles that each processor uses per iteration of the inner loop. PICO-NPA emits synthesizable HDL that defines the accelerator at the register transfer level (RTL). The system also modifies the user's application software to make use of the generated accelerator. The main objective of PICO-NPA is to reduce design cost and time, without significantly reducing design quality. Design of an NPA and its support software typically requires one or two weeks using PICO-NPA, which is a many-fold improvement over the industry norm. In addition, PICO-NPA can readily generate a wide range of implementations with scalable performance from a single specification. In an experimental comparison of NPAs of equivalent throughput, PICO-NPA designs are slightly more costly than hand-designed accelerators. Logic synthesis and place-and-route have been performed successfully on PICO-NPA designs, which have achieved high clock rates.

application-specific systems, architectures, and processors | 2000

High-level synthesis of nonprogrammable hardware accelerators

Robert Schreiber; Shail Aditya; B. Ramakrishna Rau; Vinod Kathail; Scott A. Mahlke; Santosh G. Abraham; Greg Snider

The PICO-N system automatically synthesizes embedded nonprogrammable accelerators to be used as co-processors for functions expressed as loop nests in C. The output is synthesizable VHDL that defines the accelerator at the register transfer level (RTL). The system generates a synchronous array of customized VLIW (very long instruction word) processors, their controller, local memory, and interfaces. The system also modifies the user's application software to make use of the generated accelerator. The user indicates the throughput to be achieved by specifying the number of processors and their initiation interval. In experimental comparisons, PICO-N designs are slightly more costly than hand-designed accelerators with the same performance.
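
As a rough illustration of how a processor count and an initiation interval together pin down throughput, the sketch below applies the textbook software-pipelining estimate: a loop of N iterations issued every II cycles with schedule length SL finishes in about (N - 1) * II + SL cycles. The function name, the `schedule_length` parameter, and the assumption that iterations are split evenly across processors with no communication stalls are illustrative only; this is not the cost model used by PICO.

```python
import math


def estimated_cycles(iterations, processors, initiation_interval, schedule_length):
    """Estimate cycles for a software-pipelined loop split across processors.

    Hypothetical helper: assumes an even split of iterations and no stalls.
    """
    if min(iterations, processors, initiation_interval, schedule_length) < 1:
        raise ValueError("all arguments must be positive integers")
    # Each processor handles an equal share of the iterations (rounded up).
    per_processor = math.ceil(iterations / processors)
    # A new iteration starts every `initiation_interval` cycles; the last one
    # needs `schedule_length` cycles to drain through the pipeline.
    return (per_processor - 1) * initiation_interval + schedule_length


# 1024 iterations on 4 processors, II = 2, schedule length 10 cycles.
print(estimated_cycles(1024, 4, 2, 10))  # 520
```

Under these assumptions, halving the initiation interval or doubling the processor count roughly halves the cycle count, which is why the two parameters together serve as a throughput specification.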

international symposium on systems synthesis | 1999

Automatic architectural synthesis of VLIW and EPIC processors

Shail Aditya; B. Ramakrishna Rau; Vinod Kathail

The paper describes a mechanism for automatic design and synthesis of very long instruction word (VLIW), and its generalization, explicitly parallel instruction computing (EPIC) processor architectures starting from an abstract specification of their desired functionality. The process of architecture design makes concrete decisions regarding the number and types of functional units, number of read/write ports on register files, the datapath interconnect, the instruction format, its decoding hardware, and the instruction unit datapath. The processor design is then automatically synthesized into a detailed RTL-level structural model in VHDL, along with an estimate of its area. The system also generates the corresponding detailed machine description and instruction format description that can be used to retarget a compiler and an assembler respectively. All this is part of an overall design system, called Program-In, Chip-Out (PICO), which has the ability to perform automatic exploration of the architectural design space while customizing the architecture to a given application and making intelligent, quantitative, cost-performance tradeoffs.

ACM Transactions on Design Automation of Electronic Systems | 2000

Code size minimization and retargetable assembly for custom EPIC and VLIW instruction formats

Shail Aditya; Scott A. Mahlke; B. Ramakrishna Rau

PICO is a fully automated system for designing the architecture and the microarchitecture of VLIW and EPIC processors. A serious concern with this class of processors, due to their very long instructions, is their code size. One focus of this paper is to describe a series of code size minimization techniques used within PICO, some of which are applied during the automatic design of the instruction format, while others are applied during program assembly. The design of a retargetable assembler to support these techniques also poses certain novel challenges, which constitute the second focus of this paper. Contrary to widely held perceptions, we demonstrate that it is entirely possible to design VLIW and EPIC processors that are capable of issuing large numbers of operations per cycle, but whose code size is only moderately larger than that for a sequential CISC processor.

compilers, architecture, and synthesis for embedded systems | 2002

Cycle-time aware architecture synthesis of custom hardware accelerators

Mukund Sivaraman; Shail Aditya

We present the cycle-time aware architecture synthesis methodology used in PICO-NPA that automatically synthesizes minimal cost RT-level designs from high-level specifications to meet a given cycle-time. This allows subsequent physical synthesis to succeed on first pass with predictable performance. The core of the methodology is a static timing analysis engine that is used at multiple levels - program-level, architecture-level and RT-level - in order to identify, schedule and validate useful operator chains that are incorporated into the design automatically. We present architecture synthesis results for several embedded applications and evaluate the benefits of this technique.

Design Automation for Embedded Systems | 1999

Machine-Description Driven Compilers for EPIC and VLIW Processors

B. Ramakrishna Rau; Vinod Kathail; Shail Aditya

In the past, due to the restricted gate count available on an inexpensive chip, embedded DSPs have had limited parallelism, few registers and irregular, incomplete interconnectivity. More recently, with increasing levels of integration, embedded VLIW processors have started to appear. Such processors typically have higher levels of instruction-level parallelism, more registers, and a relatively regular interconnect between the registers and the functional units. The central challenges faced by a code generator for an EPIC (Explicitly Parallel Instruction Computing) or VLIW processor are quite different from those for the earlier DSPs and, consequently, so is the structure of a code generator that is designed to be easily retargetable. In this paper, we explain the nature of the challenges faced by an EPIC or VLIW compiler and present a strategy for performing code generation in an incremental fashion that is best suited to generating high-quality code efficiently. We also describe the Operation Binding Lattice, a formal model for incrementally binding the opcodes and register assignments in an EPIC code generator. As we show, this reflects the phase structure of the EPIC code generator. It also defines the structure of the machine-description database, which is queried by the code generator for the information that it needs about the target processor. Lastly, we discuss our implementation of these ideas and techniques in Elcor, our EPIC compiler research infrastructure.

compilers, architecture, and synthesis for embedded systems | 2001

ShiftQ: a bufferred interconnect for custom loop accelerators

Shail Aditya; Michael S. Schlansker

ShiftQs are hardware structures consisting of registers and switches which buffer and transport operands among function units within custom hardware loop accelerators. ShiftQs help minimize buffering and interconnect costs by customizing the hardware to the given schedule and by intelligent sharing of register and interconnect resources. This paper describes the ShiftQ schema and a method to automatically synthesize them from modulo-scheduled loops. We also evaluate the cost savings by comparing them against traditional storage and interconnect mechanisms.

Archive | 2008

Algorithmic Synthesis Using PICO

Shail Aditya; Vinod Kathail

The increasing SoC complexity and a relentless pressure to reduce time-to-market have left the hardware and system designers with an enormous design challenge. The bulk of the effort in designing an SoC is focused on the design of product-defining application engines such as video codecs and wireless modems. Automatic synthesis of such application engines from a high level algorithmic description can significantly reduce both design time and design cost. This chapter reviews high level requirements for such a system and then describes the PICO (Program-In, Chip-Out) system, which provides an integrated framework for the synthesis and verification of application engines from high level C algorithms. PICO's novel approach relies on aggressive compiler technology, a parallel execution model based on Kahn process networks, and a carefully designed hardware architecture template that is cost-efficient, provides high performance, and is sensitive to circuit level and system level design constraints. PICO addresses the complete hardware design flow including architecture exploration, RTL design, RTL verification, system validation and system integration. For a large class of modern embedded applications, PICO's approach has been shown to yield extremely competitive designs at a fraction of the resources used traditionally, thereby closing the proverbial design productivity gap.
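
For readers unfamiliar with the Kahn process network model mentioned in this abstract, the following is a minimal software sketch of the general idea: independent processes that communicate only through unbounded FIFO channels with blocking reads, which makes the result independent of thread scheduling. The process names (`producer`, `scale`), the `DONE` sentinel, and the use of Python threads are assumptions made for illustration; they say nothing about how PICO itself represents or implements its process networks.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def producer(out_channel, values):
    """Process 1: write each value to its output channel, then signal the end."""
    for value in values:
        out_channel.put(value)
    out_channel.put(DONE)


def scale(in_channel, out_channel, factor):
    """Process 2: blocking read, multiply, write; forward the end marker."""
    while True:
        value = in_channel.get()  # blocks until a token is available
        if value is DONE:
            out_channel.put(DONE)
            return
        out_channel.put(value * factor)


def run_network(values, factor):
    """Wire producer -> scale -> collector with unbounded FIFO channels."""
    a, b = queue.Queue(), queue.Queue()  # maxsize=0 means unbounded
    threads = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        value = b.get()
        if value is DONE:
            break
        results.append(value)
    for t in threads:
        t.join()
    return results


print(run_network([1, 2, 3], 2))  # [2, 4, 6]
```

Because each process only ever blocks on reading a single channel, the output order is the same on every run regardless of how the threads are interleaved.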

languages and compilers for parallel computing | 1996

A Multithreaded Substrate and Compilation Model for the Implicitly Parallel Language pH

Arvind; Alejandro Caro; Jan-Willem Maessen; Shail Aditya

We describe the compilation of the non-strict, implicitly parallel language pH to symmetric multiprocessors (SMPs). First, we introduce the λs calculus as a robust foundation for the semantics of pH. Next, we define a Shared-Memory Threaded (SMT) abstract machine that captures the essence of our SMP compilation target. Finally, we describe a syntax-directed translation of λs to SMT instructions. The paper makes three important contributions: it is the first implementation of pH based on a direct semantics of barriers; the compilation rules generate code from λs without using intermediate dataflow graphs; and the multithreaded code emitted by the compiler uses suspensive threads.
