Om Prakash Gangwal | Researchain

Publication

Featured research published by Om Prakash Gangwal.

Design, Automation, and Test in Europe | 2005

A Design Flow for Application-Specific Networks on Chip with Guaranteed Performance to Accelerate SOC Design and Verification

Kees Goossens; John Dielissen; Om Prakash Gangwal; Santiago González Pestana; Andrei Radulescu; Edwin Rijpkema

Systems on chip (SOC) are composed of intellectual property blocks (IP) and interconnect. While mature tooling exists to design the former, tooling for interconnect design is still a research area. In this paper we describe an operational design flow that generates and configures application-specific network on chip (NOC) instances, given application communication requirements. The NOC can be simulated in SystemC and RTL VHDL. An independent performance verification tool verifies analytically that the NOC instance (hardware) and its configuration (software) together meet the application performance requirements. The Æthereal NOC's guaranteed performance is essential to replace time-consuming simulation by fast analytical performance validation. As a result, application-specific NOCs that are guaranteed to meet the application's communication requirements are generated and verified in minutes, reducing the number of design iterations. A realistic MPEG SOC example substantiates our claims.

Design Automation for Embedded Systems | 2002

C-HEAP: A Heterogeneous Multi-Processor Architecture Template and Scalable and Flexible Protocol for the Design of Embedded Signal Processing Systems

Andre K. Nieuwland; Jeffrey Kang; Om Prakash Gangwal; Ramanathan Sethuraman; Natalino G. Busá; Kees Goossens; Rafael Peset Llopis; Paul E. R. Lippens

The key issue in the design of Systems-on-a-Chip (SoC) is to trade off efficiency against flexibility, and time to market against cost. Current deep submicron processing technologies enable integration of multiple software programmable processors (e.g., CPUs, DSPs) and dedicated hardware components into a single cost-efficient IC. Our top-down design methodology with various abstraction levels helps in designing these ICs in a reasonable amount of time. This methodology starts with a high-level executable specification, and converges towards a silicon implementation. A major task in the design process is to ensure that all components (hardware and software) communicate with each other correctly. In this article, we tackle this problem in the context of the signal processing domain in two ways: we propose a modular, flexible, and scalable heterogeneous multi-processor architecture template based on distributed shared memory, and we present an efficient and transparent protocol for communication and (re)configuration. The protocol implementations have been incorporated in libraries, which allows quick traversal of the various abstraction levels, thus enabling incremental design. The design decisions to be taken at each abstraction level are evaluated by means of (co-)simulation. Prototyping is also used to verify the system's functional correctness. The effectiveness of our approach is illustrated by a design case of a multi-standard video and image codec.

Design, Automation, and Test in Europe | 2004

Cost-performance trade-offs in networks on chip: a simulation-based approach

Santiago González Pestana; Edwin Rijpkema; Andrei Radulescu; Kees Goossens; Om Prakash Gangwal

A challenge facing designers of systems on chip (SoC) containing networks on chip (NoC) is to find NoC instances that balance cost (e.g. area) and performance (e.g. latency and throughput). In this paper we present a simulation-based approach to address this problem. We use XML to instantiate network components (routers, network interfaces) and their composition. NoCs are evaluated in terms of cost and performance by sweeping over different parameters (e.g. network topology, network interface queue depth). We then show how trade-off plots can be obtained from the results of our simulation environment. Finally, by means of two examples, we illustrate how trade-off plots can help NoC designers select the right network based on a set of different constraints.
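
To make the idea of a trade-off plot concrete, the following minimal Python sketch filters a set of swept NoC configurations down to those that are not dominated in both cost and latency. The `NocPoint` type, the `pareto_front` helper, and all numbers are hypothetical illustrations and are not taken from the paper or its simulation environment.

```python
from typing import List, NamedTuple


class NocPoint(NamedTuple):
    """One simulated NoC configuration (illustrative fields only)."""
    name: str
    area: float     # cost metric, lower is better
    latency: float  # performance metric, lower is better


def pareto_front(points: List[NocPoint]) -> List[NocPoint]:
    """Return the points that no other point beats on both area and latency."""
    front = []
    for p in points:
        dominated = any(
            q.area <= p.area
            and q.latency <= p.latency
            and (q.area < p.area or q.latency < p.latency)
            for q in points
        )
        if not dominated:
            front.append(p)
    # Sort by area so the result reads like a trade-off curve.
    return sorted(front, key=lambda p: p.area)


# Made-up sweep results, for illustration only.
sweep = [
    NocPoint("A", 1.0, 50.0),
    NocPoint("B", 2.0, 30.0),
    NocPoint("C", 2.5, 35.0),  # worse than B on both axes
    NocPoint("D", 4.0, 20.0),
    NocPoint("E", 4.5, 20.0),  # same latency as D, more area
]
front_names = [p.name for p in pareto_front(sweep)]
```

A designer with an area budget would then pick the lowest-latency point on this front that still fits the budget.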

IEEE Design & Test of Computers | 2002

A heterogeneous multiprocessor architecture for flexible media processing

Martijn J. Rutten; J.T.J. van Eijndhoven; E.G.T. Jaspers; P. van der Wolf; Om Prakash Gangwal; A. Timmer; Evert-Jan D. Pol

Eclipse is a scalable architecture template for designing data-dependent stream-processing subsystems of media-processing SoCs. It combines application configuration flexibility with the efficiency of function-specific coprocessors that concurrently execute the tasks of one or more applications.

International Symposium on Systems Synthesis | 2001

A scalable and flexible data synchronization scheme for embedded HW-SW shared-memory systems

Om Prakash Gangwal; Andre K. Nieuwland; Paul E. R. Lippens

This paper describes the implementation of a data-synchronization scheme that can be used in the functional description and hardware realization of algorithms for heterogeneous multiprocessor architectures. In this scheme, synchronization primitives are chosen such that they can be implemented efficiently in both hardware and software on distributed shared memory architectures, without the need for atomic semaphore instructions. The proposed solution is flexible as the configuration of the data synchronization is programmable even after a hardware realization. It is also scalable since it can be implemented without the need for central resources. We show with experiments that distributed implementations are needed for scalable and high performance systems-on-a-chip.
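
One common way to synchronize a producer and a consumer without atomic semaphore instructions is to give every shared counter exactly one writer. The Python sketch below illustrates that general idea with a bounded FIFO. It is an assumption that this resembles the primitives in the paper; the class and method names are hypothetical, and the sketch is single-threaded for clarity.

```python
class SingleWriterFifo:
    """Bounded FIFO in which each counter is written by only one side.

    The producer is the only writer of `written`; the consumer is the only
    writer of `consumed`. Neither side needs an atomic read-modify-write on
    a location the other side also writes.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots = [None] * capacity
        self.written = 0   # total items produced (producer-owned)
        self.consumed = 0  # total items consumed (consumer-owned)

    def try_put(self, item) -> bool:
        """Producer side: store an item if there is room."""
        if self.written - self.consumed == self.capacity:
            return False  # buffer full
        self.slots[self.written % self.capacity] = item
        self.written += 1  # publish only after the data is in place
        return True

    def try_get(self):
        """Consumer side: return (True, item), or (False, None) if empty."""
        if self.written == self.consumed:
            return False, None
        item = self.slots[self.consumed % self.capacity]
        self.consumed += 1  # release the slot only after reading it
        return True, item


fifo = SingleWriterFifo(capacity=2)
put_results = [fifo.try_put(x) for x in ("a", "b", "c")]
got = [fifo.try_get() for _ in range(3)]
```

Because the two counters can live in memories local to their respective writers, no central resource is involved, which is in the spirit of the scalability argument made in the abstract.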

Dynamic and Robust Streaming in and between Connected Consumer-Electronics Devices | 2005

Building Predictable Systems on Chip: An Analysis of Guaranteed Communication in the Aethereal Network on Chip

Om Prakash Gangwal; Andrei Radulescu; Kees Goossens; Santiago González Pestana; Edwin Rijpkema

As the complexity of Systems-on-Chip (SoC) grows, meeting real-time requirements is becoming increasingly difficult. Predictability for computation, memory and communication components is needed to build real-time SoCs. We focus on a predictable communication infrastructure called the Æthereal Network-on-Chip (NoC). The Æthereal NoC is a scalable communication infrastructure based on routers and network interfaces (NI). It provides two services: guaranteed throughput and latency (GT), and best effort (BE). Using the GT service, one can derive guaranteed bounds on latency and throughput. To achieve guaranteed throughput, buffers in the NIs must be dimensioned to hide round-trip latency and the rate difference between computation and communication IPs (Intellectual Property). With the BE service, throughput and latency bounds cannot be derived with guarantees. In this chapter, we describe an analytical method to compute latency, throughput and buffering requirements for the Æthereal NoC. We show the usefulness of the method by applying it to an MPEG-2 (Moving Picture Experts Group) codec example.
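
The abstract says that NI buffers must hide round-trip latency, but it gives no formula. As a rough illustration only, the Python sketch below applies the generic bandwidth-delay bound used with credit-based flow control: a buffer needs at least as many words as can be in flight during one round trip. The function name, its parameters, and the example numbers are assumptions, and this is not the chapter's analytical method.

```python
from fractions import Fraction
from math import ceil


def min_buffer_words(rate_words_per_cycle: Fraction, round_trip_cycles: int) -> int:
    """Smallest buffer (in words) that keeps a stream running at the given
    rate while flow-control credits take `round_trip_cycles` to return.

    Generic bandwidth-delay product, rounded up to whole words.
    """
    return ceil(rate_words_per_cycle * round_trip_cycles)


# Hypothetical connection: one word every 4 cycles, 30-cycle credit round trip.
example_words = min_buffer_words(Fraction(1, 4), 30)
```

Exact fractions are used instead of floats so that the rounding up is not affected by floating-point error.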

International Conference on VLSI Design | 2003

Design of a 2D DCT/IDCT application specific VLIW processor supporting scaled and sub-sampled blocks

R. Krishnan; Om Prakash Gangwal; J.T.J. van Eijndhoven; A. Kumar

We present an innovative design of an accurate 2D DCT/IDCT processor, which handles scaled and sub-sampled input blocks efficiently. In the IDCT mode, the latency of the processor scales with the size of the input blocks, varying from 7 cycles for a 1 × 1 block to 38 cycles for an 8 × 8 block. This scalability is possible because the processor has input-data-dependent control, by which it can exploit the reduced computational needs of sub-sampled blocks and blocks of smaller sizes to work in fewer cycles. This is a very useful feature for MPEG and HDTV decoders and has hitherto not been exploited. Clocked at 150 MHz, the processor satisfies the high sample rate requirement of dual MPEG stream HD decoding with a picture size of 1920 × 1080 at 30 frames per second. Fixed word length and accuracy simulations of our design show that it conforms to the accuracy specifications of the CCITT standard within a 16-bit data path. A methodology based on architecture-level synthesis is used to design the VLIW processor core. The VLIW design efficiently exploits the instruction-level parallelism present in the DCT/IDCT application. The processor core is characterised by an area of 0.834 mm² and a frequency of 150 MHz in 0.18 micron CMOS technology.
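
The throughput claim can be sanity-checked with a short calculation. The Python sketch below estimates how many clock cycles are available per 8 × 8 block when decoding two 1920 × 1080 streams at 30 frames per second on a 150 MHz clock. The 4:2:0 chroma format (1.5 samples per pixel) and the comparison against the 38-cycle figure are assumptions made for illustration; the abstract does not state them.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT = 1920, 1080
FRAMES_PER_SECOND = 30
STREAMS = 2
CLOCK_HZ = 150_000_000
CYCLES_8X8_BLOCK = 38

# Assumption: 4:2:0 sampling, i.e. 3 samples for every 2 luma pixels.
samples_per_frame = WIDTH * HEIGHT * 3 // 2
blocks_per_frame = samples_per_frame // 64  # 64 samples per 8 x 8 block

blocks_per_second = blocks_per_frame * FRAMES_PER_SECOND * STREAMS
cycle_budget_per_block = CLOCK_HZ / blocks_per_second

print(f"blocks per second: {blocks_per_second}")
print(f"cycle budget per block: {cycle_budget_per_block:.2f}")
print(f"fits within budget: {CYCLES_8X8_BLOCK <= cycle_budget_per_block}")
```

Under these assumptions the budget is roughly 51 cycles per block, so 38 cycles for a full 8 × 8 block leaves headroom even if blocks are processed strictly one after another.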

Digital Systems Design | 2003

Understanding video pixel processing applications for flexible implementations

Om Prakash Gangwal; Johan Janssen; Selliah Rathnam; Erwin B. Bellers; Marc Duranton

Media processing systems-on-chip (SoCs) mainly consist of audio encoding/decoding (e.g. AC-3, MP3), video encoding/decoding (e.g. H263, MPEG-2) and video pixel processing functions (e.g. de-interlacing, noise reduction). Video pixel processing functions have very high computational demands, as they require a large number of computations on large amounts of data (note that the data are pixels of completely decoded pictures). In this paper, we focus on video pixel processing functions. Usually, these functions are implemented in dedicated hardware. However, flexibility (by means of programmability or reconfigurability) is needed to introduce the latest innovative algorithms, to allow differentiation of products, and to allow bug fixing after chips have been fabricated. Current programmable media processors cannot fulfill the computational requirements of these functions. To achieve efficient implementations for flexible solutions, we study the application characteristics of some representative video pixel processing functions. The characteristics considered are the granularity of operations, the amount and kind of data accesses, and the degree of parallelism present in these functions. We observe that, from a computational granularity point of view, many functions can be expressed in terms of kernels, e.g. Median3 (i.e. median of three values), finite impulse response (FIR) filters, table lookups (LUT), etc., that are coarser grained than ALU, Mult, MAC, etc. Regarding the kind of data accesses, we categorize these functions as having regular, regular with some data rearrangement, and irregular data access patterns. Furthermore, the degree of parallelism present in these functions is expressed in terms of data level parallelism (DLP) and instruction/operation level parallelism (ILP). We show with an example that these properties can be exploited to make specialized programmable processors.
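
As an illustration of a kernel that is coarser grained than a single ALU operation, the Python sketch below implements Median3 with min/max operations and applies it as a sliding window over one row of pixels. The function names, the border handling, and the sample data are assumptions for illustration and do not come from the paper.

```python
from typing import List


def median3(a: int, b: int, c: int) -> int:
    """Median of three values using only min/max (no sorting, no branches)."""
    return max(min(a, b), min(max(a, b), c))


def median3_row(row: List[int]) -> List[int]:
    """Apply Median3 over each pixel and its two horizontal neighbours.

    Border pixels are copied unchanged (an illustrative choice).
    """
    if len(row) < 3:
        return list(row)
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = median3(row[i - 1], row[i], row[i + 1])
    return out


# Made-up pixel row with two impulse-noise outliers (200 and 0).
filtered = median3_row([10, 200, 12, 13, 0, 14])
```

Each output pixel depends only on a fixed neighbourhood of the input, so all iterations are independent: a regular access pattern with abundant data level parallelism, in the terms used by the abstract.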

Embedded Systems for Real-Time Multimedia | 2004

Application design trajectory towards reusable coprocessors - MPEG case study

Martijn J. Rutten; Om Prakash Gangwal; J.T.J. van Eijndhoven; E.G.T. Jaspers; E.J. Pol

This work presents a structured application design trajectory to transform media-processing applications, modeled as Kahn process networks, into a set of function-specific hardware units called coprocessors. The proposed design trajectory focuses on identifying hardware-implementable computation kernels that are common to a predetermined set of applications. The design trajectory is exercised in a case study that maps MPEG video decoding and encoding applications onto a set of coprocessors in a heterogeneous multiprocessor architecture. The resulting set of coprocessors can simultaneously perform both encoding and decoding functions for multiple MPEG-2 streams in an estimated 4 mm² (excluding memory) in 0.18 μm technology.

Archive | 2005

Interconnect and Memory Organization in SOCs for Advanced Set-Top Boxes and TV

Kees Goossens; Om Prakash Gangwal; Jens Röver; A.P. Niranjan

