Murali Jayapala | Researchain

Publication

Featured researches published by Murali Jayapala.

IEEE Transactions on Computers | 2005

Clustered loop buffer organization for low energy VLIW embedded processors

Murali Jayapala; Francisco Barat; T van der Aa; F. Catthoor; Henk Corporaal; Geert Deconinck

Current loop buffer organizations for very long instruction word (VLIW) processors are essentially centralized. As a consequence, they are energy inefficient and their scalability is limited. To alleviate this problem, we propose a clustered loop buffer organization, where the loop buffers are partitioned and functional units are logically grouped to form clusters, along with two schemes for buffer control, which regulate the activity in each cluster. Furthermore, we propose a design-time scheme to generate clusters by analyzing an application profile and grouping closely related functional units. The simulation results indicate that the energy consumed in the clustered loop buffers is, on average, 63 percent lower than the energy consumed in an uncompressed centralized loop buffer scheme, 35 percent lower than a centralized compressed loop buffer scheme, and 22 percent lower than a randomly clustered loop buffer scheme.

field-programmable logic and applications | 2003

Low Power Coarse-Grained Reconfigurable Instruction Set Processor

Francisco Barat; Murali Jayapala; Tom Vander Aa; Rudy Lauwereins; Geert Deconinck; Henk Corporaal

Current embedded multimedia applications have stringent time and power constraints. Coarse-grained reconfigurable processors have been shown to achieve the required performance. However, there is not much research regarding the power consumption of such processors. In this paper, we present a novel coarse-grained reconfigurable processor and study its power consumption using a power model derived from Wattch. Several processor configurations are evaluated using a set of multimedia applications. Results show that the presented coarse-grained processor can achieve on average 2.5x the performance of a RISC processor with an 18% increase in energy consumption.

field programmable logic and applications | 2001

CRISP: A Template for Reconfigurable Instruction Set Processors

Pieter Op De Beeck; Francisco Barat; Murali Jayapala; Rudy Lauwereins

A template for reconfigurable instruction set processors is described. This template defines a design space that enables the exploration of processors potentially suitable for flexible, power- and cost-efficient implementations of embedded multimedia applications, such as video compression in a handheld device. The template is based on a VLIW processor with a reconfigurable instruction set. In the future, this template will be used for design space exploration, compiler retargeting and automatic hardware synthesis. Several existing reconfigurable and non-reconfigurable processors were mapped onto the template to assess its expressiveness.

design, automation, and test in europe | 2006

Distributed Loop Controller Architecture for Multi-threading in Uni-threaded VLIW Processors

Praveen Raghavan; Andy Lambrechts; Murali Jayapala; Francky Catthoor; Diederik Verkest

Reduced energy consumption is one of the most important design goals for embedded application domains like wireless, multimedia and biomedical. The instruction memory hierarchy has been shown to be one of the most power-hungry parts of the system. This paper introduces an architectural enhancement for the instruction memory to reduce energy and improve performance. The proposed distributed instruction memory organization requires minimal hardware overhead and allows execution of multiple loops in parallel in a uni-processor system. This architecture enhancement can reduce the energy consumed in the instruction and data memory hierarchy by 70.01% and improve the performance by 32.89% compared to enhanced SMT-based architectures.

signal processing systems | 2008

Address Generation Optimization for Embedded High-Performance Processors: A Survey

Guillermo Talavera; Murali Jayapala; Jordi Carrabina; Francky Catthoor

Embedded systems are growing at an impressive rate and provide increasingly sophisticated applications, characterized by complex array index manipulation and a large number of data accesses. These applications require high-performance, application-specific computation that general-purpose processors cannot deliver at a reasonable energy consumption. Very long instruction word architectures seem a good solution, providing enough computational performance at low power with the programmability required to shorten the time to market. These architectures rely on compiler effort to exploit the available instruction and data parallelism and keep the data path busy all the time. With the density of transistors doubling every 18 months, more and more sophisticated architectures with a high number of computational resources running in parallel are emerging. With this increasing parallel computation, access to data is becoming the main bottleneck that limits the available parallelism. To alleviate this problem, in current embedded architectures, a special unit works in parallel with the main computing elements to ensure efficient feeding and storage of the data: the address generator unit, which comes in many flavors. Future architectures will have to deal with enormous memory bandwidth in distributed memories, and the development of address generator units will be crucial for an effective next generation of embedded processors, where global trade-offs between reaction time, bandwidth, energy and area must be achieved. This paper provides a survey of methods and techniques that optimize the address generation process for embedded systems, explaining current research trends and future needs.

application-specific systems, architectures, and processors | 2005

Power breakdown analysis for a heterogeneous NoC platform running a video application

Andy Lambrechts; Praveen Raghavan; Anthony Leroy; Guillermo Talavera; Tom Vander Aa; Murali Jayapala; Francky Catthoor; Diederik Verkest; Geert Deconinck; Henk Corporaal; Frédéric Robert; Jordi Carrabina

Users expect future handheld devices to provide extended multimedia functionality and have long battery life. This type of application imposes heavy constraints on performance and power consumption and forces designers to optimize all parts of their platform. Evaluating the overall platform power breakdown is therefore critical to determine where to spend the efforts on power optimization. Surprisingly, few studies exist on that topic and decisions generally rely on common belief. We have realized a complete power breakdown for a realistic platform to identify the major power bottlenecks. This paper presents this power assessment of a realistic heterogeneous network on chip platform including processors, network and data/instruction memory hierarchy, running a video processing chain from camera to display. Our power breakdown identifies the main bottlenecks in the memory hierarchy and the foreground memory, and shows that global interconnect is not that critical for a well-optimized application mapping.

design, automation, and test in europe | 2007

Very wide register: an asymmetric register file organization for low power embedded processors

Praveen Raghavan; Andy Lambrechts; Murali Jayapala; Francky Catthoor; Diederik Verkest; Henk Corporaal

In current embedded system processors, multi-ported register files are one of the most power-hungry parts of the processor, even when they are clustered. This paper presents a novel register file architecture, which has single-ported cells and asymmetric interfaces to the memory and to the datapath. Several realistic kernels from the TI DSP benchmark and from software defined radio (SDR) are mapped on the architecture. A complete physical design of the architecture is done in TSMC 90nm technology. The novel architecture presented is shown to obtain energy gains of up to 10 times with respect to a conventional multi-ported register file over the different benchmarks.

asia and south pacific design automation conference | 2004

Instruction buffering exploration for low energy VLIWs with instruction clusters

Tom Vander Aa; Murali Jayapala; Francisco Barat; Geert Deconinck; Rudy Lauwereins; Francky Catthoor; Henk Corporaal

For multimedia applications, loop buffering is an efficient mechanism to reduce the power in the instruction memory of embedded processors. In particular, software-controlled clustered loop buffers are energy efficient. However, current compilers for VLIW do not fully exploit the potential offered by such a clustered organization. This paper presents an algorithm to explore the optimal loop buffer configuration and the optimal way to use this configuration for an application or a set of applications. Results for the MediaBench application suite show an additional 18% reduction (on average) in energy in the instruction memory hierarchy as compared to traditional non-clustered approaches to the loop buffer, without compromising performance.

international conference on vlsi design | 2008

Energy-Aware Interconnect Optimization for a Coarse Grained Reconfigurable Processor

Andy Lambrechts; Praveen Raghavan; Murali Jayapala; Francky Catthoor; Diederik Verkest

Modern portable embedded devices provide continuously more features and need processors of increasingly higher performance in order to sustain very demanding multimedia and wireless applications. Larger amounts of flexibility need to be built in and the same processor needs to be used for a wide range of evolving products, while very strict energy constraints need to be met in order to provide a long battery life. Coarse Grained Reconfigurable Architectures (CGRAs) provide a mix of flexible computational resources and large amounts of programmable interconnect. However, this programmable interconnect consumes on average about 50% of the core's energy for state-of-the-art interconnection topologies. In this work we present an optimized interconnection implementation that selectively activates only the connections that are being used in a certain cycle, in order to reduce the energy spent in the interconnect. Using this optimization, we show the effect on the energy and performance trade-off for the ADRES CGRA. The energy cost of the optimized interconnect topologies that provide a higher performance can be reduced significantly, reducing the total energy consumption of the core by up to 40%. This will enable designers to develop more efficient architectures, tuned to a targeted application domain.

international electron devices meeting | 2014

A CMOS-compatible, integrated approach to hyper- and multispectral imaging

Andy Lambrechts; Pilar Gonzalez; Bert Geelen; Philippe Soussan; Klaas Tack; Murali Jayapala

Imec has developed a unique hyperspectral sensor concept in which the spectral unit is monolithically integrated on top of a standard CMOS sensor at wafer level, hence enabling the design of compact, low-cost and high-speed spectral cameras with a high design flexibility. This paper presents the various demonstrated prototype sensors, with different filter arrangements and performance, linked to different usage modes and application domains. It also reviews the key aspects and challenges of imec's hyperspectral technology.
