Prashant Agrawal
Katholieke Universiteit Leuven
Publication
Featured research published by Prashant Agrawal.
digital systems design | 2013
Matthias Hartmann; Praveen Raghavan; Liesbet Van der Perre; Prashant Agrawal; Wim Dehaene
Recently, multiple emerging non-volatile memories (NVMs) have been proposed that show promising properties for replacing SRAM-based memories in future SoCs. However, these emerging memories, such as STT-MRAM and ReRAM, pose new challenges for processor design, e.g. larger write latencies, higher power and lower endurance. In this paper, we propose a design method for memristor-based (ReRAM) memory architectures for embedded processors that addresses the effects caused by longer write latencies. We evaluate this method and present the design space for using ReRAM in the data memory of a wireless baseband processor. We propose architectural solutions for concealing the slow write speed of ReRAM and show their performance trade-offs for different write latencies. We show that the performance penalty caused by the ReRAM write latency can be reduced to 7% for the complete wireless communication benchmark suite. Moreover, for single benchmarks the performance penalty can be eliminated completely.
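The abstract does not spell out its architectural solutions. As a generic illustration only, the sketch below models one common way to hide a long write latency: a small write buffer that lets the processor keep issuing accesses while earlier writes drain to the slow memory. The trace format, the `write_latency` and `buffer_depth` parameters and the one-access-per-cycle issue model are all hypothetical assumptions. They are not the paper's design.

```python
from collections import deque

def run_trace(trace, write_latency, buffer_depth):
    """Cycles needed to run a trace of 'R'/'W' data-memory accesses.

    Hypothetical model: one access issues per cycle, reads take one cycle
    and never conflict with writes, and the memory serves one write at a
    time, each occupying it for `write_latency` cycles. Without a buffer
    the processor stalls on every write; with one, it stalls only when
    the buffer is full. The run ends when all writes have completed.
    """
    cycle = 0
    mem_free_at = 0     # cycle at which the memory finishes its last queued write
    pending = deque()   # completion cycles of writes still in the buffer
    for op in trace:
        # Retire buffered writes that have completed by this cycle.
        while pending and pending[0] <= cycle:
            pending.popleft()
        if op == "W":
            if buffer_depth == 0:
                # Unbuffered: wait for the write to finish before continuing.
                mem_free_at = max(cycle, mem_free_at) + write_latency
                cycle = mem_free_at
                continue
            if len(pending) >= buffer_depth:
                # Buffer full: stall until the oldest write retires.
                cycle = pending.popleft()
            mem_free_at = max(cycle, mem_free_at) + write_latency
            pending.append(mem_free_at)
        cycle += 1
    return max(cycle, mem_free_at)
```

With these made-up numbers, `run_trace("WRRRWRRR", 4, 0)` takes 14 cycles and `run_trace("WRRRWRRR", 4, 1)` takes 8. Back-to-back writes (`"WW"`) take 8 cycles either way. In this toy model the buffer hides the latency only when enough other accesses separate the writes, so the remaining penalty depends on each benchmark's write density.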
international conference on acoustics, speech, and signal processing | 2013
Namita Sharma; Tom Vander Aa; Prashant Agrawal; Praveen Raghavan; Preeti Ranjan Panda; Francky Catthoor
Optimizations related to memory accesses and data storage make a significant difference to the performance and energy of a wide range of data-intensive applications. Such strategies need to evolve with modern SoC and processor architectures, which open up new optimization opportunities. In this paper, we focus on data memory optimization for the LTE downlink receiver, as this is a data- and computation-intensive part of the LTE application with tight energy and latency constraints. We study the data dependencies globally and conclude that providing the data samples from the antennas in interleaved form at the FFT input yields a 7-15% reduction in memory access energy over an optimized implementation, without any performance overhead.
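As a rough illustration of why an interleaved layout can help on a processor with wide memory words, the sketch below stores samples from several antennas in one of two ways:

- antenna by antenna (planar);
- interleaved by sample index.

It then counts how many wide words must be read to gather sample k from every antenna. The antenna count, the word width (`width` samples per word) and this access pattern are hypothetical assumptions. The paper derives its actual data flow from a global analysis of the data dependencies.

```python
def interleave(antenna_streams):
    """Merge per-antenna sample lists into one sample-major buffer:
    [a0[0], a1[0], ..., a0[1], a1[1], ...]."""
    return [s for group in zip(*antenna_streams) for s in group]

def words_read(n_ant, n_samples, k, width, interleaved):
    """Wide memory words that must be read to gather sample k of every antenna.

    `width` is the (hypothetical) number of samples per wide memory word.
    """
    words = set()
    for a in range(n_ant):
        if interleaved:
            addr = k * n_ant + a        # sample-major (interleaved) layout
        else:
            addr = a * n_samples + k    # antenna-major (planar) layout
        words.add(addr // width)
    return len(words)
```

Take four antennas, 64 samples per antenna and a four-sample word, all made-up values. Gathering sample 5 from every antenna then needs one wide read in the interleaved buffer but four in the planar one.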
digital systems design | 2012
Prashant Agrawal; Kanishk Sugand; Martin Palkovic; Praveen Raghavan; Liesbet Van der Perre; Francky Catthoor
With the advent of IEEE 802.11n PHY implementations on heterogeneous MPSoC platform architectures, system partitioning and assignment (P&A) has become a key challenge. In this paper, we analyze the area and energy trade-offs across different P&A schemes for the 4×4 and 2×2 MIMO 40 MHz modes of 802.11n. We consider the payload processing part of the inner-modem processing for the 802.11n PHY. We also present a framework for systematically carrying out the P&A exploration. We show that exploiting parallelism at different levels reduces energy by about 40% and 15% for the 4×4 and 2×2 modes, respectively, with negligible area overhead. We also show that P&A schemes with fine-grained partitioning are more energy efficient when both modes are mapped together on the same platform.
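The abstract does not describe the framework's internals. As a hypothetical illustration of a systematic P&A exploration, the sketch below works on a tiny task set:

- It exhaustively assigns each task to one of two processors (one instance of each).
- It rejects any assignment whose most loaded processor exceeds a per-frame cycle budget, assuming the processors work as a pipeline.
- It keeps the lowest-energy feasible assignment.

Task names, cycle counts, per-cycle energies and the budget are invented. A realistic exploration would also model communication, memory and area.

```python
from itertools import product

# Hypothetical cycles per frame for each task on each processor type.
CYCLES = {
    "fft":      {"dsp": 400, "asip": 200},
    "chan_est": {"dsp": 300, "asip": 150},
    "mimo_det": {"dsp": 900, "asip": 450},
    "demap":    {"dsp": 200, "asip": 100},
}
ENERGY_PER_CYCLE = {"dsp": 10, "asip": 4}  # hypothetical pJ per cycle

def explore(cycles, energy_per_cycle, cycle_budget):
    """Exhaustive P&A search: lowest-energy assignment meeting the budget."""
    tasks = list(cycles)
    procs = list(energy_per_cycle)
    best = None
    for choice in product(procs, repeat=len(tasks)):
        load = dict.fromkeys(procs, 0)
        energy = 0
        for task, proc in zip(tasks, choice):
            load[proc] += cycles[task][proc]
            energy += cycles[task][proc] * energy_per_cycle[proc]
        if max(load.values()) > cycle_budget:
            continue  # the most loaded processor misses the throughput target
        if best is None or energy < best[0]:
            best = (energy, dict(zip(tasks, choice)))
    return best  # None if no assignment meets the budget
```

With a 700-cycle budget, `explore(CYCLES, ENERGY_PER_CYCLE, 700)` keeps only the FFT on the DSP, at an energy of 6800 units. The search is exponential in the number of tasks (|processors|^|tasks|), which is why a practical framework has to prune or structure the space.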
design, automation, and test in europe | 2014
Namita Sharma; Preeti Ranjan Panda; Min Li; Prashant Agrawal; Francky Catthoor
QR decomposition (QRD) is a typical matrix decomposition algorithm that shares many features with other algorithms such as LU and Cholesky decomposition. The principle can be realized in a large number of valid processing sequences that differ significantly in the number of memory accesses and computations, and hence in the overall implementation energy. With modern low-power embedded processors evolving toward register files with wide memory interfaces and vector functional units (FUs), the data flow in matrix decomposition algorithms needs to be carefully devised to achieve an energy-efficient implementation. In this paper, we present an efficient data flow transformation strategy for Givens-rotation-based QRD that optimizes data memory accesses. We also explore different possible implementations of the QRD of multiple matrices using the SIMD feature of the processor. With the proposed data flow transformation, overall energy is reduced by up to 36% compared with conventional QRD sequences.
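For reference, a textbook Givens-rotation QR decomposition is sketched below in plain Python. It shows the element-by-element zeroing that the paper's data flow transformation reorders. It is not the paper's optimized sequence and does not model memory accesses or SIMD.

```python
import math

def givens_qr(a):
    """Textbook Givens-rotation QR of an m x n real matrix (list of lists).

    Returns (Q, R) with A = Q R, Q orthogonal (m x m), R upper triangular.
    Sub-diagonal entries are zeroed column by column, bottom-up.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Apply G = [[c, s], [-s, c]] to rows i-1 and i of R: zeroes R[i][j].
            for k in range(n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            # Accumulate Q <- Q G^T on columns i-1 and i, keeping A = Q R.
            for k in range(m):
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r
```

Visiting the (i, j) pairs in a different valid order, e.g. row by row, produces the same factorization but changes which rows are read and written together. That is the degree of freedom the abstract refers to.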
design automation conference | 2013
Prashant Agrawal; Praveen Raghavan; Matthias Hartman; Namita Sharma; Liesbet Van der Perre; Francky Catthoor
We present a systematic methodology for exploring application partitioning and assignment together with platform architecture instantiation. Streaming applications with multiple runtime modes are considered. The platform architecture is based on a domain-specific MPSoC architecture template. We show results using the complete inner-modem physical layer processing of two wireless applications, WLAN and LTE. For a multi-mode WLAN 11n system and a single-mode LTE system, the proposed methodology achieves up to 30% energy improvement over a straightforward mapping onto one processor, with negligible area overhead and while meeting performance constraints.
signal processing systems | 2011
Adrian Krdu; Yann Y. L. Lebrun; Ubaid Ahmad; Sofie Pollin; Prashant Agrawal; Min Li
We present the first implementation of a distributed beamforming algorithm for interference mitigation on an SDR baseband processor. Co-channel interference (CCI) is becoming a major source of impairment in wireless communications, and distributed beamforming is a promising technique for mitigating its negative impact. However, such schemes are challenging to implement in practical scenarios because of their complexity and synchronization requirements. In this paper, we report on the implementation of a suboptimal, yet efficient, beamforming scheme for CCI mitigation and present the complexity modeling and algorithm transformations for achieving numerical stability. We also present the fixed-point quantization and an appropriate mapping onto a parallel programmable baseband architecture aimed at software-defined radio (SDR). We optimize this algorithm for a coarse-grained reconfigurable array (CGRA) processor and evaluate it in the context of the LTE standard.
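The abstract names fixed-point quantization as one step of the mapping but gives no word lengths. As a generic illustration, the sketch below does two things:

- It quantizes real values to a signed Q1.15 format with round-to-nearest (Python's `round`, ties to even) and saturation.
- It multiplies two Q1.15 numbers with a rounding shift.

The 16-bit width and 15 fractional bits are assumptions chosen for the example.

```python
FRAC_BITS = 15                  # assumed Q1.15 format (16-bit signed)
QMAX = (1 << FRAC_BITS) - 1     # largest representable value, just below +1.0
QMIN = -(1 << FRAC_BITS)        # -1.0

def to_q15(x):
    """Quantize a real value to Q1.15 with round-to-nearest and saturation."""
    q = round(x * (1 << FRAC_BITS))
    return max(QMIN, min(QMAX, q))

def from_q15(q):
    """Convert a Q1.15 integer back to a real value."""
    return q / (1 << FRAC_BITS)

def q15_mul(a, b):
    """Multiply two Q1.15 values: round the 30-fraction-bit product to 15 bits, then saturate."""
    prod = (a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
    return max(QMIN, min(QMAX, prod))
```

For example, `q15_mul(to_q15(0.5), to_q15(0.5))` gives 8192, i.e. 0.25. The saturation in `q15_mul` matters for the corner case -1 × -1, whose exact result +1.0 does not fit in Q1.15. In a real design the word length of each variable would follow from a dynamic-range analysis of the algorithm.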
IEEE Embedded Systems Letters | 2014
Prashant Agrawal; Dragomir Milojevic; Praveen Raghavan; Francky Catthoor; Liesbet Van der Perre; Eric Beyne; Ravi Varadarajan
3D integration is being explored as a viable alternative for overcoming the limitations that mobile MPSoC platforms face in traditional 2D designs. TSV-based inter-die connection is currently the most widely used approach. Although TSV dimensions are scaling down, TSVs still restrict the inter-die connection density and the granularity at which 3D partitioning can be carried out. These limitations will become more severe in future scaled technologies. Alternatives such as Cu-Cu bonding need to be explored to achieve very fine-pitch, high-density inter-die connections. In this letter, we carry out a system-architecture-level comparison for a complex MPSoC platform instantiated for wireless PHY processing (WLAN and LTE). We compare 2D with 3D using: 1) TSVs with μbumps and RDL (F2B); and 2) Cu-Cu bonding (F2F). We show significant gains for 3D compared with 2D. We also show that F2B and F2F have different system-level architecture requirements, and that their impact on parameters at the interconnect and system architecture levels differs.
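A back-of-the-envelope calculation shows why connection pitch limits partitioning granularity. On a square grid with pitch p, the number of inter-die connections per unit area is 1/p². The two pitch values below are hypothetical placeholders, not figures from the letter.

```python
def connections_per_mm2(pitch_um):
    """Inter-die connections per mm^2 on a square grid with the given pitch in um."""
    per_mm = 1000.0 / pitch_um   # connections along 1 mm in each direction
    return per_mm * per_mm

# Hypothetical pitches, for illustration only.
for label, pitch in [("hypothetical coarse pitch", 40.0), ("hypothetical fine pitch", 5.0)]:
    print(f"{label} ({pitch:g} um): {connections_per_mm2(pitch):,.0f} per mm^2")
```

Because density scales with 1/p², an 8x finer pitch gives 64x more connections per unit area (625 versus 40,000 per mm² here). That is the kind of headroom that makes finer-grained partitioning across dies possible.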
signal processing systems | 2011
Prashant Agrawal; Robert Fasthuber; Praveen Raghavan; T. Vander Aa; Ubaid Ahmad; L. Van der Perre; Francky Catthoor
With the advent of heterogeneous MPSoC (Multi-Processor System-on-Chip) implementations of wireless applications, system partitioning and mapping have become a key challenge. To achieve efficient designs, system partitioning should simultaneously consider application characteristics, architecture constraints and physical design costs. It is also important to analyze the impact of partitioning on the system's area, energy and performance as early as possible in the design flow. In this paper, we analyze the impact of different partitioning schemes for a lattice-reduction-based MIMO detector. For a given performance target and different numbers of processors, we show the trade-offs that different partitioning schemes create among area, energy and data parallelization factor. We base the analysis on high-level estimates derived from the application and on a set of characterized datapath and memory primitives for a template-based architecture.
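Lattice-reduction-aided MIMO detectors commonly apply an LLL-type reduction to the channel matrix to obtain a better-conditioned basis before detection. The sketch below is a plain, unoptimized real-valued LLL with the usual textbook parameter delta = 3/4. It recomputes Gram-Schmidt after every change and uses exact `fractions.Fraction` arithmetic so the example stays exact. It only illustrates the size-reduction and swap steps. The abstract does not name the variant used in the paper, and practical detectors typically use complex-valued, fixed-point, QR-based variants instead.

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(basis):
    """Exact Gram-Schmidt: returns (b_star, mu) for the rows of `basis`."""
    b = [[Fraction(x) for x in row] for row in basis]
    n = len(b)
    b_star = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = b[i][:]
        for j in range(i):
            mu[i][j] = dot(b[i], b_star[j]) / dot(b_star[j], b_star[j])
            v = [vi - mu[i][j] * bj for vi, bj in zip(v, b_star[j])]
        b_star.append(v)
    return b_star, mu

def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL reduction of linearly independent integer row vectors."""
    b = [[Fraction(x) for x in row] for row in basis]
    n = len(b)
    k = 1
    while k < n:
        b_star, mu = gram_schmidt(b)
        # Size reduction: make |mu[k][j]| <= 1/2 for j = k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                b_star, mu = gram_schmidt(b)
        # Lovasz condition: advance if it holds, otherwise swap and step back.
        if dot(b_star[k], b_star[k]) >= (delta - mu[k][k - 1] ** 2) * dot(b_star[k - 1], b_star[k - 1]):
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            k = max(k - 1, 1)
    return [[int(x) for x in row] for row in b]
```

Calling `lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])` returns a basis of the same lattice (same absolute determinant, 3) whose vectors are shorter and closer to orthogonal.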
ACM Transactions on Embedded Computing Systems | 2016
Namita Sharma; Preeti Ranjan Panda; Francky Catthoor; Min Li; Prashant Agrawal
QR decomposition (QRD), a matrix decomposition algorithm widely used in the embedded application domain, can be realized in a large number of valid processing sequences that differ significantly in the number of memory accesses and computations, and hence in the overall implementation energy. With modern low-power embedded processors evolving toward register files with wide memory interfaces and vector functional units (FUs), data flow in these algorithms needs to be carefully devised to efficiently utilize the costly wide memory accesses and the vector FUs. In this article, we present an energy-efficient data flow transformation strategy for Givens-rotation-based QRD.
international interconnect technology conference | 2014
Prashant Agrawal; Dragomir Milojevic; Praveen Raghavan; Francky Catthoor; Liesbet Van der Perre; Eric Beyne; Ravi Varadarajan
3D stacked ICs (3D-SICs) are viable alternatives for overcoming the limitations that mobile MPSoC platforms face in 2D designs. In this paper, we evaluate 2D-ICs and 3D-SICs (memory-on-logic) at the system architecture level for a complex MPSoC platform instantiated for wireless PHY processing (WLAN, LTE). For a 10-core heterogeneous MPSoC instantiation, we compare its implementations as a 2D-IC and as a 3D-SIC (based on Cu-Cu bonding), for two different level-1 data memory organizations and communication bus structures. We also analyze the impact of system-level choices (memory organization, communication structure) for both 2D and 3D interconnects.