Santhosh Kumar Rethinagiri

Publication

Featured research published by Santhosh Kumar Rethinagiri.

Power and Timing Modeling, Optimization and Simulation | 2014

VPPET: Virtual platform power and energy estimation tool for heterogeneous MPSoC based FPGA platforms

Santhosh Kumar Rethinagiri; Oscar Palomar; Javier Arias Moreno; Osman S. Unsal; Adrián Cristal

Low-power symmetric multi-cores on FPGAs are becoming ubiquitous in embedded computing, because power and energy have emerged as key design metrics, as important as performance. This creates a need for powerful and reliable tools for power- and energy-based Design Space Exploration (DSE) at an early stage of the design flow. In this paper, we propose a simulation-based virtual platform power and energy estimation tool for heterogeneous Multiprocessor System-on-Chip (MPSoC) based platforms. The tool is developed in two steps. The first step is power model generation: we use functional parameters to set up generic power models for the different parts of the system. This is a one-time activity. In the second step, a simulation-based virtual platform framework is developed to accurately capture the activities used in the power models generated in the first step. Combining the two steps yields a hybrid power estimation approach that offers a better trade-off between accuracy and speed. The proposed tool is automated and scalable for exploring complex embedded multi-core architectures. Its efficiency is validated on multi-core/multi-processor systems designed around FPGAs and extended to accommodate future multi-processors/cores, enabling reliable energy-based DSE. Compared to real board measurements, the power/energy estimates show errors of less than 4% for a single-core processor, 8% for a dual-core processor and 9% for heterogeneous MPSoC-based systems.
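The two-step hybrid estimation described above (one-time power models driven by activities captured in simulation) can be illustrated with a minimal Python sketch. It assumes a simple per-component linear model, P = P_static + k * f * activity, with invented coefficients; ComponentPowerModel, estimate_energy and relative_error_pct are hypothetical names, and VPPET's actual power models and parameters are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ComponentPowerModel:
    """Hypothetical linear power model for one platform component.

    P = static_w + dyn_w_per_mhz * freq_mhz * activity
    In a real flow the coefficients would be fitted once against board
    measurements (the one-time model-generation step); the values used
    below are invented for illustration.
    """
    static_w: float       # static (leakage) power, in watts
    dyn_w_per_mhz: float  # dynamic power per MHz at 100% activity, in watts

    def power(self, freq_mhz: float, activity: float) -> float:
        # activity is a utilization rate in [0, 1] supplied by the simulator
        return self.static_w + self.dyn_w_per_mhz * freq_mhz * activity


def estimate_energy(models, activities, freq_mhz, runtime_s):
    """Second step: feed simulated activities into the power models.

    Assumes every component runs at freq_mhz with constant activity for
    runtime_s seconds. Returns (total power in W, energy in J).
    """
    total_w = sum(models[name].power(freq_mhz, act)
                  for name, act in activities.items())
    return total_w, total_w * runtime_s


def relative_error_pct(estimated, measured):
    # Error metric for comparing an estimate with a board measurement
    return abs(estimated - measured) / measured * 100.0


models = {
    "cpu0": ComponentPowerModel(static_w=0.10, dyn_w_per_mhz=0.002),
    "bram": ComponentPowerModel(static_w=0.05, dyn_w_per_mhz=0.0005),
}
# Hypothetical activity rates, standing in for virtual-platform output
activities = {"cpu0": 0.8, "bram": 0.4}
power_w, energy_j = estimate_energy(models, activities, 100.0, 2.0)
print(f"{power_w:.3f} W, {energy_j:.3f} J")  # 0.330 W, 0.660 J
```

The split mirrors the abstract's trade-off: fitting the models is slow but done once, while each new configuration only needs a fast activity trace.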

Microprocessors and Microsystems | 2015

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy for data centers

Santhosh Kumar Rethinagiri; Oscar Palomar; Anita Sobe; Gulay Yalcin; Thomas Knauth; J. Rubén Titos Gil; Pablo Prieto; Adrian Cristal; Osman Unsal; Pascal Felber; Christof Fetzer; Dragomir Milojevic

The dramatic environmental and economic impact of the ever-increasing power and energy consumption of modern computing devices in data centers is now a critical challenge. On the one hand, designers use technology scaling as one of the methods to face the phenomenon called dark silicon (only segments of a chip can function concurrently due to power restrictions). On the other hand, designers use extreme-scale systems such as teradevices to meet the performance needs of their applications, which in turn increases the power consumption of the platform. To overcome these challenges, we need novel computing paradigms that address energy efficiency. One promising solution is to incorporate parallel distributed methodologies at different abstraction levels. The FP7 project ParaDIME pursues this objective by providing distributed methodologies (software–hardware techniques) at different abstraction levels to attack the power-wall problem. In particular, the ParaDIME framework will utilize circuit and architecture operation below safe voltage limits for drastic energy savings, specialized energy-aware computing accelerators, heterogeneous computing, an energy-aware runtime, approximate computing and power-aware message passing. The major outcome of the project will be a novel processor architecture for a heterogeneous distributed system that exploits future device characteristics, together with a runtime and programming model, for drastic energy savings in data centers. Wherever possible, ParaDIME will adopt multidisciplinary techniques, such as hardware support for message passing, runtime energy optimization utilizing new hardware energy performance counters, accelerators for error recovery from sub-safe voltage operation, and approximate computing through annotated code.
Furthermore, we will establish and investigate the theoretical limits of energy savings at the device, circuit, architecture, runtime and programming model levels of the computing stack, and quantify the actual energy savings achieved by the ParaDIME approach for the complete computing stack in a real environment.

Embedded Systems for Real-Time Multimedia | 2014

System-level power & energy estimation methodology and optimization techniques for CPU-GPU based mobile platforms

Santhosh Kumar Rethinagiri; Oscar Palomar; Javier Arias Moreno; Gulay Yalcin; Osman S. Unsal; Adrián Cristal

Due to the growing computational requirements of mobile applications, a heterogeneous Multiprocessor System-on-Chip has become an essential solution for meeting service requirements. Today, Electronic System-Level (ESL) design is considered vital for exploring design trade-offs for such devices at an early stage of the design flow. This paper proposes a novel system-level power/energy estimation methodology and optimization techniques for heterogeneous CPU-GPU based platforms. The methodology has two parts. First, we developed generic power models for the different parts of the platform using functional parameters. Second, we designed a simulation-based system-level prototype using SystemC (JIT) and cycle-accurate simulators to accurately evaluate the activities used in the corresponding power models. Together, the two parts form a novel system-level power estimation methodology that offers a good trade-off between accuracy and speed. Moreover, leveraging this methodology, we introduce novel system-level power optimization techniques for CPU-GPU platforms, such as inter-task DVFS and workload balancing. The efficiency of the proposed methodology and optimization techniques is validated on a CARMA kit, which consists of an ARM quad-core processor and an NVIDIA GPU (96 cores). Estimated power and energy values are compared to real board measurements: the estimation error is less than 2.5% for a single-core processor, 4% for a dual-core processor, 4% for a quad-core processor, 4% for the GPU and 6% for multi-processor based systems. Using the proposed optimization techniques, we achieved power and energy savings of up to 45% and 70%, respectively, on various industrial benchmarks.
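Two of the optimizations named above, workload balancing and inter-task DVFS, can be sketched in a few lines of Python under simplifying assumptions: each device's throughput is constant and known (for example from profiling runs), and each task's cycle count and deadline are known in advance. balance_workload and pick_frequency are hypothetical helpers, and the frequency list is illustrative rather than the CARMA kit's actual operating points; the paper's own techniques may differ.

```python
def balance_workload(total_items, cpu_rate, gpu_rate):
    """Split a data-parallel workload so CPU and GPU finish together.

    cpu_rate and gpu_rate are throughputs in items/second, assumed
    constant and known in advance (e.g. from profiling runs).
    Returns (cpu_items, gpu_items).
    """
    # Each device's share is proportional to its throughput, which
    # equalizes the finishing times cpu_items/cpu_rate and gpu_items/gpu_rate.
    gpu_items = round(total_items * gpu_rate / (cpu_rate + gpu_rate))
    return total_items - gpu_items, gpu_items


def pick_frequency(cycles, deadline_s, freqs_hz):
    """Inter-task DVFS: choose the lowest frequency that still meets the deadline.

    A lower frequency (and the lower voltage it usually permits) reduces
    power, so each task runs as slowly as its deadline allows.
    Returns None if even the highest frequency is too slow.
    """
    for f in sorted(freqs_hz):
        if cycles / f <= deadline_s:
            return f
    return None


# Hypothetical operating points, not taken from the CARMA kit
FREQS_HZ = [500_000_000, 1_000_000_000, 1_400_000_000]

print(balance_workload(1000, cpu_rate=100.0, gpu_rate=400.0))  # (200, 800)
print(pick_frequency(600_000_000, 1.0, FREQS_HZ))              # 1000000000
print(pick_frequency(2_000_000_000, 1.0, FREQS_HZ))            # None
```

Choosing the frequency per task, rather than once for the whole run, is what lets short or slack-rich tasks drop to a lower operating point while demanding tasks keep their performance.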

Digital Systems Design | 2014

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy

Santhosh Kumar Rethinagiri; Oscar Palomar; Anita Sobe; Thomas Knauth; Wojciech M. Barczynski; Gulay Yalcin; Yarco Hayduk; Adrian Cristal; Osman Unsal; Pascal Felber; Christof Fetzer; Julien Ryckaert; Gina Alioto

The dramatic environmental and economic impact of the ever-increasing power and energy consumption of modern computing devices in data centers is now a critical challenge. On the one hand, designers use technology scaling as one of the methods to face the phenomenon called dark silicon (only segments of a chip can function concurrently due to power restrictions). On the other hand, designers use extreme-scale systems such as teradevices to meet the performance needs of their applications, which in turn increases the power consumption of the platform. To overcome these challenges, we need novel computing paradigms that address energy efficiency. One promising solution is to incorporate parallel distributed methodologies at different abstraction levels. The FP7 project ParaDIME pursues this objective by providing distributed methodologies (software-hardware techniques) at different abstraction levels to attack the power-wall problem. In particular, the ParaDIME framework will utilize circuit and architecture operation below safe voltage limits for drastic energy savings, specialized energy-aware computing accelerators, heterogeneous computing, an energy-aware runtime, approximate computing and power-aware message passing. The major outcome of the project will be a processor architecture for a heterogeneous distributed system that exploits future device characteristics for drastic energy savings. Wherever possible, ParaDIME will adopt multidisciplinary techniques, such as hardware support for message passing, runtime energy optimization utilizing new hardware energy performance counters, accelerators for error recovery from sub-safe voltage operation, and approximate computing through annotated code.
Furthermore, we will establish and investigate the theoretical limits of energy savings at the device, circuit, architecture, runtime and programming model levels of the computing stack, and quantify the actual energy savings achieved by the ParaDIME approach for the complete computing stack in a real environment.

Parallel, Distributed and Network-Based Processing | 2016

Exploring Energy Reduction in Future Technology Nodes via Voltage Scaling with Application to 10nm

Gulay Yalcin; Santhosh Kumar Rethinagiri; Oscar Palomar; Osman Unsal; Adrian Cristal; Dragomir Milojevic

Voltage and frequency downscaling is a well-known scheme for reducing the energy consumption of a computer system. However, the amount of energy saved depends first on the technology node used. Moreover, when the voltage level is below the safe margin, instructions need to be re-executed due to voltage-related faults, which can add energy overhead and thus nullify the expected energy gains from the lower voltage. Both fault recovery and frequency reduction also impact the performance of the application. In this study, we first evaluate the error rate of several sub-circuits (i.e. functional units) at the future n10 (10 nm) technology node. To reduce the performance impact, we lower the voltage and frequency of each sub-circuit at a fine granularity while keeping the rest of the system at the nominal voltage and frequency. In this way, in an out-of-order architecture, instruction-level parallelism can mask the performance impact of a relatively slow functional unit. According to our evaluations, the energy consumption of functional units can be reduced by up to 92% with only 8% performance degradation.
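The trade-off described above, lower voltage versus re-execution overhead, can be illustrated with a first-order model. This sketch assumes dynamic energy per operation scales as C * V^2 and that a faulty operation is simply re-executed until it succeeds, so the expected number of executions is 1 / (1 - p) for error rate p. It ignores static energy, the frequency/performance side and any real n10 fault data; relative_dynamic_energy and best_voltage are hypothetical helpers and the candidate voltage/error-rate pairs are invented.

```python
def relative_dynamic_energy(v, v_nom, error_rate):
    """Dynamic energy per *successful* operation, relative to nominal voltage.

    First-order assumptions (not the paper's model):
      * dynamic energy per execution scales as C * V**2, so relative
        energy per execution is (v / v_nom) ** 2;
      * a faulty execution is re-executed until it succeeds, giving an
        expected 1 / (1 - error_rate) executions per operation.
    """
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    return (v / v_nom) ** 2 / (1.0 - error_rate)


def best_voltage(candidates, v_nom):
    """Pick the (voltage, error_rate) pair with the lowest relative energy.

    In practice candidates would come from fault characterization of a
    functional unit; the values used below are invented.
    """
    return min(candidates,
               key=lambda c: relative_dynamic_energy(c[0], v_nom, c[1]))


candidates = [(1.0, 0.0), (0.8, 0.0), (0.6, 0.1), (0.5, 0.6)]
print(best_voltage(candidates, v_nom=1.0))       # (0.6, 0.1)
print(relative_dynamic_energy(0.5, 1.0, 0.0))    # 0.25
print(relative_dynamic_energy(0.5, 1.0, 0.75))   # 1.0
```

Under this model, halving the voltage cuts dynamic energy to 25% of nominal, but at a 75% error rate re-execution brings it back to 100%, which is the nullifying effect the abstract mentions; the lowest voltage is therefore not automatically the most energy-efficient one.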

Design, Automation, and Test in Europe | 2016

Energy minimization at all layers of the data center: The ParaDIME project

Oscar Palomar; Santhosh Kumar Rethinagiri; Gulay Yalcin; Ruben Titos-Gil; Pablo Prieto; Emma Torrella; Osman Unsal; Adrian Cristal; Pascal Felber; Anita Sobe; Yaroslav Hayduk; Mascha Kurpicz; Christof Fetzer; Thomas Knauth; Malte Schneegass; Jens Struckmeier; Dragomir Milojevic

The main objective of the ParaDIME project has been to minimize energy consumption at all levels of the data center. On the one hand, we have considered what can be achieved on currently existing systems via improvements to the programming model and the runtime system. On the other hand, we investigated which techniques and design decisions can make future computing nodes more energy efficient. We have successfully proposed and developed several methodologies that enable up to 60% energy savings in data center components.

Field-Programmable Gate Arrays | 2014

APMC: advanced pattern based memory controller (abstract only)

Tassadaq Hussain; Oscar Palomar; Osman S. Unsal; Adrián Cristal; Eduard Ayguadé; Mateo Valero; Santhosh Kumar Rethinagiri

In this paper, we present APMC, the Advanced Pattern based Memory Controller, an intelligent memory controller that uses descriptors to support both regular and irregular memory access patterns without using a master core. It keeps pattern descriptors in memory and prefetches complex 1D/2D/3D data structures into its special scratchpad memory. Irregular memory accesses are arranged in the pattern descriptors at program time, and APMC manages multiple patterns at run time to reduce access latency. The proposed APMC system reduces the limitations that processors/accelerators face due to irregular memory access patterns and low memory bandwidth. It gathers multiple memory read/write requests and maximizes the reuse of open SDRAM banks to decrease the overhead of opening and closing rows. APMC manages data movement between main memory and the specialized scratchpad memory; data present in the scratchpad is reused and/or updated when accessed by several patterns. The system is implemented and tested on a Xilinx ML505 FPGA board, and its performance is compared with that of a processor with a high-performance memory controller. The results show that APMC transfers regular and irregular datasets up to 20.4x and 3.4x faster, respectively, than the baseline system. Compared to the baseline system, APMC uses 17% fewer hardware resources and 32% less on-chip power, and achieves speedups of 3.5x to 52x for regular applications and 1.4x to 2.9x for irregular applications. The APMC core uses 50% fewer hardware resources than the baseline system's memory controller.
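The descriptor idea and the row-reuse optimization described above can be illustrated with a small Python model. The descriptor layout (base address plus per-dimension counts and strides, in elements), the single-bank open-page row model and the row size are all assumptions for illustration; PatternDescriptor, gather and row_activations are hypothetical names, not APMC's actual descriptor format or SDRAM scheduling logic.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PatternDescriptor:
    """Hypothetical 3D strided-access descriptor (not APMC's real format).

    base:    start address, in elements
    dims:    (nx, ny, nz) element counts, innermost dimension first
    strides: (sx, sy, sz) address step, in elements, for each dimension
    """
    base: int
    dims: Tuple[int, int, int]
    strides: Tuple[int, int, int]

    def addresses(self) -> List[int]:
        nx, ny, nz = self.dims
        sx, sy, sz = self.strides
        return [self.base + z * sz + y * sy + x * sx
                for z in range(nz) for y in range(ny) for x in range(nx)]


def gather(memory: Sequence[int], desc: PatternDescriptor) -> List[int]:
    """Copy the described elements into a dense list standing in for the scratchpad."""
    return [memory[a] for a in desc.addresses()]


def row_activations(addresses: Sequence[int], row_size: int) -> int:
    """Count row opens when requests are served in the given order.

    Assumes a single bank with an open-page policy: a row stays open
    until a request to a different row arrives.
    """
    opens, open_row = 0, None
    for a in addresses:
        row = a // row_size
        if row != open_row:
            opens += 1
            open_row = row
    return opens


# An 8x8 row-major matrix; fetch the 2x3 tile starting at row 1, column 2.
memory = list(range(64))
tile = PatternDescriptor(base=1 * 8 + 2, dims=(3, 2, 1), strides=(1, 8, 0))
print(gather(memory, tile))  # [10, 11, 12, 18, 19, 20]

# Interleaved requests to two rows, before and after grouping them by row.
requests = [0, 64, 1, 65, 2, 66]
print(row_activations(requests, row_size=64))  # 6
grouped = sorted(requests, key=lambda a: a // 64)
print(row_activations(grouped, row_size=64))   # 2
```

Reordering requests by row is only safe when they carry no ordering dependencies (for example a write followed by a read of the same address), which a real controller would have to check before regrouping.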

PROGram for Research on Embedded Systems & Software (PROGRESS) workshop | 2010