Srinivas Boppu
University of Erlangen-Nuremberg
Publications
Featured research published by Srinivas Boppu.
ACM Transactions on Design Automation of Electronic Systems | 2013
Vahid Lari; Shravan Muddasani; Srinivas Boppu; Frank Hannig; Moritz Schmid; Jürgen Teich
We present a self-adaptive hierarchical power management technique for massively parallel processor architectures, supporting a new resource-aware parallel computing paradigm called invasive computing. Here, an application can dynamically claim, execute, and release resources in three phases: resource acquisition (invade), program loading/configuration and execution (infect), and release (retreat). Resource invasion is governed by dedicated decentralized hardware controllers, called invasion controllers (ictrls), which are integrated into each processing element (PE). Several invasion strategies for claiming linearly connected or rectangular regions of processing resources are implemented. The key idea is to exploit the decentralized resource management inherent to invasive computing for power savings by enabling applications themselves to control the power for processing resources and invasion controllers using a hierarchical power-gating approach. We propose analytical models for estimating various components of energy consumption for faster design space exploration and compare them with the results obtained from a cycle-accurate C++ simulator of the processor array. In order to find optimal design trade-offs, various parameters such as (a) energy consumption, (b) hardware cost, and (c) timing overheads are compared for different sizes of power domains. Experimental results show significant energy savings (up to 73%) for selected characteristic algorithms and different resource utilizations. In addition, we demonstrate the accuracy of our proposed analytical model, with estimation errors below 3.6%.
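The three-phase resource lifecycle described above can be sketched in a few lines. This is a toy model, not the authors' actual hardware interface: the class and method names (`ProcessorArray`, `invade`, `infect`, `retreat`) mirror the terminology of the abstract, and the power-gating behavior is reduced to a boolean flag per PE.

```python
# Toy model of the invade/infect/retreat lifecycle from invasive computing.
# All names are illustrative; the real design uses hardware invasion
# controllers (ictrls) integrated into each processing element.

class ProcessingElement:
    def __init__(self, pe_id):
        self.pe_id = pe_id
        self.powered = False   # power-gated until claimed
        self.program = None

class ProcessorArray:
    def __init__(self, size):
        self.pes = [ProcessingElement(i) for i in range(size)]

    def invade(self, count):
        """Claim `count` free PEs and power them on (resource acquisition)."""
        claimed = [pe for pe in self.pes if not pe.powered][:count]
        for pe in claimed:
            pe.powered = True
        return claimed

    def infect(self, claimed, program):
        """Load a program onto every claimed PE (configuration + execution)."""
        for pe in claimed:
            pe.program = program

    def retreat(self, claimed):
        """Release the PEs and power-gate them again."""
        for pe in claimed:
            pe.program = None
            pe.powered = False

array = ProcessorArray(16)
claim = array.invade(4)
array.infect(claim, "fir_filter")
array.retreat(claim)
# After retreat, every PE in the array is power-gated again.
assert all(not pe.powered for pe in array.pes)
```

The point of the model is that power control follows the claim: resources draw active power only between invade and retreat, which is what the hierarchical power-gating scheme exploits.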
Computing Frontiers | 2013
Frank Hannig; Moritz Schmid; Vahid Lari; Srinivas Boppu; Jürgen Teich
As data locality is a key factor for the acceleration of loop programs on processor arrays, we propose a buffer architecture that can be configured at run-time to select between different schemes for memory access. In addition to traditional address-based memory banks, the buffer architecture can deliver data in a streaming manner to the processing elements of the array, which supports dense and sparse stencil operations. Moreover, to minimize data transfers to the buffers, the design contains an interlinked mode, which is especially targeted at 2-D kernel computations. The buffers can be used individually to achieve high data throughput by utilizing a maximum number of I/O channels to the array, or concatenated to provide higher storage capacity with a reduced number of I/O channels.
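The two basic access schemes the abstract contrasts, address-based banks versus streaming delivery, can be illustrated with a minimal software model. The class and mode names below are assumptions made for this sketch; the actual buffer is a hardware design with an additional interlinked mode not modeled here.

```python
# Illustrative model of a run-time configurable buffer with two access
# schemes: a traditional address-based bank and a streaming (FIFO) mode
# as used for stencil operations. Names and interface are invented.

from collections import deque

class ConfigurableBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.mode = "addressed"              # or "streaming"
        self.mem = [0] * capacity            # address-based bank
        self.fifo = deque(maxlen=capacity)   # streaming window

    def configure(self, mode):
        """Select the access scheme at run time."""
        assert mode in ("addressed", "streaming")
        self.mode = mode

    def write(self, value, addr=None):
        if self.mode == "addressed":
            self.mem[addr] = value           # random access by address
        else:
            self.fifo.append(value)          # oldest entry evicted when full

    def read(self, addr=None):
        if self.mode == "addressed":
            return self.mem[addr]
        return self.fifo.popleft()           # data delivered in arrival order
```

In streaming mode the processing elements never compute addresses; data arrives in order, which is why this scheme suits dense stencil sweeps over image rows.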
Application-Specific Systems, Architectures and Processors | 2013
Srinivas Boppu; Frank Hannig; Jürgen Teich
We present a novel design methodology for the mapping of nested loops onto programmable hardware accelerators. Key features of our approach are: (1) design entry in the form of a functional programming language and loop parallelization in the polyhedron model; (2) the underlying accelerator architectures consist of lightweight, tightly-coupled, and programmable processor arrays, which can exploit both loop-level parallelism and instruction-level parallelism; (3) support of zero-overhead looping not only for innermost loops but also for arbitrarily nested loops. We implemented the proposed methodology in a prototype design tool and evaluated selected benchmarks by comparing our code generator with the Trimaran compilation framework. As the results show, our approach can reduce the size of the generated processor codes by up to 64 % while at the same time achieving a significantly higher throughput.
Signal Processing Systems | 2014
Srinivas Boppu; Frank Hannig; Jürgen Teich
In this paper, we consider programmable tightly-coupled processor arrays consisting of interconnected small light-weight VLIW cores, which can exploit both loop-level parallelism and instruction-level parallelism. These arrays are well suited for compute-intensive nested loop applications, often providing higher power and area efficiency compared with commercial off-the-shelf processors. They are ideal candidates for accelerating the computation of nested loop programs in future heterogeneous systems, where energy efficiency is one of the most important design goals for overall system-on-chip design. In this context, we present a novel design methodology for the mapping of nested loop programs onto such processor arrays. Key features of our approach are: (1) design entry in the form of a functional programming language and loop parallelization in the polyhedron model; (2) support of zero-overhead looping not only for innermost loops but also for arbitrarily nested loops. Processors of such arrays are often limited in instruction memory size to reduce area and power consumption; hence, (3) we present methods for code compaction and code generation and integrate them into a design tool. Finally, (4) we evaluate selected benchmarks by comparing our code generator with the Trimaran and VEX compiler frameworks. As the results show, our approach can reduce the size of the generated processor codes by up to 64 % (Trimaran) and 55 % (VEX) while at the same time achieving a significantly higher throughput.
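To see why zero-overhead looping across all nest levels matters, it helps to count the control instructions a conventional compiler emits. The model below is a back-of-the-envelope sketch: the two-instruction decrement-and-branch cost per iteration is an assumption for illustration, not a figure from the paper.

```python
# Rough model of loop-control overhead when every loop level needs an
# explicit software decrement-and-branch. Hardware-managed loop counters
# ("zero-overhead looping") eliminate this cost; extending them beyond the
# innermost loop removes the dominant remaining terms as well.

def software_loop_overhead(trip_counts, branch_cost=2):
    """Extra instructions executed for loop control in a perfect loop nest.

    trip_counts lists trip counts from outermost to innermost; the
    innermost branch executes most often and dominates the total.
    """
    overhead, iterations = 0, 1
    for trips in trip_counts:
        iterations *= trips
        overhead += iterations * branch_cost
    return overhead

# Example: a 3-deep nest, e.g. 64x64 pixels with a 3-tap inner loop.
print(software_loop_overhead([64, 64, 3]))   # 32896 control instructions
```

Zero-overhead looping only for the innermost loop removes the largest term (here 64*64*3*2 = 24576 instructions); supporting arbitrarily nested loops, as the methodology above does, removes the outer-loop terms too.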
Application-Specific Systems, Architectures and Processors | 2012
Vahid Lari; Shravan Muddasani; Srinivas Boppu; Frank Hannig; Jürgen Teich
In this paper, we present an ultra-low-power design for a class of massively parallel architectures called tightly-coupled processor arrays. Here, the key idea is to exploit the benefits of the decentralized resource management inherent to invasive computing for power saving. We propose concepts and study different architecture trade-offs for hierarchical power management by temporarily shutting down regions of processors through power gating. Moreover, (a) overall system chip energy consumption, (b) hardware cost, and (c) timing overheads are compared for different sizes of power domains. Experimental results show that up to 70% of system energy consumption may be saved for selected characteristic algorithms and different resource utilizations.
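The power-domain size trade-off studied above can be illustrated with a simplified energy model: a domain can only be gated when all of its PEs are idle, so coarser domains waste idle power on unused PEs that share a domain with busy ones. All power numbers below are invented for illustration and are not the paper's measured values.

```python
# Simplified energy model for hierarchical power gating of a processor
# array partitioned into power domains. Illustrative numbers only.

import math

def array_energy(n_pes, busy_pes, domain_size,
                 p_active=10.0, p_idle=4.0, p_gated=0.1, t=1.0):
    """Energy over time t, assuming busy PEs are packed into the lowest-
    numbered domains (best case for gating). A domain is gated only if
    every PE in it is idle; otherwise idle PEs in it still burn p_idle."""
    n_domains = math.ceil(n_pes / domain_size)
    busy_domains = math.ceil(busy_pes / domain_size)
    energy = 0.0
    for d in range(n_domains):
        pes_here = min(domain_size, n_pes - d * domain_size)
        if d < busy_domains:
            busy_here = min(domain_size, busy_pes - d * domain_size)
            energy += t * (busy_here * p_active + (pes_here - busy_here) * p_idle)
        else:
            energy += t * pes_here * p_gated   # whole domain power-gated
    return energy

# 16 PEs, 5 busy: per-PE domains gate everything idle; size-4 domains
# strand three idle PEs in a partially used, ungated domain.
fine = array_energy(16, 5, domain_size=1)
coarse = array_energy(16, 5, domain_size=4)
```

Smaller domains save more energy but cost more gating hardware and control overhead, which is exactly the (a)/(b)/(c) trade-off compared in the paper.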
Reconfigurable Computing and FPGAs | 2011
Srinivas Boppu; Frank Hannig; Jürgen Teich; Roberto Perez-Andrade
Invasive computing is a novel computing paradigm that allows applications to allocate resources at run time. Tightly-coupled processor arrays are well suited for invasive computing. This paper proposes a methodology for symbolically programming a claimed array of computational resources. Using this methodology, a single configuration stream can be derived that is sufficient to configure all claimed resources (processing elements), irrespective of their number. The configuration stream is modified dynamically at run time, depending on the number of processors claimed. We estimated the configuration memory requirements of our methodology: in contrast to a traditional approach, it requires only a constant amount of memory, independent of the problem size.
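The constant-memory property can be demonstrated with a toy version of the idea: one parameterized (symbolic) stream is stored, and it is specialized at run time for however many PEs were claimed. The `$PE` placeholder format and instruction strings are invented for this sketch.

```python
# Toy illustration of symbolic configuration: a single stored template is
# expanded at run time for the number of claimed PEs. A traditional
# approach would store one explicit stream per PE, so its memory grows
# with the claim; here only `template` is stored, a constant.

def specialize(template, n_claimed):
    """Expand one symbolic stream into per-PE streams for `n_claimed` PEs."""
    return [
        [word.replace("$PE", str(pe)) for word in template]
        for pe in range(n_claimed)
    ]

template = ["load r0, mem[$PE]", "mac r1, r0, w", "store mem[$PE], r1"]
streams = specialize(template, 4)   # works unchanged for 4, 16, or 100 PEs
```

In the real design this specialization happens in hardware while the stream is distributed through the array, rather than by string substitution in software.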
ACM Transactions on Embedded Computing Systems | 2014
Frank Hannig; Vahid Lari; Srinivas Boppu; Alexandru Tanase; Oliver Reiche
Conference on Design and Architectures for Signal and Image Processing | 2012
Shravan Muddasani; Srinivas Boppu; Frank Hannig; Boris Kuzmin; Vahid Lari; Jürgen Teich
Archive | 2015
Jürgen Teich; Srinivas Boppu; Frank Hannig; Vahid Lari
Advances in Radio Science | 2014
Elisabeth Glocker; Srinivas Boppu; Qingqing Chen; Ulf Schlichtmann; Jürgen Teich; Doris Schmitt-Landsiedel