Harry Sidiropoulos
National Technical University of Athens
Publications
Featured research published by Harry Sidiropoulos.
International Parallel and Distributed Processing Symposium | 2012
Harry Sidiropoulos; Kostas Siozios; Peter Figuli; Dimitrios Soudris; Michael Hübner
Partial reconfiguration can deliver virtually unlimited hardware resources, since it enables dynamic allocation and de-allocation of tasks onto a reconfigurable architecture while the remaining tasks continue to operate. However, in order to benefit from this flexibility, partial reconfiguration has to be applied appropriately. Among other factors, the placement of partial configuration data is a critical issue, since it affects the fragmentation of hardware resources. In this paper we introduce a novel methodology for supporting partial reconfiguration through a Just-in-Time (JIT) compilation framework. Experimental results with a number of benchmarks show that the introduced solution performs application placement and routing (P&R) 7.34× faster than state-of-the-art tools, while also leading to significantly lower fragmentation of hardware resources.
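To make the placement/fragmentation issue concrete, here is a minimal Python sketch of first-fit placement of reconfigurable regions on a one-dimensional column model of an FPGA, with a simple fragmentation metric. The model, function names and metric are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: first-fit placement of partial-reconfiguration regions on a
# 1-D column model of an FPGA, with a simple fragmentation metric. Illustrative
# only; the column model and the metric are assumptions, not the paper's method.

def fragmentation(columns):
    """1 - (largest free run / total free columns); 0 means no fragmentation."""
    free_runs, run = [], 0
    for occupied in columns:
        if occupied:
            if run:
                free_runs.append(run)
            run = 0
        else:
            run += 1
    if run:
        free_runs.append(run)
    total_free = sum(free_runs)
    return 0.0 if total_free == 0 else 1.0 - max(free_runs) / total_free

def place_first_fit(columns, width):
    """Place a task needing `width` adjacent columns; return start index or None."""
    run_start = None
    for i, occupied in enumerate(columns):
        if occupied:
            run_start = None
            continue
        if run_start is None:
            run_start = i
        if i - run_start + 1 == width:
            for j in range(run_start, i + 1):
                columns[j] = True
            return run_start
    return None

device = [False] * 16                # 16 reconfigurable columns, all free
for task_width in (4, 2, 5):
    start = place_first_fit(device, task_width)
    print(f"task of width {task_width} placed at {start}, "
          f"fragmentation = {fragmentation(device):.2f}")
```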
Journal of Systems Architecture | 2013
Harry Sidiropoulos; Kostas Siozios; Dimitrios Soudris
This paper introduces a novel methodology for enabling fast yet accurate exploration of memory organizations on FPGA devices. The proposed methodology is supported by a new open-source tool framework, named NAROUTO. This framework is the only publicly available solution for performing architecture-level exploration, as well as application mapping onto FPGA devices with different memory organizations, under a variety of design criteria (e.g. delay improvement, power optimization, area savings). Experimental results with a number of industry-oriented kernels prove the efficiency of the proposed solution compared to similar approaches, since it provides better manipulation of memory blocks, leading to architectures with better area, power and delay.
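The following short sketch only illustrates the general shape of such an architecture-level exploration: scoring candidate memory organizations under weighted delay/power/area criteria and keeping the best. The candidate list, cost figures and weights are made up for illustration and are not NAROUTO's models or numbers.

```python
# Hypothetical sketch of design-space exploration over FPGA memory organizations.
# All candidates, costs and weights below are invented for illustration only.

candidates = [
    # name,                  delay(ns), power(mW), area(kLUT-equiv)
    ("distributed LUT-RAM",   4.2,       210,       38),
    ("small BRAM blocks",     3.6,       240,       31),
    ("large BRAM blocks",     3.1,       290,       27),
]

def cost(delay, power, area, w_delay=0.5, w_power=0.3, w_area=0.2):
    """Weighted-sum objective; lower is better. Weights encode design priorities."""
    return w_delay * delay + w_power * power / 100.0 + w_area * area / 10.0

best = min(candidates, key=lambda c: cost(*c[1:]))
for name, d, p, a in candidates:
    print(f"{name:22s} cost = {cost(d, p, a):.2f}")
print("selected memory organization:", best[0])
```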
Reconfigurable Architectures Workshop | 2013
Harry Sidiropoulos; Kostas Siozios; Peter Figuli; Dimitrios Soudris; Michael Hübner; Jürgen Becker
Execution runtime is usually a major concern for designers performing application mapping onto reconfigurable architectures. In this article we propose a methodology, as well as the supporting toolset, aimed at providing fast application implementation onto reconfigurable architectures through a Just-In-Time (JIT) compilation framework. Experimental results prove the efficiency of the introduced framework, as we reduce the execution runtime by 53.5× on average compared to the state-of-the-art approach. Additionally, the derived solutions achieve 1.17× higher operating frequencies, while they also exhibit significantly lower fragmentation ratios of hardware resources.
Journal of Systems Architecture | 2014
Harry Sidiropoulos; Kostas Siozios; Dimitrios Soudris
The interconnection structures in FPGA devices contribute an increasingly large share of delay, power consumption and area overhead. The demand for even higher clock frequencies makes this problem even more important. Three-dimensional (3-D) chip stacking is touted as the silver-bullet technology that can keep Moore's momentum and fuel the next wave of consumer electronics products. However, the benefits of such a new integration paradigm have not yet been sufficiently explored. In this paper, a novel 3-D architecture, as well as the supporting software tools for exploring and evaluating application implementation, are introduced. More specifically, by assigning logic and I/O resources to different layers, we achieve a notable wire-length reduction. Experimental results prove the effectiveness of this choice, since the target architectures outperform conventional 2-D FPGAs.
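A rough intuition for the wire-length benefit can be sketched with a toy Manhattan-distance model: in 2-D, a logic block must reach an I/O pad on the die perimeter, whereas with I/O stacked on a separate layer the connection can collapse to a vertical via. Grid size, via cost and block positions below are illustrative assumptions, not figures from the paper.

```python
# Hypothetical back-of-the-envelope model of 2-D vs. stacked-I/O wirelength.
# All numbers are invented for illustration.

GRID = 32          # logic array is GRID x GRID blocks
VIA_COST = 1       # cost (in block lengths) of one inter-layer via

def dist_2d(block, pad):
    """Manhattan distance from a logic block to a perimeter I/O pad."""
    return abs(block[0] - pad[0]) + abs(block[1] - pad[1])

def dist_3d(_block):
    """With I/O stacked on a separate layer, the pad can sit directly above."""
    return VIA_COST

blocks = [(5, 7), (16, 16), (30, 2)]          # a few logic blocks using I/O
pads   = [(0, 7), (31, 16), (30, 0)]          # nearest perimeter pads in 2-D

total_2d = sum(dist_2d(b, p) for b, p in zip(blocks, pads))
total_3d = sum(dist_3d(b) for b in blocks)
print(f"2-D I/O wirelength: {total_2d}, 3-D (stacked I/O): {total_3d}")
```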
Field-Programmable Logic and Applications | 2013
Harry Sidiropoulos; Peter Figuli; Kostas Siozios; Dimitrios Soudris; Jürgen Becker
Field-Programmable Gate Arrays (FPGAs) promise a low-power, flexible alternative for implementing parallel applications. Compared to CPUs and GPUs, however, they suffer from slow development cycles due to the high complexity of application development and hardware incompatibilities. In this direction, we propose a platform-independent methodology and the supporting framework targeting efficient run-time application mapping onto FPGAs. Experimental results show that the introduced solution performs placement and routing of multiple applications without any performance penalty compared to state-of-the-art tools. The scalability of the framework was verified by mapping up to 73 applications per minute when executed on an 8-core system.
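The structure of such a scalability experiment can be sketched as mapping many independent applications in parallel on a multi-core host. In the sketch below, `map_application` is a stand-in for a real place-and-route call, not the paper's framework, and the 50 ms per-application cost is an arbitrary placeholder.

```python
# Hypothetical sketch: mapping many applications concurrently with a process pool.
# `map_application` is a placeholder, not the actual P&R engine from the paper.

from multiprocessing import Pool
import time

def map_application(app_id):
    """Placeholder for placing and routing one application onto the FPGA fabric."""
    time.sleep(0.05)                     # pretend P&R takes 50 ms
    return app_id, "mapped"

if __name__ == "__main__":
    apps = list(range(64))
    start = time.time()
    with Pool(processes=8) as pool:      # mirrors the 8-core system in the paper
        results = pool.map(map_application, apps)
    elapsed = time.time() - start
    print(f"mapped {len(results)} applications in {elapsed:.2f} s")
```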
Field-Programmable Logic and Applications | 2011
Harry Sidiropoulos; Kostas Siozios; Dimitrios Soudris
The interconnection structures in FPGA devices contribute an increasingly large share of delay, power consumption and area overhead. Three-dimensional (3-D) chip stacking is touted as the silver-bullet technology that can keep Moore's momentum and fuel the next wave of consumer electronics products. However, the benefits of such an integration technology have not yet been sufficiently explored. In this paper, we introduce a novel 3-D architecture, as well as the supporting software tools for exploring and evaluating application mapping onto 3-D FPGAs, where logic and I/O resources are assigned to different layers. Experimental results show that such a 3-D architecture is especially suitable for communication-intensive applications, since a device with two layers achieves delay reductions of up to 87% compared to conventional 2-D FPGAs, without any overhead in power dissipation.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2017
George Chatzikonstantis; Diego Jiménez; Esteban Meneses; Christos Strydis; Harry Sidiropoulos; Dimitrios Soudris
Brain modeling has been presenting significant challenges to the world of high-performance computing (HPC) over the years. The field of computational neuroscience has developed a demand for physiologically plausible neuron models that feature increased complexity and thus require greater computational power. We explore Intel's newest generation of Xeon Phi computing platforms, named Knights Landing (KNL), as a way to match the need for processing power and as an upgrade over the previous generation of Xeon Phi models, Knights Corner (KNC). Our neuron simulator of choice features a Hodgkin-Huxley-based (HH) model which has been ported to both generations of Xeon Phi platforms and aggressively draws on both platforms' computational assets. The application uses the OpenMP interface for efficient parallelization and the Xeon Phi's vectorization buffers for Single-Instruction Multiple-Data (SIMD) processing. In this study we offer insight into the efficiency with which the application utilizes the assets of the two Xeon Phi generations and we evaluate the merits of using the KNL over its predecessor. In our case, an out-of-the-box transition to Knights Landing offers on average a 2.4× speedup while consuming 48% less energy than the KNC.
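The data-parallel structure that such an OpenMP/SIMD port exploits can be illustrated with a small numpy sketch: one forward-Euler step of a Hodgkin-Huxley-style gating variable for an entire neuron population at once. This is not the authors' native Xeon Phi code; it only shows why the per-neuron state update maps well onto SIMD lanes and parallel threads.

```python
# Illustrative numpy sketch of a vectorized HH-style gate update across many neurons.
# Not the paper's OpenMP/C implementation; population size and initial states are
# arbitrary assumptions.

import numpy as np

def alpha_n(v):
    return 0.01 * (v + 55.0) / (1.0 - np.exp(-(v + 55.0) / 10.0))

def beta_n(v):
    return 0.125 * np.exp(-(v + 65.0) / 80.0)

def step_gate_n(v, n, dt=0.025):
    """Update the K+ activation gate `n` for a whole population in one vector op."""
    a, b = alpha_n(v), beta_n(v)
    return n + dt * (a * (1.0 - n) - b * n)

rng = np.random.default_rng(0)
v = rng.uniform(-75.0, -60.0, size=100_000)   # membrane potentials (mV)
n = np.full_like(v, 0.3)                      # initial gate state
for _ in range(10):                           # a few time steps
    n = step_gate_n(v, n)
print("mean n after 10 steps:", float(n.mean()))
```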
Journal of Neural Engineering | 2017
Georgios Smaragdos; Georgios Chatzikonstantis; Rahul Kukreja; Harry Sidiropoulos; Dimitrios Rodopoulos; Ioannis Sourdis; Zaid Al-Ars; Christoforos Kachris; Dimitrios Soudris; Chris I. De Zeeuw; Christos Strydis
OBJECTIVE: The advent of high-performance computing (HPC) in recent years has led to its increasing use in brain studies through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit a homogeneous acceleration platform to effectively address the complete array of modeling requirements.
APPROACH: In this paper we propose and build BrainFrame, a heterogeneous acceleration platform that incorporates three distinct acceleration technologies: an Intel Xeon Phi CPU, an NVidia GP-GPU and a Maxeler Dataflow Engine. The PyNN software framework is also integrated into the platform. As a challenging proof of concept, we analyze the performance of BrainFrame on different experiment instances of a state-of-the-art neuron model, representing the inferior-olivary nucleus using a biophysically meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal-network dimensions but also different network-connectivity densities, which can drastically affect the workload's performance characteristics.
MAIN RESULTS: The combined use of different HPC technologies demonstrates that BrainFrame is better able to cope with the modeling diversity encountered in realistic experiments while at the same time running on significantly lower energy budgets. Our performance analysis clearly shows that the model directly affects performance and that all three technologies are required to cope with all the model use cases.
SIGNIFICANCE: The BrainFrame framework is designed to transparently configure and select the appropriate back-end accelerator technology for each simulation run. The PyNN integration provides a familiar bridge to the vast number of models already available. Additionally, it gives a clear roadmap for extending the platform support beyond the proof of concept, with improved usability and directly useful features for the computational-neuroscience community, paving the way for wider adoption.
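For readers unfamiliar with PyNN, the kind of backend-agnostic script such a platform accepts looks roughly like the sketch below. The `pyNN.nest` backend and the standard `HH_cond_exp` cell are stand-ins chosen for illustration; in BrainFrame the same network description would be dispatched to the Xeon Phi, GP-GPU or Dataflow back-end running the extended inferior-olive model.

```python
# Illustrative PyNN script; backend and cell model are stand-ins, not BrainFrame's.

import pyNN.nest as sim   # assumption: a reference PyNN backend is installed

sim.setup(timestep=0.025)                              # ms
cells = sim.Population(96, sim.HH_cond_exp(), label="olivary-like")
noise = sim.Population(96, sim.SpikeSourcePoisson(rate=20.0))
sim.Projection(noise, cells, sim.OneToOneConnector(),
               synapse_type=sim.StaticSynapse(weight=0.01, delay=1.0))
cells.record("v")
sim.run(500.0)                                         # ms of biological time
print(cells.get_data().segments[0].analogsignals[0].shape)
sim.end()
```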
Applied Reconfigurable Computing | 2015
Peter Figuli; Carsten Tradowsky; Jose Martinez; Harry Sidiropoulos; Kostas Siozios; Holger Stenschke; Dimitrios Soudris; Jürgen Becker
Today, digital signal processing systems for applications like audio or video production are restricted, as they do not exhaust the possibilities offered by modern hardware. Reconfigurable hardware exploits a huge degree of parallelism and provides flexibility at an affordable energy budget, thus becoming a competitive alternative for high-performance Digital Signal Processing (DSP) applications previously dominated by general-purpose processing cores and Application-Specific Integrated Circuits (ASICs). This paper describes the design and evaluation of a novel concept for adaptive signal processing on reconfigurable hardware, using an adaptive reverberation algorithm targeting real-time streams. Novel solutions were adopted in several critical parts of the signal processing chain in order to achieve a high level of accuracy under real-time constraints. Experimental results show the efficiency of the introduced implementation on a Virtex-7 FPGA, as we can provide realistically accurate reverberation with an ultra-low latency of ~20.8 µs.
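As background, a textbook reverberation building block (a feedback comb filter, one of Schroeder's classic elements) can be sketched in a few lines of software; this is shown only to make the signal-processing context concrete and is neither the paper's adaptive algorithm nor its FPGA implementation.

```python
# Illustrative software sketch of a feedback comb filter, a generic reverb element.
# Delay length and feedback gain are arbitrary assumptions.

import numpy as np

def feedback_comb(x, delay_samples, feedback=0.7):
    """y[n] = x[n] + feedback * y[n - delay]; one of Schroeder's reverb elements."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + (feedback * y[n - delay_samples] if n >= delay_samples else 0.0)
    return y

fs = 48_000                                    # sample rate (Hz)
impulse = np.zeros(fs // 2); impulse[0] = 1.0  # half a second of audio, one impulse
tail = feedback_comb(impulse, delay_samples=1733)
print("echoes above -60 dB:", int(np.sum(np.abs(tail) > 1e-3)))
```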
Applied Reconfigurable Computing | 2015
Kostas Siozios; Peter Figuli; Harry Sidiropoulos; Carsten Tradowsky; Dionysios Diamantopoulos; Konstantinos Maragos; Shalina Percy Delicia; Dimitrios Soudris; Jürgen Becker
This paper presents an ongoing collaboration project, named TEAChER, for providing breakthrough knowledge to students and young researchers on reconfigurable computing and advanced digital systems. The project is intended to cover topics such as the architectures and capabilities of field-programmable gate arrays and languages for the specification, modeling and synthesis of digital systems. Furthermore, design methods, computer-aided design tools, reconfiguration techniques and practical applications are taught. The virtual laboratory enables remote students to easily interact with a set of reconfigurable platforms in order to control experiments through the internet. Using the user-friendly interface, a remote user can change predefined system parameters and observe the system response in either textual or graphical format. In addition, the virtual laboratory includes a booking system, which enables remote users to schedule experiments in advance.