Georgios Dimitriou | Researchain

Publication

Featured research published by Georgios Dimitriou.

panhellenic conference on informatics | 2005

A tool for calculating energy consumption in wireless sensor networks

Georgios Dimitriou; P. K. Kikiras; Georgios I. Stamoulis; I. N. Avaritsiotis

Energy and total useful lifetime are primary design concerns in a variety of real-life applications where the deployment of a Wireless Sensor Network is desired. In this paper the authors introduce AVAKIS, a tool for calculating the energy consumption of the various components of a sensor node. The proposed tool is an architectural-level simulator, in which the system building blocks are described by a high-level behavioral model. The methodology used to estimate power consumption is based both on the characteristics of the components and on a number of user-defined initialization parameters.
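As a rough illustration of component-level energy accounting, the sketch below sums power multiplied by time over the operating states of each component. It is not AVAKIS and does not reproduce its models; the class name, function name, component names, and all numbers are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ComponentState:
    """One operating state of a node component (all values are illustrative)."""
    component: str   # e.g. "radio" or "mcu" (hypothetical names)
    state: str       # e.g. "tx", "active", "sleep"
    power_mw: float  # average power drawn in this state, in milliwatts
    time_s: float    # time spent in this state, in seconds


def energy_by_component(states):
    """Return the energy per component in millijoules (mW * s = mJ)."""
    totals = {}
    for s in states:
        totals[s.component] = totals.get(s.component, 0.0) + s.power_mw * s.time_s
    return totals


# Hypothetical 100-second duty cycle for a single node; not measured data.
profile = [
    ComponentState("radio", "tx", 60.0, 2.0),
    ComponentState("radio", "sleep", 0.5, 98.0),
    ComponentState("mcu", "active", 10.0, 10.0),
    ComponentState("mcu", "sleep", 0.25, 90.0),
]

per_component = energy_by_component(profile)
total_mj = sum(per_component.values())
```

In this sketch the per-state power figures play the role of component characteristics and the state durations play the role of user-defined parameters; a real simulator would derive the durations from a behavioral model rather than take them as constants.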

parallel computing in electrical engineering | 2004

Loop Scheduling for Multithreaded Processors

Georgios Dimitriou; Constantine D. Polychronopoulos

The presence of multiple active threads on the same processor can mask latency by rapid context switching, but it can adversely affect performance due to competition for shared datapath resources. In this paper we present Macro Software Pipelining (MSWP), a loop scheduling technique for multithreaded processors, which is based on the loop distribution transformation for loop pipelining. MSWP constructs loop schedules by partitioning the loop body into tasks and assigning each task to a thread that executes all iterations for that particular task. MSWP is applied top-down on a hierarchical program representation, and utilizes thread-level speculation for maximal exploitation of parallelism. We tested MSWP on a multithreaded architectural model, Coral 2000, using synthetic and SPEC benchmarks. We obtained speedups of up to 30% with respect to highly optimized superblock-based schedules on loops with unpredictable branches, and a speedup of up to 25% on perl, a highly sequential SPEC95 integer benchmark.
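The partitioning idea described above (split the loop body into tasks, and let each thread run all iterations of its task) can be sketched in software as follows. This only illustrates the loop-distribution structure, not MSWP itself: it uses ordinary Python threads and a queue in place of hardware threads, omits thread-level speculation, and all names are hypothetical. Because of Python's global interpreter lock it shows the schedule shape, not a speedup.

```python
import queue
import threading


def run_sequential(xs):
    """Original loop: both statements of the body run in every iteration."""
    out = []
    for x in xs:
        a = x * x          # statement 1
        out.append(a + 1)  # statement 2, depends on statement 1
    return out


def run_distributed(xs):
    """Distributed loop: one thread per statement, each covering all iterations."""
    channel = queue.Queue()  # carries statement 1 results to statement 2
    done = object()          # sentinel marking the end of the iteration space
    out = []

    def task_a():
        # Executes statement 1 for every iteration.
        for x in xs:
            channel.put(x * x)
        channel.put(done)

    def task_b():
        # Executes statement 2 for every iteration, in iteration order.
        while True:
            a = channel.get()
            if a is done:
                break
            out.append(a + 1)

    threads = [threading.Thread(target=task_a), threading.Thread(target=task_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Calling `run_distributed(list(range(6)))` returns the same list as `run_sequential(list(range(6)))`; the second thread can start on iteration 0 while the first is already producing iteration 1, which is the pipelining effect the distribution aims for.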

international conference on modern circuits and systems technologies | 2017

Loop pipelining in high-level synthesis with CCC

Georgios Dimitriou; Michael Dossis; Georgios I. Stamoulis

High-level synthesis allows the use of high-level programming languages for hardware design. Traditional programming with the C and ADA languages can lead to efficient hardware description through recently developed high-level synthesis tools. Compilers play an important role in this process, since they can bridge differences between software programming and hardware design methodologies, thus making high-level synthesis tools better accepted by the scientific community. Furthermore, modern compiler optimizations can be employed in order to obtain optimal hardware descriptions. Loop transformations are often the focus of compiler optimizations, since they can result in significant performance improvement, for both software and hardware programming. In this paper, we discuss the implementation of loop pipelining in the front-end compiler of the CCC high-level synthesis tool, and in particular we present new optimization techniques that lead to a decreased number of states in the FSM-based output of CCC. We present several experiments conducted on the Livermore loops and the MPEG2 open-source code, which prove the claimed improvement.

Proceedings of the SouthEast European Design Automation, Computer Engineering, Computer Networks and Social Media Conference | 2016

Source-Level Compiler Optimizations for High-Level Synthesis

Georgios Dimitriou; Georgios Chatzianastasiou; Apostolos Tsakyridis; Georgios I. Stamoulis; Michael Dossis

With high-level synthesis becoming the preferred method for hardware design, tools that operate on high-level programming languages and optimize hardware output are crucial for successful synthesis. In high-level synthesis, conventional programming language codes describe hardware behavior. Those codes are translated into an RTL-level description by an appropriate tool. Programming language compilers are a common class of tools that not only translate but also optimize code. Compilers can make the transition from software to hardware smooth, allowing programmers to use their software skills on hardware programming, without any language compromises. In addition, compilers utilize optimization techniques to obtain a better output hardware description. In this paper, we discuss compiler issues for high-level synthesis, and present the results of several compiler transformations that can be implemented on our C language compiler front end of the CCC high-level synthesis tool. The results are taken from experiments conducted on the MPEG2 open-source codes, and prove the importance of such transformations in high-level synthesis.

panhellenic conference on informatics | 2015

Performance and power simulation of a functional-unit-network processor with SimpleScalar and Wattch

Kleovoulos Kalaitzidis; Georgios Dimitriou; Georgios I. Stamoulis; Michael Dossis

Loop acceleration is a means of enhancing the performance of a single- or multiple-issue microprocessor core. A new EDGE-like processor architecture incorporates a loop accelerator directly in the out-of-order back end of the processor, forming an extended hypercube interconnected network of functional unit nodes. In this work, we have simulated the full processor pipeline of our architecture in a high-level language. In particular, we have extended SimpleScalar, a well-known processor simulator, to include our multifunctional-unit back-end design and to support our special instructions for loop acceleration. Thus, instructions forming qualified loops are scheduled and dispatched only once for execution, remaining in the back end for all loop iterations and exchanging values in a data-flow fashion. We have also used the Wattch power estimation tool, which is traditionally coupled with SimpleScalar to estimate power consumption during simulation, to show that our design results in significant power savings. Since loop instructions reside in the functional unit nodes during loop execution, the entire front end of the pipeline is turned off, and the register file and the instruction cache are kept at low power during that time. The experiments include simulating the execution of small loop-based benchmarks from the Livermore loops, as well as longer real-life code taken from open-source MPEG video compression codes. All experiments exhibit the expected performance and power consumption improvements, verifying earlier performance measurements on the HDL model of the back end.

panhellenic conference on informatics | 2013

Rapid, low-power loop execution in a network of functional units

Athanassios Tziouvaras; Georgios Dimitriou

The need for high-performance computing and low-power operation has led to the emergence of new processor architectures, with most recent designs based on the combination of multiple cores and multiple threads per core. In our work, we are exploring an architecture of multiple instruction pipelines, which merge into a common back end, formed as a network of functional units. We focus on the back end in this paper, and in particular on rapid, low-power execution of loops, based on data flow. We dispatch the loop body instructions on the network of functional units only once, and we then let the loop execute in a dataflow manner, without any other instruction issue before loop completion. In this way, we not only speed up the loop execution but also save energy, since during the execution of the loop the whole front end of the pipeline is not used and can be turned off. We have simulated the functional unit network at the microarchitecture level, running a number of Livermore loops. The results we obtained show that the proposed architecture can accelerate loop execution by up to N/k, for a network of N units, a loop body size of N instructions, and an issue rate of k instructions per cycle.
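One way to see where a bound of N/k could come from is the back-of-the-envelope model below. It is an idealization assumed for this illustration, not the simulation model used in the paper: the conventional pipeline is taken to be limited only by issuing N instructions per iteration at k per cycle, and the dataflow network is assumed to complete one iteration per cycle in steady state after a single dispatch of the loop body. Function names are hypothetical.

```python
def conventional_cycles(n_instr, k, iterations):
    """Issue-bound estimate: the body is re-issued in every iteration."""
    cycles_per_iteration = -(-n_instr // k)  # ceil(n_instr / k)
    return iterations * cycles_per_iteration


def dataflow_cycles(n_instr, k, iterations):
    """Dispatch the body once, then (assumed) one iteration completes per cycle."""
    dispatch = -(-n_instr // k)  # one-time dispatch cost, ceil(n_instr / k)
    return dispatch + iterations


def speedup(n_instr, k, iterations):
    return conventional_cycles(n_instr, k, iterations) / dataflow_cycles(n_instr, k, iterations)
```

Under these assumptions, `speedup(16, 4, 1000)` is 4000 / 1004, slightly below N/k = 4, and the ratio approaches N/k as the iteration count grows. Loop-carried dependences or longer-latency operations would lower the achievable figure.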

2017 South Eastern European Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM) | 2017

Minimal-area loop pipelining for high-level synthesis with CCC

Georgios Dimitriou; Michael Dossis; Georgios I. Stamoulis

The increased complexity of computer hardware makes it close to impossible to rely on hand-coding at the level of HDLs for digital hardware design. High-level synthesis can be employed instead, in order to automatically obtain HDL codes from high-level language functional descriptions. With high-level synthesis it becomes easier to design coprocessors, accelerators, and other special-purpose hardware. Moreover, compiler optimizations can improve the efficiency of automatically generated hardware descriptions and make high-level synthesis the dominant technology for building more complicated hardware as well. Compilers, which are well-known and well-explored software tools, can allow programmers to use their software skills on hardware programming, without any language compromises. Furthermore, compiler optimizations transform the input code in order to produce a high-quality, high-performance output hardware description. In this paper, we discuss compiler issues for high-level synthesis, and in particular the incorporation of loop pipelining in the C language front end of the CCC high-level synthesis tool. We also present a novel pipelining technique that minimizes the area used for the pipeline prologue and epilogue. Results from experiments on the Livermore loops and MPEG2 open-source codes validate our technique.
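For readers unfamiliar with the terms, the sketch below shows a textbook two-stage software-pipelined loop with an explicit prologue, kernel, and epilogue. It is a generic illustration with hypothetical stage functions, not the CCC front end or the area-minimizing technique of the paper; it only makes visible the extra prologue and epilogue code that such a technique would aim to shrink.

```python
def stage1(x):
    """First half of the loop body (hypothetical)."""
    return x * 2


def stage2(t):
    """Second half of the loop body, consuming the stage1 result (hypothetical)."""
    return t + 1


def loop_plain(a):
    """Original loop: both stages run back to back in each iteration."""
    return [stage2(stage1(x)) for x in a]


def loop_pipelined(a):
    """Same computation with stage2 of iteration i-1 overlapped with stage1 of i."""
    n = len(a)
    if n == 0:
        return []
    out = [None] * n
    # Prologue: fill the pipeline with stage1 of the first iteration.
    t = stage1(a[0])
    # Kernel: in hardware the two stages would run in the same state or cycle.
    # The tuple assignment evaluates stage2 on the old t before t is replaced.
    for i in range(1, n):
        out[i - 1], t = stage2(t), stage1(a[i])
    # Epilogue: drain the pipeline with stage2 of the last iteration.
    out[n - 1] = stage2(t)
    return out
```

Both functions return the same list for any input; the pipelined form duplicates `stage1` in the prologue and `stage2` in the epilogue, which in an FSM-based hardware description translates into additional states and area.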

international conference on modern circuits and systems technologies | 2016

Compiler transformations in hardware synthesis of Mpeg2 codes

Georgios Chatzianastasiou; Apostolos Tsakyridis; Georgios Dimitriou; Georgios I. Stamoulis; Michael Dossis

High-level synthesis is the technique that translates high-level programming language programs into equivalent hardware descriptions. The use of conventional programming languages as input to high-level synthesis is challenging, due to the conceptual differences between software programs and hardware descriptions, but such languages are nonetheless becoming the preferred input to high-level synthesis tools. Compilers play an important role in this process, since they can not only bridge such differences, thus making high-level synthesis tools better accepted by the scientific community, but also apply code transformations that target an optimized hardware output. In this paper, we discuss a number of transformations that can be implemented in the C language front end of the CCC high-level synthesis tool. We present experiments with such transformations conducted on the MPEG2 open-source code, which prove that compiler optimizations can have a significant positive impact on high-level synthesis tools.

panhellenic conference on informatics | 2015

Hardware synthesis of high-level C constructs

Michael Dossis; Georgios Dimitriou

In this paper, experiments with a useable C frontend for the CCC behavioural synthesis tools are presented and analysed. This tool combination is able to rapidly deliver provably-correct hardware implementations at the RTL level, from high-level, abstract, algorithmic executable specifications at the C program level. The constructs used are discussed, and a number of experiments with the tool are outlined and evaluated. The contribution of the CCC tools is invaluable for implementing real-life applications in hardware involving models with complex control flow that are rich in loops and arrays. The discussed experiments prove the tools useable.

panhellenic conference on informatics | 2005

Hardware support for multithreaded execution of loops with limited parallelism

Georgios Dimitriou; Constantine D. Polychronopoulos

Loop scheduling for multithreaded processors differs significantly from loop scheduling for other parallel processors. The sharing of hardware resources imposes new scheduling limitations, but it also allows faster communication across threads. We present a multithreaded processor model, Coral 2000, with hardware extensions that support Macro Software Pipelining, a loop scheduling technique for multithreaded processors. We tested and evaluated Coral 2000 on a cycle-level simulator, using synthetic and integer SPEC benchmarks. We obtained speedups of up to 30% with respect to highly optimized superblock-based schedules on loops that exhibit limited parallelism.
