Michalis D. Galanis | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michalis D. Galanis is active.

Explore More

Publication

Featured researches published by Michalis D. Galanis.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006

A high-performance data path for synthesizing DSP kernels

Michalis D. Galanis; George Theodoridis; Spyros Tragoudas; Constantinos E. Goutis

A high-performance data path to implement digital signal processing (DSP) kernels is introduced in this paper. The data path is realized by a flexible computational component (FCC), which is a pure combinational circuit and it can implement any 2 times 2 template (cluster) of primitive resources. Thus, the data paths performance benefits from the intracomponent chaining of operations. Due to the flexible structure of the FCC, the data path is implemented by a small number of such components. This allows for direct connections among FCCs and for exploiting intercomponent chaining, which further improves performance. Due to the universality and flexibility of the FCC, simple and efficient algorithms perform scheduling and binding of the data flow graph (DFG). DSP benchmarks synthesized with the FCC data path method show significant performance improvements when compared with template-based data path designs. Detailed results on execution time, FCC utilization, and area are presented

international conference on electronics circuits and systems | 2004

Comparison of the hardware architectures and FPGA implementations of stream ciphers

Michalis D. Galanis; Paris Kitsos; Giorgos Kostopoulos; Nicolas Sklavos; Odysseas G. Koufopavlou; Costas E. Goutis

In this paper, the hardware implementations of five representative stream ciphers are compared in terms of performance and consumed area. The ciphers used for the comparison are the A5/1, W7, E0, RC4 and Helix. The first three ones have been used for the security part of well-known standards. The Helix cipher is a recently introduced fast, word oriented, stream cipher. The W7 algorithm has been recently proposed as a more trustworthy solution for GSM, due to the security problems that occurred concerning the A5/1 strength. The designs were coded using the VHDL language. For the hardware implementation of the designs, an FPGA device was used. The implementation results illustrate the hardware performance of each cipher in terms of throughput-to-area ratio. This ratio equals: 5.88 for the A5/1, 1.26 for the W7, 0.21 for the E0, 2.45 for the Helix and 0.86 for the RC4.

Microprocessors and Microsystems | 2009

Resource aware mapping on coarse grained reconfigurable arrays

Stavros Georgiopoulos; Michalis D. Galanis; Costas E. Goutis

Coarse grain reconfigurable array architectures have become increasingly popular due to their flexibility, scalability and performance. However, the mapping of programs on these architectures is characterized by huge complexity. This work presents a new mapping methodology for effectively mapping applications on coarse grained reconfigurable arrays. The core of this methodology comprises of the scheduling and register allocation phases performed, for the first time in the case of CGRAs, in a single step. Additionally, modulo scheduling with backtracking capability is incorporated in this scheme. The main contribution of this work includes a novel technique for minimizing the memory bandwidth bottleneck, a new priority scheme and a new set of heuristics which target on the maximization of the Instruction Level Parallelism by efficiently managing the architectures resources. The overall approach is retargetable with respect to a parametric architecture template modelling a large number of architecture alternatives and it has been automated with a prototype tool which permits experimental exploration. The experimental results showed that the achieved performance figures are very close to the most effective ones derived from the theoretical study on the architectures resources and the applications requirements. Moreover, the application of the bandwidth optimization technique lead to a 20-130% increase on operation parallelism. Finally, the experiments quantified the benefit from applying the new priority scheme and heuristics.

Computers & Electrical Engineering | 2004

64-bit Block ciphers: hardware implementations and comparison analysis

Paris Kitsos; Nicolas Sklavos; Michalis D. Galanis; Odysseas G. Koufopavlou

A performance comparison for the 64-bit block cipher (Triple-DES, IDEA, CAST-128, MISTY1, and KHAZAD) FPGA hardware implementations is given in this paper. All these ciphers are under consideration from the ISO/IEC 18033-3 standard in order to provide an international encryption standard for the 64-bit block ciphers. Two basic architectures are implemented for each cipher. For the non-feedback cipher modes, the pipelined technique between the rounds is used, and the achieved throughput ranges from 3.0Gbps for IDEA to 6.9Gbps for Triple-DES. For feedback ciphers modes, the basic iterative architecture is considered and the achieved throughput ranges from 115Mbps for Triple-DES to 462Mbps for KHAZAD. The throughput, throughput per slice, latency, and area requirement results are provided for all the ciphers implementations. Our study is an effort to determine the most suitable algorithm for hardware implementation with FPGA devices.

field-programmable custom computing machines | 2005

Accelerating applications by mapping critical kernels on coarse-grain reconfigurable hardware in hybrid systems

Michalis D. Galanis; Grigoris Dimitroulakos; Costas E. Goutis

In this paper, we propose a method for speeding-up applications by partitioning them between the reconfigurable hardware blocks of different granularity and mapping critical parts of applications on the coarse-grain reconfigurable hardware. The partitioning method consists of four steps; the intermediate representation creation, the kernel identification, the mapping onto coarse-grain reconfigurable blocks, and the mapping onto the FPGA hardware. The method is validated using five real-world applications, where the speedup relative to an all-FPGA solution ranges from 1.4 to 3.1.

international symposium on circuits and systems | 2004

High-speed hardware implementations of the KASUMI block cipher

Paris Kitsos; Michalis D. Galanis; Odysseas G. Koufopavlou

KASUMI block cipher is used for the security part of many synchronous wireless standards. In this paper two architectures and efficient implementations of the 64-bit KASUMI block cipher are presented. In the first one, the pipeline technique (inner-round and outer-round pipeline) is used and throughput value equal to 3584 Mbps at 56 MHz is achieved. The second one uses feedback logic and reaches a throughput value equal to 432 Mbps at 54 MHz. The designs were coded using VHDL language and for the hardware implementations, a FPGA device was used. A detailed analysis, in terms of performance, and covered area is shown. The proposed implementations outperform any previous published KASUMI implementations in terms of performance.

The Journal of Supercomputing | 2006

Partitioning Methodology for Heterogeneous Reconfigurable Functional Units

Michalis D. Galanis; Grigoris Dimitroulakos; Costas E. Goutis

A partitioning methodology between the reconfigurable hardware blocks of different granularity, which are embedded in a generic heterogeneous architecture, is presented. The fine-grain reconfigurable logic is realized by an FPGA unit, while the coarse-grain reconfigurable hardware by a 2-Dimensional Array of Processing Elements. Critical parts, called kernels, are mapped on the coarse-grain reconfigurable logic for improving performance. The partitioning method is mainly composed by three steps: the analysis of the input code, the mapping onto the Coarse-Grain Reconfigurable Array and the mapping onto the FPGA. The partitioning flow is implemented by a prototype software framework. Analytical partitioning experiments, using five real-world applications, show that the execution time speedup relative to an all-FPGA solution ranges from 1.4 to 5.0.

Journal of Circuits, Systems, and Computers | 2005

A RECONFIGURABLE COARSE-GRAIN DATA-PATH FOR ACCELERATING COMPUTATIONAL INTENSIVE KERNELS

Michalis D. Galanis; George Theodoridis; Spyros Tragoudas; Constantinos E. Goutis

In this paper, a high-performance reconfigurable coarse-grain data-path, part of a hybrid reconfigurable platform, is introduced. The data-path consists of coarse-grain components that their flexibility and universality is shown to increase the systems performance due to significant reductions in latency. A methodology of unsophisticated but efficient algorithms for mapping computational intensive applications on the proposed data-path is also presented. Results on Digital Signal Processing and multimedia benchmarks show an average execution cycles reduction of 20%, combined with an area consumption decrease, when the proposed data-path is compared with a high-performance one. The average cycles reduction is even greater, 44%, when the comparison is held with a data-path that instantiates primitive computational resources on FPGA hardware.

Integration | 2005

A high-throughput, memory efficient architecture for computing the tile-based 2D discrete wavelet transform for the JPEG2000

Grigoris Dimitroulakos; Michalis D. Galanis; Athanasios Milidonis; Constantinos E. Goutis

In this paper, the design and implementation of an optimized hardware architecture in terms of speed and memory requirements for computing the tile-based 2D forward discrete wavelet transform for the JPEG2000 image compression standard, are described. The proposed architecture is based on a well-known architecture template for calculating the 2D forward discrete wavelet transform. This architecture is derived by replacing the filtering units by our previously published throughput-optimized ones and by developing a scheduling algorithm suited to the special features of our filtering units. The architecture exhibits high-performance characteristics due to the throughput-optimized filters. Also, the extra clock cycles required due to the tile-based version of the discrete wavelet transform are partially compensated by the proper scheduling of the filters. The developed scheduling algorithm results in reduced memory requirements compared with existing architectures.

Journal of Circuits, Systems, and Computers | 2006

ARCHITECTURES AND FPGA IMPLEMENTATIONS OF THE 64-BIT MISTY1 BLOCK CIPHER

Paris Kitsos; Michalis D. Galanis; Odysseas G. Koufopavlou

In this paper, we present two alternative architectures and FPGA implementations of the 64-bit NESSIE proposal, MISTY1 block cipher. The first architecture is suitable for applications with high-performance requirements. A throughput of up to 12.6 Gbps can be achieved at a clock frequency of 168 MHz. The main characteristic of this architecture is that uses RAM blocks embedded in modern FPGA devices in order to implement the S-boxes defined in the block cipher algorithm. The second architecture can be used in implementing applications on area-constrained systems. It utilizes feedback logic and inner pipeline with negative edge-triggered register. This technique shortens the critical path, without increasing the latency of the MISTY1 algorithm execution. Compared with an implementation without inner pipeline, performance improvement of 97% is achieved. The measured throughput of the second architecture implementation is 561 Mbps at 79 MHz.

Explore More