Athanasios Milidonis | Researchain

Publication

Featured research published by Athanasios Milidonis.

International Conference on Electronics, Circuits and Systems | 2004

Efficient implementation of the keyed-hash message authentication code (HMAC) using the SHA-1 hash function

Harris E. Michail; Athanasios P. Kakarountas; Athanasios Milidonis; Costas E. Goutis

In this paper, a performance-efficient implementation of the keyed-hash message authentication code (HMAC) using the SHA-1 hash function is presented. This mechanism is used for message authentication in combination with a shared secret key. The proposed hardware implementation can be synthesized easily for a variety of FPGA and ASIC technologies. Simulation results, obtained with commercial tools, verified the efficiency of the HMAC implementation in terms of performance and throughput. Special care has been taken so that the proposed implementation does not introduce extra design complexity, while in-parallel functionality was kept to the required levels.
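The HMAC construction that such a core implements can be sketched in software. The following is a minimal reference model, assuming the standard HMAC definition (RFC 2104) instantiated with SHA-1; it is not the hardware design described in the abstract, and the function name `hmac_sha1` is chosen here for illustration only.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 processes 64-byte (512-bit) blocks


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """Software model of HMAC-SHA-1 (illustrative name, not from the paper)."""
    # Keys longer than one block are hashed first, then zero-padded to a block.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    # Inner and outer padded keys use the fixed constants 0x36 and 0x5C.
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)

    # Two hash passes: H(opad || H(ipad || message)).
    inner = hashlib.sha1(ipad + message).digest()
    return hashlib.sha1(opad + inner).digest()


if __name__ == "__main__":
    tag = hmac_sha1(b"shared-secret", b"hello")
    # Cross-check against the standard library implementation.
    assert hmac.compare_digest(tag, hmac.new(b"shared-secret", b"hello", hashlib.sha1).digest())
    print(tag.hex())
```

The model shows why HMAC needs two passes through the hash core per message: one over the inner padded key and the message, and one over the outer padded key and the inner digest.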

IEEE Transactions on Dependable and Secure Computing | 2009

A Top-Down Design Methodology for Ultrahigh-Performance Hashing Cores

Harris E. Michail; Athanasios P. Kakarountas; Athanasios Milidonis; Costas E. Goutis

Many cryptographic primitives used in cryptographic schemes and security protocols such as SET, PKI, IPSec, and VPNs rely on hash functions, which form a special family of cryptographic algorithms. Applications that use these security schemes are becoming increasingly popular, and some of them call for higher throughput, either because of their rapid acceptance by the market or because of their nature. In this work, a new methodology is presented for achieving high operating frequency and throughput in implementations of all widely used hash functions, and of those expected to be used in the near future, such as MD5, SHA-1, RIPEMD (all versions), SHA-256, SHA-384, and SHA-512. The proposed methodology develops five different techniques and combines them in the most effective way to achieve maximum performance. Compared to conventional pipelined implementations of hash functions (in FPGAs), the proposed methodology can increase throughput by up to 160 percent.

The Journal of Supercomputing | 2006

High-Speed FPGA Implementation of Secure Hash Algorithm for IPSec and VPN Applications

Athanasios P. Kakarountas; Haralambos Michail; Athanasios Milidonis; Costas E. Goutis; George Theodoridis

Hash functions are special cryptographic algorithms that are applied wherever message integrity and authentication are critical. Implementations of these functions are cryptographic primitives widely used in common cryptographic schemes and security protocols such as Internet Protocol Security (IPSec) and Virtual Private Networks (VPNs). In this paper, a novel FPGA implementation of the Secure Hash Algorithm 1 (SHA-1) is proposed. The proposed architecture exploits the benefits of pipelining and of re-timing the execution through pre-computation of intermediate temporal values. Pipelining divides the calculation of the hash value into four discrete stages, corresponding to the four required rounds of the algorithm. Re-timing is based on decomposing the SHA-1 expression to separate information dependencies from independencies. This allows intermediate temporal values to be pre-computed in parallel with the calculation of other independent values. By exploiting the information dependencies, the fundamental operational block of SHA-1 is modified so that the maximum operating frequency is increased by approximately 30% with negligible area penalty compared to other academic and commercial implementations. The proposed SHA-1 hash function was prototyped and verified using a Xilinx FPGA device. The implementation's characteristics are compared to alternative implementations proposed by academia and industry that are available in the international IP market. The proposed implementation achieved a throughput exceeding 2.5 Gbps, which is the highest among all similar IP cores for the targeted Xilinx technology.
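The pre-computation idea can be illustrated with a small software model. The sketch below assumes the standard SHA-1 step function and shows one common way to split it: everything in the next step that does not depend on the newly produced working variable `a` is computed one step early, so only a rotation and one addition remain on the critical path. It is an illustration of the general principle, not the paper's actual circuit, and it does not model the four-stage pipeline. All function names are hypothetical.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)  # one constant per round
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f(t: int, b: int, c: int, d: int) -> int:
    """Round-dependent logical function (four rounds of 20 steps each)."""
    r = t // 20
    if r == 0:
        return ((b & c) | (~b & d)) & MASK
    if r == 2:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d  # rounds 1 and 3


def compress(state, block: bytes):
    """One SHA-1 block with the a-independent part of each step pre-computed."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

    a, b, c, d, e = state
    # Part of step 0 that does not need rotl(a, 5).
    pre = (f(0, b, c, d) + e + w[0] + K[0]) & MASK
    for t in range(80):
        # Critical path: a single rotation and addition.
        new_a = (rotl(a, 5) + pre) & MASK
        if t < 79:
            # After this step b'=a, c'=rotl(b,30), d'=c, e'=d, all known now,
            # so the next step's partial sum can be formed in parallel.
            pre = (f(t + 1, a, rotl(b, 30), c) + d + w[t + 1] + K[(t + 1) // 20]) & MASK
        a, b, c, d, e = new_a, a, rotl(b, 30), c, d
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e)))


def sha1(message: bytes) -> bytes:
    """Pad the message and run the compression function over each block."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    state = IV
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">5I", *state)


if __name__ == "__main__":
    assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
    print(sha1(b"abc").hex())
```

In hardware terms, `pre` corresponds to a register loaded one cycle ahead; the model only demonstrates that the re-ordered computation produces the same digest as the reference algorithm.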

Integration | 2005

A high-throughput, memory efficient architecture for computing the tile-based 2D discrete wavelet transform for the JPEG2000

Grigoris Dimitroulakos; Michalis D. Galanis; Athanasios Milidonis; Constantinos E. Goutis

In this paper, the design and implementation of a hardware architecture optimized for speed and memory requirements are described. The architecture computes the tile-based 2D forward discrete wavelet transform for the JPEG2000 image compression standard. It is based on a well-known architecture template for calculating the 2D forward discrete wavelet transform, and is derived by replacing the filtering units with our previously published throughput-optimized ones and by developing a scheduling algorithm suited to the special features of our filtering units. The architecture exhibits high-performance characteristics due to the throughput-optimized filters. In addition, the extra clock cycles required by the tile-based version of the discrete wavelet transform are partially compensated by proper scheduling of the filters. The developed scheduling algorithm results in reduced memory requirements compared with existing architectures.

Design, Automation, and Test in Europe | 2005

A Partitioning Methodology for Accelerating Applications in Hybrid Reconfigurable Platforms

Michalis D. Galanis; Athanasios Milidonis; George Theodoridis; Dimitrios Soudris; Constantinos E. Goutis

In this paper, we propose a methodology for partitioning and mapping computationally intensive applications onto reconfigurable hardware blocks of different granularity. A generic hybrid reconfigurable architecture is considered so that the methodology is applicable to a large number of heterogeneous reconfigurable platforms. The methodology mainly consists of two stages: the analysis and the mapping of the application onto fine- and coarse-grain hardware resources. A prototype framework consisting of analysis, partitioning, and mapping tools has also been developed. For the coarse-grain reconfigurable hardware, we use our previously developed high-performance coarse-grain datapath. In this work, the methodology is validated using two real-world applications, an OFDM transmitter and a JPEG encoder. For the OFDM transmitter, a maximum clock cycle decrease of 82% relative to an all-fine-grain mapping solution is achieved. The corresponding performance improvement for the JPEG encoder is 43%.
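To put the reported cycle reductions in perspective, they can be converted into speedup factors. The helper below is a small arithmetic sketch, assuming that both mappings run at the same clock period so that execution time scales with cycle count (the abstract does not state this); the function name is hypothetical.

```python
def speedup_from_cycle_decrease(decrease_percent: float) -> float:
    """Convert a percentage decrease in clock cycles into a speedup factor.

    Assumes an unchanged clock period, so time is proportional to cycles.
    """
    if not 0 <= decrease_percent < 100:
        raise ValueError("decrease must be in the range [0, 100)")
    remaining_fraction = 1.0 - decrease_percent / 100.0
    return 1.0 / remaining_fraction


if __name__ == "__main__":
    for name, decrease in (("OFDM transmitter", 82.0), ("JPEG encoder", 43.0)):
        print(f"{name}: {speedup_from_cycle_decrease(decrease):.2f}x")
```

Under that assumption, an 82% decrease corresponds to a speedup of roughly 5.6x, and a 43% decrease to roughly 1.75x.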

International Symposium on Circuits and Systems | 2005

A high-throughput and memory efficient 2D discrete wavelet transform hardware architecture for JPEG2000 standard

Grigoris Dimitroulakos; Michalis D. Galanis; Athanasios Milidonis; Costas E. Goutis

The design and implementation of a hardware architecture that is efficient in terms of speed and memory requirements are described. The architecture computes the tile-based two-dimensional forward discrete wavelet transform for the JPEG2000 still image compression standard. It is derived from a well-established architecture template for calculating the two-dimensional forward discrete wavelet transform. The filters of that template are replaced by our previously published throughput-optimized ones, and a scheduling algorithm has been developed that is matched to the special features of our filtering units. Performance improvements are gained thanks to the throughput-optimized filters. In addition, the developed scheduling algorithm reduces memory requirements compared with previously published architectures.

International Symposium on Circuits and Systems | 2005

A methodology for partitioning DSP applications in hybrid reconfigurable systems

Michalis D. Galanis; Athanasios Milidonis; George Theodoridis; Dimitrios Soudris; Constantinos E. Goutis

In this paper, we describe an automated and formalized methodology for partitioning computationally intensive applications between reconfigurable hardware blocks of different granularity. A generic hybrid-granularity reconfigurable system architecture is considered, so that the methodology is applicable to a large number of hybrid reconfigurable architectures. To evaluate the effectiveness of the partitioning methodology, a prototype framework has been developed. For the coarse-grain reconfigurable fabric, we consider our previously developed high-performance coarse-grain data-path. In the experimental results, a maximum clock cycle decrease of 82% relative to the all-fine-grain mapping solution is achieved, and the overall timing constraints of the application are met.

International Conference on Electronics, Circuits, and Systems | 2005

Novel high throughput implementation of SHA-256 hash function through pre-computation technique

Harris E. Michail; Athanasios Milidonis; Athanasios P. Kakarountas; Constantinos E. Goutis

Hash functions are utilized in the security layer of every communication protocol and in signature authentication schemes for electronic transactions. As time passes, more sophisticated applications that invoke a security layer arise and address more users and clients, which means that these applications demand higher throughput. In this work, a pre-computation technique has been developed for optimizing SHA-256, which has already started replacing both SHA-1 and MD5. Compared to conventional pipelined implementations of the SHA-256 hash function, the applied pre-computation technique leads to about 30% higher throughput with an area penalty of only approximately 9.5%.
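One way to read the two figures together is as a change in throughput per unit area. The snippet below is a simple arithmetic sketch using only the approximate percentages quoted in the abstract; the throughput-per-area metric and the function name are assumptions introduced here, not figures reported by the authors.

```python
def throughput_per_area_gain(throughput_gain_percent: float,
                             area_penalty_percent: float) -> float:
    """Ratio of new to baseline throughput-per-area for given percentage changes."""
    throughput_ratio = 1.0 + throughput_gain_percent / 100.0
    area_ratio = 1.0 + area_penalty_percent / 100.0
    return throughput_ratio / area_ratio


if __name__ == "__main__":
    gain = throughput_per_area_gain(30.0, 9.5)
    print(f"throughput per area improves by about {100 * (gain - 1):.1f}%")
```

With the quoted values, throughput per area improves by a little under 19%, which suggests that the extra logic is used efficiently.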

Design Automation for Embedded Systems | 2005

A method for partitioning applications in hybrid reconfigurable architectures

Michalis D. Galanis; Athanasios Milidonis; George Theodoridis; Dimitrios Soudris; Costas E. Goutis

In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are sped up on the coarse-grain reconfigurable hardware to meet the timing requirements of the application code mapped onto the reconfigurable logic. The reconfigurable processing units are embedded in a generic hybrid system architecture that can model a large number of existing heterogeneous reconfigurable platforms. The fine-grain reconfigurable logic is realized by an FPGA unit, and the coarse-grain reconfigurable hardware by our previously developed high-performance data-path. The methodology mainly consists of three stages: the analysis, the mapping of the application parts onto fine- and coarse-grain reconfigurable hardware, and the partitioning engine. A prototype software framework realizes the partitioning flow. In this work, the methodology is validated using five real-life applications. Analytical partitioning experiments show that the speedup relative to the all-FPGA mapping solution ranges from 1.5 to 4.0, while the specified timing constraints are satisfied for all the applications.

Microprocessors and Microsystems | 2007

Automated framework for partitioning DSP applications in hybrid reconfigurable platforms

Michalis D. Galanis; Athanasios Milidonis; George Theodoridis; Dimitrios Soudris; Constantinos E. Goutis

In this paper, we present a software framework that implements a formalized methodology for partitioning digital signal processing (DSP) applications between reconfigurable hardware blocks of different granularity. A generic hybrid reconfigurable architecture is considered, so that the methodology is applicable to a large variety of hybrid reconfigurable systems. The developed framework is composed of analysis, partitioning, and mapping tools. Although the framework is parametric with respect to the mapping procedures for the fine- and coarse-grain reconfigurable units, we provide specific mapping algorithms for these types of hardware. In this work, the methodology is validated using five real-world DSP applications: an orthogonal frequency division multiplexing transmitter, a cavity detector, a video compression technique, a JPEG encoder, and a wavelet-based image compressor. The experiments report that an average clock cycle decrease of 60.7%, relative to an all-fine-grain mapping solution, is achieved using the developed framework for the considered applications.
