George Theodoridis | Researchain

Publication

Featured research published by George Theodoridis.

Microelectronics Journal | 2003

An efficient reconfigurable multiplier architecture for Galois field GF(2^m)

Paris Kitsos; George Theodoridis; Odysseas G. Koufopavlou

This paper describes an efficient architecture of a reconfigurable bit-serial polynomial basis multiplier for the Galois field GF(2^m), where 1 < m ≤ M. The value m, the degree of the irreducible polynomial, can be changed, so the multiplier can be configured and programmed. The value of M determines the maximum size that the multiplier can support. The advantages of the proposed architecture are (i) the high degree of flexibility, which allows easy configuration for different field sizes, and (ii) the low hardware complexity, which results in small area. By using the gated clock technique, a significant reduction of the total multiplier power consumption is achieved. © 2003 Elsevier Ltd. All rights reserved.
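The bit-serial polynomial basis multiplication described in this abstract can be illustrated in software. The following is a minimal sketch, not the paper's hardware architecture: the function name `gf2m_multiply`, the MSB-first ordering, and the integer encoding of field elements (bit i holds the coefficient of x^i) are assumptions made for illustration. Passing `m` and `poly` as arguments mirrors the idea of configuring the multiplier for different field sizes.

```python
def gf2m_multiply(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a and b in GF(2^m) using MSB-first bit-serial shift-and-add.

    `poly` is the irreducible polynomial including its x^m term,
    e.g. 0b10011 for x^4 + x + 1. (Hypothetical helper for illustration.)
    """
    if not (0 <= a < (1 << m) and 0 <= b < (1 << m)):
        raise ValueError("operands must fit in m bits")
    if poly >> m != 1:
        raise ValueError("poly must have degree exactly m")

    acc = 0
    for i in reversed(range(m)):   # one iteration per bit of b, like one clock cycle
        acc <<= 1                  # multiply the accumulator by x
        if acc >> m:               # degree reached m: reduce modulo poly
            acc ^= poly
        if (b >> i) & 1:           # add a (XOR) when the current bit of b is set
            acc ^= a
    return acc


# Same routine, two different field sizes:
# GF(2^4) with x^4 + x + 1: x * x^3 = x^4 = x + 1
print(bin(gf2m_multiply(0b0010, 0b1000, 4, 0b10011)))  # 0b11
# GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1
print(hex(gf2m_multiply(0x57, 0x83, 8, 0x11B)))        # 0xc1
```

The loop runs exactly m times, which corresponds to the m cycles a bit-serial multiplier would need for one product.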

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006

A high-performance data path for synthesizing DSP kernels

Michalis D. Galanis; George Theodoridis; Spyros Tragoudas; Constantinos E. Goutis

A high-performance data path to implement digital signal processing (DSP) kernels is introduced in this paper. The data path is realized by a flexible computational component (FCC), which is a purely combinational circuit that can implement any 2 × 2 template (cluster) of primitive resources. Thus, the data path's performance benefits from the intracomponent chaining of operations. Due to the flexible structure of the FCC, the data path is implemented by a small number of such components. This allows for direct connections among FCCs and for exploiting intercomponent chaining, which further improves performance. Due to the universality and flexibility of the FCC, simple and efficient algorithms perform scheduling and binding of the data flow graph (DFG). DSP benchmarks synthesized with the FCC data path method show significant performance improvements when compared with template-based data path designs. Detailed results on execution time, FCC utilization, and area are presented.

ACM Transactions on Reconfigurable Technology and Systems | 2012

On the exploitation of a high-throughput SHA-256 FPGA design for HMAC

Harris E. Michail; George S. Athanasiou; Vasileios I. Kelefouras; George Theodoridis; Costas E. Goutis

High-throughput and area-efficient designs of hash functions and corresponding mechanisms for Message Authentication Codes (MACs) are in high demand due to new security protocols that have arisen and call for security services in every transmitted data packet. For instance, IPv6 incorporates the IPSec protocol for secure data transmission. However, IPSec's performance bottleneck is the HMAC mechanism, which is responsible for authenticating the transmitted data. HMAC's performance bottleneck, in turn, is the underlying hash function. In this article, a high-throughput and small-size SHA-256 hash function FPGA design and the corresponding HMAC FPGA design are presented. Advanced optimization techniques have been deployed, leading to a SHA-256 hashing core which performs more than 30% better than the next-best design. This improvement is achieved both in terms of throughput and in terms of the throughput/area cost factor. It is the first reported SHA-256 hashing core that exceeds 11 Gbps (after place and route on a Xilinx Virtex 6 board).
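To see why the hash function dominates HMAC's cost, it helps to write out the standard HMAC construction (RFC 2104) with SHA-256 as the underlying hash. This is a software sketch for illustration only and says nothing about the FPGA design itself; the function name `hmac_sha256` is an assumption, and Python's `hashlib` and `hmac` modules stand in for the hashing core.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA-256 from two nested SHA-256 invocations."""
    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    ipad = bytes(k ^ 0x36 for k in key)  # inner padded key
    opad = bytes(k ^ 0x5C for k in key)  # outer padded key

    inner = hashlib.sha256(ipad + message).digest()  # hashes the whole message
    return hashlib.sha256(opad + inner).digest()     # hashes one block plus a digest


# Cross-check against the standard library implementation.
tag = hmac_sha256(b"key", b"packet payload")
reference = hmac.new(b"key", b"packet payload", hashlib.sha256).digest()
print(hmac.compare_digest(tag, reference))  # True
```

Every byte of the message passes through the inner hash, so the throughput of the SHA-256 core bounds the throughput of the whole HMAC computation.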

IEEE Transactions on Very Large Scale Integration Systems | 2009

The ARISE Approach for Extending Embedded Processors With Arbitrary Hardware Accelerators

Nikolaos Vassiliadis; George Theodoridis; Spiridon Nikolaidis

ARISE introduces a systematic approach for extending an embedded processor once so that it can thereafter support the coupling of an arbitrary number of custom computing units (CCUs). A CCU can be a hardwired or a reconfigurable unit, which can be utilized following a tight and/or loose model of computation. By selecting the appropriate model of computation for each part of the application, the complete application space is considered for acceleration, resulting in significant performance improvements. Also, ARISE offers modularity and scalability and is not restricted by the opcode space and operand limitation problems that exist in this type of machine. To support these features, we introduce a machine organization that allows the cooperation of a processor and a set of CCUs. To control the CCUs, we extend the instruction set of the processor once with eight instructions. To efficiently incorporate these features into an embedded processor, we propose a micro-architecture implementation that minimizes the control and communication overhead between the processor and the CCUs. To evaluate our proposal, we extended a MIPS processor with the ARISE infrastructure and implemented it on a Xilinx field-programmable gate array (FPGA). Implementation results demonstrate that the timing model of the processor is not affected. Also, we implemented a set of benchmarks on the ARISE evaluation machine. Performance results show significant improvements and reduced communication overhead compared to a typical coprocessor approach.

Applied Reconfigurable Computing | 2006

A RISC architecture extended by an efficient tightly coupled reconfigurable unit

Nikolaos Vassiliadis; Nikolaos Kavvadias; George Theodoridis; Spiridon Nikolaidis

In this paper, the architecture of an embedded processor extended with a tightly-coupled coarse-grain reconfigurable functional unit (RFU) is proposed. The efficient integration of the RFU with the control unit and the datapath of the processor eliminates the communication overhead between them. To speed up execution, the RFU exploits instruction level parallelism (ILP) and spatial computation. Also, the proposed integration of the RFU efficiently exploits the pipeline structure of the processor, leading to further performance improvements. Furthermore, a development framework for the introduced architecture is presented. The framework is fully automated, hiding all reconfigurable hardware related issues from the user. The hardware model of the architecture was synthesized in a 0.13 µm process, and area and delay estimates are presented. A set of benchmarks is used to evaluate the architecture and the development framework. Experimental results show performance improvements in addition to potential energy reduction.

The Journal of Supercomputing | 2006

High-Speed FPGA Implementation of Secure Hash Algorithm for IPSec and VPN Applications

Athanasios P. Kakarountas; Haralambos Michail; Athanasios Milidonis; Costas E. Goutis; George Theodoridis

Hash functions are special cryptographic algorithms, which are applied wherever message integrity and authentication are critical. Implementations of these functions are cryptographic primitives widely used in common cryptographic schemes and security protocols such as Internet Protocol Security (IPSec) and Virtual Private Network (VPN). In this paper, a novel FPGA implementation of the Secure Hash Algorithm 1 (SHA-1) is proposed. The proposed architecture exploits the benefits of pipelining and re-timing of execution through pre-computation of intermediate temporal values. Pipelining allows division of the calculation of the hash value into four discrete stages, corresponding to the four required rounds of the algorithm. Re-timing is based on the decomposition of the SHA-1 expression to separate information dependencies and independencies. This allows pre-computation of intermediate temporal values in parallel with the calculation of other independent values. Exploiting the information dependencies, the fundamental operational block of SHA-1 is modified so that the maximum operating frequency is increased by approximately 30% with negligible area penalty compared to other academic and commercial implementations. The proposed SHA-1 hash function was prototyped and verified using a Xilinx FPGA device. The implementation's characteristics are compared to alternative implementations proposed by academia and industry, which are available in the international IP market. The proposed implementation achieved a throughput that exceeded 2.5 Gbps, which is the highest among all similar IP cores for the targeted Xilinx technology.
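The four-round structure that the pipeline stages correspond to is visible in a plain software version of SHA-1. The sketch below follows the published SHA-1 definition and is not the paper's architecture; the names `sha1`, `_rotl` and `ROUNDS` are assumptions for illustration. The comment on pre-computation only points out which term of the step expression is independent of the newest working value; the paper's exact decomposition is not given in the abstract.

```python
import hashlib
import struct


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


# One (logical function, constant) pair per round; each round covers 20 steps.
ROUNDS = [
    (lambda b, c, d: (b & c) | (~b & d), 0x5A827999),
    (lambda b, c, d: b ^ c ^ d, 0x6ED9EBA1),
    (lambda b, c, d: (b & c) | (b & d) | (c & d), 0x8F1BBCDC),
    (lambda b, c, d: b ^ c ^ d, 0xCA62C1D6),
]


def sha1(message: bytes) -> bytes:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Padding: a 1 bit, zeros up to 56 mod 64 bytes, then the 64-bit bit length.
    bit_len = len(message) * 8
    message += b"\x80"
    message += b"\x00" * ((56 - len(message)) % 64)
    message += struct.pack(">Q", bit_len)

    for off in range(0, len(message), 64):
        w = list(struct.unpack(">16I", message[off:off + 64]))
        for t in range(16, 80):  # message schedule expansion
            w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for r, (f, k) in enumerate(ROUNDS):  # four rounds
            for t in range(20 * r, 20 * r + 20):
                # e + k + w[t] does not depend on a, so in principle it can
                # be formed before a from the previous step is available.
                pre = (e + k + w[t]) & 0xFFFFFFFF
                temp = (_rotl(a, 5) + f(b, c, d) + pre) & 0xFFFFFFFF
                a, b, c, d, e = temp, a, _rotl(b, 30), c, d

        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    return struct.pack(">5I", *h)


# Cross-check against the standard library implementation.
print(sha1(b"abc") == hashlib.sha1(b"abc").digest())  # True
```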

International Conference on Electronics, Circuits and Systems | 2003

A novel high-speed counter with counting rate independent of the counter's length

Athanasios P. Kakarountas; George Theodoridis; Kyriakos Papadomanolakis; Costas E. Goutis

Counters are among the basic blocks in every digital system. We propose a novel high-speed counter with a constant counting rate, independent of its length. Exploiting special features of the binary arithmetic system and adopting prescaling techniques, a segmented counter architecture is introduced. In particular, to realize a counter of any length, two specially designed four-bit counter modules are used in a systolic manner. The counting rate is bounded by the delay of two basic three-input gates plus the delay of a T flip-flop. In AMS 0.6 µm technology, a maximum counting frequency of 430 MHz is achieved.

Journal of Circuits, Systems, and Computers | 2005

A reconfigurable coarse-grain data-path for accelerating computational intensive kernels

Michalis D. Galanis; George Theodoridis; Spyros Tragoudas; Constantinos E. Goutis

In this paper, a high-performance reconfigurable coarse-grain data-path, part of a hybrid reconfigurable platform, is introduced. The data-path consists of coarse-grain components whose flexibility and universality are shown to increase the system's performance due to significant reductions in latency. A methodology of unsophisticated but efficient algorithms for mapping computationally intensive applications on the proposed data-path is also presented. Results on Digital Signal Processing and multimedia benchmarks show an average execution cycle reduction of 20%, combined with a decrease in area, when the proposed data-path is compared with a high-performance one. The average cycle reduction is even greater, 44%, when the comparison is made with a data-path that instantiates primitive computational resources on FPGA hardware.

International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2007

The ARISE Reconfigurable Instruction Set Extensions Framework

Nikolaos Vassiliadis; George Theodoridis; Spiridon Nikolaidis

In this paper, we introduce the ARISE framework for the systematic extension of typical processors with the necessary infrastructure to support an arbitrary number and type of reconfigurable hardware units. ARISE extends the micro-architecture of the processor with an interface to allow the coupling of the hardware units. Furthermore, the instruction set of the processor is extended with instructions which expose full control of the interface to the programmer/compiler. This control includes the configuration of operations on the hardware units, execution of these operations, and communication of data between the processor and the units. The new instructions are incorporated without the need to redesign the processor instruction set architecture. To evaluate our proposal, a model of an ARISE-extended MIPS processor has been designed. Using a turbo decoder algorithm as the benchmark application, a simulation of the ARISE model has been performed. Performance results show impressive application speedups of up to 7.5×.

International Parallel and Distributed Processing Symposium | 2006

An automated development framework for a RISC processor with reconfigurable instruction set extensions

Nikolaos Vassiliadis; George Theodoridis; Spiridon Nikolaidis

By coupling reconfigurable hardware to a standard processor, high levels of flexibility and adaptability are achieved. However, this approach requires modifications to the compiler of the processor to take reconfigurable aspects into account. In this paper, a development framework for a RISC processor with reconfigurable instruction set extensions is presented. The framework is fully automated, hiding all reconfiguration-related issues from the user, and can be used both to program the architecture and to fine-tune it at design time. We demonstrate the above issues using a set of benchmarks. Experimental results show a 2.9× average speedup in addition to potential energy reduction.
