Costas E. Goutis | Researchain

Publication

Featured research published by Costas E. Goutis.

IEEE Transactions on Circuits and Systems for Video Technology | 2001

Evaluation of design alternatives for the 2-D-discrete wavelet transform

Nikolaos D. Zervas; Giorgos P. Anagnostopoulos; Vassilis Spiliotopoulos; Yiannis Andreopoulos; Costas E. Goutis

In this paper, the three main hardware architectures for the 2-D discrete wavelet transform (2-D-DWT) are reviewed, and optimization techniques applicable to all three architectures are described. The main contribution of this work is a quantitative comparison of these design alternatives for the 2-D-DWT in terms of memory requirements, throughput, and energy dissipation, based on a theoretical analysis of the alternative architectures and schedules. Memory requirements, throughput, and energy are expressed by analytical equations whose parameters come from both the 2-D-DWT algorithm and the implementation platform. These parameterized equations enable early yet efficient exploration of the tradeoffs involved in selecting one architecture over another.
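As a hedged illustration of the row-column schedule, the sketch below computes one decomposition level of a 2-D DWT by filtering every row and then every column of the intermediate result. It assumes the orthonormal Haar filter pair for brevity (the architectures compared in the paper are not tied to this filter), an image with even dimensions, and hypothetical function names `haar_1d` and `dwt2_level`.

```python
import math


def haar_1d(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt2_level(image):
    """One 2-D DWT level using the row-column schedule.

    Subband naming varies between texts; here the first letter refers to
    the row (horizontal) filter and the second to the column filter.
    Returns (LL, HL, LH, HH).
    """
    # Row pass: low-pass half stored on the left, high-pass half on the right.
    rows = []
    for row in image:
        lo, hi = haar_1d(row)
        rows.append(lo + hi)

    # Column pass over the whole intermediate (row-filtered) array.
    n_rows, n_cols = len(rows), len(rows[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        lo, hi = haar_1d([rows[r][c] for r in range(n_rows)])
        for r, v in enumerate(lo + hi):
            out[r][c] = v

    h, w = n_rows // 2, n_cols // 2
    ll = [row[:w] for row in out[:h]]   # row low, column low
    hl = [row[w:] for row in out[:h]]   # row high, column low
    lh = [row[:w] for row in out[h:]]   # row low, column high
    hh = [row[w:] for row in out[h:]]   # row high, column high
    return ll, hl, lh, hh
```

Keeping the entire intermediate `rows` array before the column pass is the simplest schedule but also the most memory-hungry one, which is the kind of tradeoff the analytical equations above are meant to quantify.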

Journal of Real-time Image Processing | 2008

Efficient high-performance implementation of JPEG-LS encoder

Markos E. Papadonikolakis; Athanasios P. Kakarountas; Costas E. Goutis

A new design approach for an efficient, high-performance JPEG-LS encoder is proposed in this paper. The proposed implementation compresses image data using the lossless mode of JPEG-LS. When valuable image content must be acquired in real time, lossless compression is essential; it matters in critical applications such as the acquisition of medical images and the transmission of high-resolution images from space (satellites). The contribution of the paper is an efficient pipelined JPEG-LS encoder that requires significantly less encoding time than any other available JPEG-LS hardware or software implementation. The experimental results show that encoding is performed at high speed, as expected, so the encoder can serve real-time applications. This is the first time that a JPEG-LS implementation has offered such high encoding speed.
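The abstract does not detail the encoder pipeline. As background, the first stage of a JPEG-LS lossless encoder is the median edge detector (MED) predictor defined by the standard (ITU-T T.87 / ISO/IEC 14495-1), which predicts each sample from its left (a), upper (b), and upper-left (c) neighbours. The minimal Python sketch below shows only this predictor and the resulting residuals; context modelling, bias correction, and Golomb coding are omitted, the function names are hypothetical, and treating out-of-image neighbours as 0 is a simplification of the standard's own boundary rules.

```python
def med_predict(a, b, c):
    """JPEG-LS median edge detector: a = left, b = above, c = upper-left."""
    if c >= max(a, b):
        return min(a, b)   # edge detected: take the smaller neighbour
    if c <= min(a, b):
        return max(a, b)   # edge detected: take the larger neighbour
    return a + b - c       # smooth region: planar prediction


def residuals(image):
    """Prediction residuals x - Px for a 2-D list of samples.

    Simplification (assumption): neighbours outside the image are taken
    as 0, which differs from the boundary handling in the standard.
    """
    res = []
    for i, row in enumerate(image):
        out = []
        for j, x in enumerate(row):
            a = row[j - 1] if j > 0 else 0
            b = image[i - 1][j] if i > 0 else 0
            c = image[i - 1][j - 1] if i > 0 and j > 0 else 0
            out.append(x - med_predict(a, b, c))
        res.append(out)
    return res
```

For example, `residuals([[1, 2, 3], [1, 2, 3]])` gives `[[1, 1, 1], [0, 0, 0]]`: once the first row is known, the vertical structure is predicted exactly, and only small residuals remain to be entropy coded.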

IEEE Transactions on Very Large Scale Integration Systems | 1999

Strategy for power-efficient design of parallel systems

Koen Danckaert; Kostas Masselos; F. Catthoor; H.J. De Man; Costas E. Goutis

Application studies in image and video processing indicate that between 50% and 80% of the power cost in these systems is due to data storage and transfers. This is especially true for multiprocessor realizations, because conventional parallelization methods ignore power cost and focus only on performance. However, power consumption also depends heavily on the way a system is parallelized. To reduce this dominant cost, we propose to address the system-level storage organization for the multidimensional signals as a first step in mapping these applications, before the parallelization or partitioning decisions (in particular, before hardware/software (HW/SW) partitioning, which is traditionally done too early in the design trajectory). Our methodology is illustrated on a parallel quadtree-structured difference pulse-code modulation video codec.

International Conference on Electronics, Circuits and Systems | 2004

Comparison of the hardware architectures and FPGA implementations of stream ciphers

Michalis D. Galanis; Paris Kitsos; Giorgos Kostopoulos; Nicolas Sklavos; Odysseas G. Koufopavlou; Costas E. Goutis

In this paper, hardware implementations of five representative stream ciphers are compared in terms of performance and consumed area. The ciphers used for the comparison are A5/1, W7, E0, RC4, and Helix. The first three have been used for the security part of well-known standards. Helix is a recently introduced fast, word-oriented stream cipher. W7 has recently been proposed as a more trustworthy solution for GSM, owing to the security problems found in the strength of A5/1. The designs were coded in VHDL and implemented on an FPGA device. The implementation results illustrate the hardware performance of each cipher in terms of throughput-to-area ratio: 5.88 for A5/1, 1.26 for W7, 0.21 for E0, 2.45 for Helix, and 0.86 for RC4.
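Of the five ciphers, RC4 has the shortest software description, so it serves here as a hedged illustration of how a stream cipher operates: a key-scheduling step initialises a 256-byte state, after which every generated keystream byte is XORed with one data byte. This is a plain Python reference model with hypothetical function names, not one of the VHDL designs evaluated in the paper, and RC4 is shown for illustration only, since it is no longer considered secure.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Generate n bytes of RC4 keystream for the given key."""
    # Key-scheduling algorithm (KSA): permute 0..255 under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation algorithm (PRGA): one output byte per step.
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (the same operation) by XOR with the keystream."""
    ks = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))
```

With the widely published test vector, `rc4_crypt(b"Key", b"Plaintext").hex()` yields `bbf316e8d940af0ad3`, and applying `rc4_crypt` again with the same key recovers the plaintext.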

International Conference on Electronics, Circuits and Systems | 2004

Efficient implementation of the keyed-hash message authentication code (HMAC) using the SHA-1 hash function

Harris E. Michail; Athanasios P. Kakarountas; Athanasios Milidonis; Costas E. Goutis

In this paper, a performance-efficient implementation of the keyed-hash message authentication code (HMAC) using the SHA-1 hash function is presented. This mechanism provides message authentication in combination with a shared secret key. The proposed hardware implementation can be synthesized easily for a variety of FPGA and ASIC technologies. Simulation results obtained with commercial tools verified the efficiency of the HMAC implementation in terms of performance and throughput. Special care was taken so that the proposed implementation does not introduce extra design complexity, while parallel functionality was kept at the required level.
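The HMAC construction itself is standardised (RFC 2104), and a software reference model is a convenient golden model when verifying a hardware core. The minimal Python sketch below makes the two SHA-1 passes explicit: an inner hash over the key-derived ipad block plus the message, and an outer hash over the opad block plus the inner digest. It is not the authors' implementation; the name `hmac_sha1` is hypothetical, and Python's standard `hmac` module is used only to cross-check the result.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 processes 512-bit (64-byte) blocks


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA1 per RFC 2104: H((K ^ opad) || H((K ^ ipad) || m))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()    # keys longer than a block are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")    # then zero-padded to one full block

    ipad = bytes(k ^ 0x36 for k in key)
    opad = bytes(k ^ 0x5C for k in key)

    inner = hashlib.sha1(ipad + message).digest()   # first SHA-1 pass
    return hashlib.sha1(opad + inner).digest()      # second SHA-1 pass


# Cross-check against the standard library implementation.
assert hmac_sha1(b"key", b"msg") == hmac.new(b"key", b"msg", hashlib.sha1).digest()
```

The outer pass needs the finished inner digest, so the two SHA-1 computations for a single message are inherently sequential.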

Microprocessors and Microsystems | 2009

Resource aware mapping on coarse grained reconfigurable arrays

Stavros Georgiopoulos; Michalis D. Galanis; Costas E. Goutis

Coarse-grained reconfigurable array (CGRA) architectures have become increasingly popular due to their flexibility, scalability, and performance. However, mapping programs onto these architectures is highly complex. This work presents a new methodology for effectively mapping applications onto coarse-grained reconfigurable arrays. The core of this methodology comprises the scheduling and register allocation phases, performed in a single step for the first time in the case of CGRAs. Additionally, modulo scheduling with backtracking capability is incorporated into this scheme. The main contributions of this work include a novel technique for minimizing the memory bandwidth bottleneck, a new priority scheme, and a new set of heuristics that target the maximization of instruction-level parallelism by efficiently managing the architecture's resources. The overall approach is retargetable with respect to a parametric architecture template that models a large number of architecture alternatives, and it has been automated with a prototype tool that permits experimental exploration. The experimental results showed that the achieved performance figures are very close to the best ones derived from a theoretical study of the architecture's resources and the applications' requirements. Moreover, applying the bandwidth optimization technique led to a 20% to 130% increase in operation parallelism. Finally, the experiments quantified the benefit of applying the new priority scheme and heuristics.
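Modulo scheduling starts from a lower bound on the initiation interval (II) of the loop being mapped. A common resource-constrained bound, ResMII, divides the number of operations that need each resource class by the number of units of that class and takes the maximum. The minimal sketch below computes it for a hypothetical CGRA described only by per-class unit counts; the class names and counts are illustrative assumptions, and neither the recurrence-constrained bound (RecMII) nor the paper's register-allocation and bandwidth techniques are modelled.

```python
import math
from collections import Counter


def res_mii(op_classes, units):
    """Resource-constrained minimum initiation interval (ResMII).

    op_classes: resource class required by each operation in the loop body.
    units: number of available units per resource class.
    """
    demand = Counter(op_classes)
    return max(math.ceil(count / units[cls]) for cls, count in demand.items())


# Hypothetical array: 16 ALUs but only 4 memory ports.
UNITS = {"alu": 16, "mem": 4}

# 20 ALU ops and 6 memory ops: both classes bound the II at 2.
print(res_mii(["alu"] * 20 + ["mem"] * 6, UNITS))
# 10 ALU ops and 9 memory ops: the memory ports alone force an II of 3.
print(res_mii(["alu"] * 10 + ["mem"] * 9, UNITS))
```

In the second case the memory ports, not the ALUs, set the bound, which illustrates why reducing memory bandwidth pressure can raise the achievable operation parallelism.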

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1997

A VLSI design methodology for RNS full adder-based inner product architectures

Dimitrios Soudris; Vassilis Paliouras; Thanos Stouraitis; Costas E. Goutis

In this paper, a systematic graph-based methodology is introduced for synthesizing VLSI RNS architectures that use full adders as the basic building block. The design methodology derives array architectures starting from the algorithm level and ending at the bit-level design. Using the regular array processor as the target architectural style, the proposed procedure constructs the two-dimensional (2-D) dependence graph of the bit-level algorithm, which is formally described by sets of uniform recurrent equations. The main characteristic of the proposed architectures is that they can operate at very high throughput rates, and they exhibit significantly reduced complexity compared with ROM-based architectures.
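To make the residue number system (RNS) setting concrete: an integer is represented by its residues modulo pairwise-coprime moduli, multiplications and additions proceed independently in each small channel with no carries between channels, and the Chinese remainder theorem (CRT) recovers the result. The minimal sketch below computes an inner product this way. The moduli (7, 11, 13) and all function names are illustrative assumptions, and the per-channel arithmetic is plain software rather than the full-adder arrays synthesised by the methodology.

```python
from math import prod

MODULI = (7, 11, 13)  # illustrative pairwise-coprime moduli; dynamic range 1001


def to_rns(x):
    """Residue representation of a non-negative integer."""
    return tuple(x % m for m in MODULI)


def rns_inner_product(xs, ys):
    """Inner product computed independently in every residue channel."""
    acc = [0] * len(MODULI)
    for x, y in zip(xs, ys):
        for k, (rx, ry) in enumerate(zip(to_rns(x), to_rns(y))):
            acc[k] = (acc[k] + rx * ry) % MODULI[k]  # no carries cross channels
    return tuple(acc)


def from_rns(residues):
    """Chinese remainder theorem reconstruction into the range [0, M)."""
    M = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the modular inverse (Python 3.8+)
    return total % M
```

For instance, `from_rns(rns_inner_product([3, 5, 7], [2, 4, 6]))` returns 68. The reconstruction is only correct while the true result stays below the dynamic range M = 1001, which is why practical RNS designs size their moduli set to the worst-case inner product.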

International Conference on Image Processing | 2001

A local wavelet transform implementation versus an optimal row-column algorithm for the 2D multilevel decomposition

Yiannis Andreopoulos; Nikolaos D. Zervas; Gauthier Lafruit; Peter Schelkens; Thanos Stouraitis; Costas E. Goutis; Jan Cornelis

A new method for implementing the binary-tree decomposition of the convolution-based wavelet transform, called the local wavelet transform (LWT), has recently been proposed in the literature. Although it produces exactly the same results as the classical row-column implementation of the transform, it offers many implementation benefits. This is shown experimentally for the first time on a general-purpose processor-based architecture, by comparing our C implementation of the LWT with an optimal C implementation of the lifting-scheme row-column algorithm. The comparisons are made for the forward multilevel binary-tree decomposition using the 9/7 filter pair on typical processors of the Intel Pentium family.

International Conference on Microelectronics | 1994

FPGA implementation of artificial neural networks: an application on medical expert systems

G.-P.K. Economou; Evaggelinos P. Mariatos; N.M. Economopoulos; Dimitris K. Lymberopoulos; Costas E. Goutis

In this paper, the FPGA implementation of a composition of Artificial Neural Networks (ANNs) for a Medical Expert System (MES) focused on pulmonary diseases is discussed. Using a specially designed neuron based on pipelined bit-serial arithmetic and a successful approximation of its sigmoid function, a computation module has been structured that can accommodate eight neurons in one FPGA. The use of memory elements allows up to 256 K synapses to be mapped with high speed and accuracy. Also, owing to FPGA reconfigurability, new structures and training patterns can be used to update this MES with minimal effort, so that it covers additional pulmonary or other diseases.
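The abstract does not specify which sigmoid approximation was used. As a hedged stand-in, the sketch below implements the widely cited PLAN piecewise-linear approximation, whose slopes are all powers of two, so in fixed-point hardware each segment reduces to a shift and an add. The function names are hypothetical, and this is not presented as the approximation used in the paper.

```python
import math


def plan_sigmoid(x: float) -> float:
    """Piecewise-linear sigmoid approximation (PLAN coefficients).

    Slopes are 1/4, 1/8 and 1/32, so each segment maps to a shift and an add.
    """
    ax = abs(x)
    if ax >= 5.0:
        y = 1.0
    elif ax >= 2.375:
        y = 0.03125 * ax + 0.84375
    elif ax >= 1.0:
        y = 0.125 * ax + 0.625
    else:
        y = 0.25 * ax + 0.5
    # The sigmoid satisfies f(-x) = 1 - f(x), so only x >= 0 is approximated.
    return y if x >= 0 else 1.0 - y


def sigmoid(x: float) -> float:
    """Exact logistic sigmoid, used only for comparison."""
    return 1.0 / (1.0 + math.exp(-x))


# Maximum absolute error on a grid from -8 to 8 with step 0.01.
max_err = max(abs(plan_sigmoid(i / 100) - sigmoid(i / 100)) for i in range(-800, 801))
print(max_err)
```

On this grid the maximum absolute error is just under 0.019, reached at |x| = 1, which is typically acceptable for classification tasks where the neuron output is later thresholded.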

IEEE Transactions on Dependable and Secure Computing | 2009

A Top-Down Design Methodology for Ultrahigh-Performance Hashing Cores

Harris E. Michail; Athanasios P. Kakarountas; Athanasios Milidonis; Costas E. Goutis

Many cryptographic primitives used in cryptographic schemes and security protocols such as SET, PKI, IPSec, and VPNs rely on hash functions, which form a special family of cryptographic algorithms. Applications that use these security schemes are becoming increasingly popular, and some of them call for higher throughput, either because of their rapid acceptance by the market or because of their nature. In this work, a new methodology is presented for achieving high operating frequency and throughput in implementations of all widely used hash functions, as well as those expected to be used in the near future, such as MD-5, SHA-1, RIPEMD (all versions), SHA-256, SHA-384, and SHA-512. In the proposed methodology, five different techniques have been developed and combined in the best way to achieve maximum performance. Compared to conventional pipelined implementations of hash functions (in FPGAs), the proposed methodology can yield up to a 160 percent increase in throughput.
