Publication


Featured research published by Constantinos E. Goutis.


Power and Timing Modeling, Optimization and Simulation | 2000

Data-Reuse and Parallel Embedded Architectures for Low-Power, Real-Time Multimedia Applications

Dimitrios Soudris; Nikolaos D. Zervas; Antonios Argyriou; Minas Dasygenis; Konstantinos Tatas; Constantinos E. Goutis; Adonios Thanailakis

Exploiting data reuse together with a custom memory hierarchy that takes advantage of the temporal locality of data accesses can yield significant power savings, especially for data-intensive applications. The effect of data-reuse decisions on the power dissipation, area, and performance of multimedia applications realized on multiple embedded cores is explored. The interaction between the data-reuse decisions and the selection of a data-memory architecture model is also studied. As a demonstrator, a widely used video processing kernel, the full-search motion estimation kernel, is used. Experimental results show that improvements in both power and performance can be obtained when the right combination of data-memory architecture model and data-reuse transformation is selected.
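The full-search motion estimation kernel named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `full_search`, the frame layout (lists of rows), and the tiny test frames are assumptions made for the example. The only data-reuse step shown is copying the current block once into a small local buffer so it is not re-read from the frame for every candidate displacement.

```python
def full_search(cur, ref, bx, by, block_size, search_range):
    """Return (dx, dy, sad) of the best match for the block of `cur` at (bx, by).

    Every displacement within +/-search_range is tried (full search) and the
    one with the smallest sum of absolute differences (SAD) wins.
    """
    height, width = len(ref), len(ref[0])
    # Data-reuse step (illustrative): keep the current block in a local buffer.
    block = [row[bx:bx + block_size] for row in cur[by:by + block_size]]
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + block_size > width or y0 + block_size > height:
                continue
            cost = sum(
                abs(block[y][x] - ref[y0 + y][x0 + x])
                for y in range(block_size)
                for x in range(block_size)
            )
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Hypothetical 8x8 frames: `cur` is `ref` displaced by (dx, dy) = (1, -1).
ref = [[x + 100 * y for x in range(8)] for y in range(8)]
cur = [[(x + 1) + 100 * (y - 1) for x in range(8)] for y in range(8)]
motion_vector = full_search(cur, ref, bx=2, by=2, block_size=4, search_range=2)
```

In a memory-hierarchy study the interesting quantity is how many frame accesses each buffering choice removes; the local `block` copy above is the simplest such choice.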


International Symposium on Circuits and Systems | 2005

A low-power and high-throughput implementation of the SHA-1 hash function

Haralambos Michail; Athanasios P. Kakarountas; Odysseas G. Koufopavlou; Constantinos E. Goutis

Hash functions are mainly applied in the fields of communication integrity and signature authentication. A hash function is used in the security layer of every communication protocol. However, as protocols evolve and new high-performance applications appear, the throughput of most hash functions seems to reach a limit. Furthermore, because the market tends to minimize device size and increase autonomy to make devices portable, power issues must also be considered. Existing SHA-1 hash function implementations (SHA-1 is common in many protocols, e.g. IPSec) limit throughput to a maximum of 2 Gbit/s. In this paper, a new implementation exceeds this limit, improving the throughput by 53%. Furthermore, power dissipation is kept low compared to previous works.
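For orientation, the snippet below computes a SHA-1 digest with Python's standard `hashlib` and works out what a 53% improvement would mean numerically. It is a software sketch only and says nothing about the hardware design; it also assumes the 53% figure is measured against the 2 Gbit/s limit quoted above, which the abstract does not state explicitly.

```python
import hashlib

# SHA-1 of the standard test message "abc" (160-bit digest, shown as hex).
digest = hashlib.sha1(b"abc").hexdigest()

# Assumption: the 53% gain is relative to the quoted 2 Gbit/s limit.
baseline_gbps = 2.0
improved_gbps = baseline_gbps * (1 + 0.53)

print(digest)
print(f"{improved_gbps:.2f} Gbit/s")
```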


IEEE Transactions on Signal Processing | 1995

Prime-factor DCT algorithms

Anna Tatsaki; Chrissavgi Dre; Thanos Stouraitis; Constantinos E. Goutis

In this correspondence, new algorithms are presented for computing the 1-D and 2-D discrete cosine transform (DCT) of even length by using the discrete Fourier transform (DFT). A comparison of the proposed algorithms to other fast ones points out their computational efficiency, which is mainly based on the advantages of prime-factor decomposition and a proper choice of index mappings.
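The general idea of obtaining a DCT from a DFT can be illustrated with a textbook identity: the unnormalized DCT-II of a length-N sequence equals the real part of a phase-shifted 2N-point DFT of the zero-padded sequence. The sketch below uses that identity with a naive DFT; it is not the prime-factor algorithm of the paper, and the function names are chosen for this example.

```python
import cmath
import math


def dft(seq):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def dct2_via_dft(x):
    """Unnormalized DCT-II computed from a 2N-point DFT of the zero-padded input."""
    n = len(x)
    spectrum = dft(list(x) + [0.0] * n)
    # Multiplying by exp(-i*pi*k/(2N)) turns exp(-i*pi*t*k/N) into
    # exp(-i*pi*(2t+1)*k/(2N)), whose real part is the DCT-II cosine kernel.
    return [
        (cmath.exp(-1j * cmath.pi * k / (2 * n)) * spectrum[k]).real
        for k in range(n)
    ]


def dct2_direct(x):
    """Reference DCT-II straight from the definition."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n))
        for k in range(n)
    ]


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # even length, as in the abstract
via_dft = dct2_via_dft(samples)
direct = dct2_direct(samples)
```

A fast algorithm would replace the naive `dft` with a factorized one; the identity itself stays the same.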


Design, Automation, and Test in Europe | 2002

A Code Transformation-Based Methodology for Improving I-Cache Performance of DSP Applications

Nikolaos D. Liveris; Nikolaos D. Zervas; Dimitrios Soudris; Constantinos E. Goutis

This paper focuses on I-cache behaviour enhancement through the application of high-level code transformations. Specifically, a flow for the iterative application of I-cache performance-optimizing transformations is proposed. The procedure of applying transformations is driven by a set of analytical equations, which receive parameters related to the code and the I-cache structure and predict the number of I-cache misses. Experimental results from a real-life demonstration application show that order-of-magnitude reductions in the number of I-cache misses can be achieved by applying the proposed methodology.
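The paper's analytical equations are not given in the abstract, so the sketch below substitutes a small trace-driven simulation of a direct-mapped instruction cache. It illustrates why code placement matters: two loop bodies that map to the same cache lines evict each other on every iteration. The cache geometry, addresses, and function names are all hypothetical.

```python
def count_misses(trace, num_lines, line_size):
    """Count misses of a direct-mapped cache over a list of byte addresses."""
    lines = [None] * num_lines  # memory block currently held by each cache line
    misses = 0
    for addr in trace:
        block = addr // line_size
        index = block % num_lines
        if lines[index] != block:
            lines[index] = block
            misses += 1
    return misses


def loop_trace(bodies, iterations, instr_size=4):
    """Instruction addresses of a loop running each (start, length) body in order."""
    trace = []
    for _ in range(iterations):
        for start, length in bodies:
            trace.extend(range(start, start + length, instr_size))
    return trace


# Hypothetical 128-byte cache: 8 lines of 16 bytes.
conflicting = loop_trace([(0, 64), (128, 64)], iterations=10)  # same cache lines
relocated = loop_trace([(0, 64), (192, 64)], iterations=10)    # disjoint cache lines

misses_conflicting = count_misses(conflicting, num_lines=8, line_size=16)
misses_relocated = count_misses(relocated, num_lines=8, line_size=16)
```

Moving the second body by 64 bytes leaves only the compulsory misses, which is the kind of gain a placement-aware transformation aims for.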


International Symposium on Circuits and Systems | 1994

A fast DCT processor, based on special purpose CORDIC rotators

Evaggelinos P. Mariatos; Dimitris Metafas; John Ant. Hallas; Constantinos E. Goutis

In this paper, a new processor that computes the Discrete Cosine Transform (DCT) is presented. Based on a recently proposed DCT algorithm, this architecture overcomes the major drawbacks of the original implementation, resulting in a design with considerably less area consumption and higher speed. To achieve these results, a novel architecture, based on the CORDIC circular rotation algorithm, is introduced: it reduces the required area by more than 60% compared to the use of standard CORDIC architectures. Furthermore, bit-serial arithmetic is used, resulting in a very compact design. To obtain maximum throughput, the proposed processor is fully pipelined, achieving performance sufficient even for signals as fast as HDTV.
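The CORDIC circular rotation algorithm that the architecture builds on can be sketched in a few lines. This is the generic rotation-mode iteration in floating point, not the special-purpose bit-serial rotator of the paper; the function name and the iteration count are assumptions for the example.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians using shift-and-add style CORDIC steps.

    Converges for |angle| up to roughly 1.74 rad. Each step rotates by
    +/- atan(2**-i); the accumulated gain is divided out at the end.
    """
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        direction = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - direction * y * shift, y + direction * x * shift
        z -= direction * math.atan(shift)
    return x / gain, y / gain


cos_approx, sin_approx = cordic_rotate(1.0, 0.0, math.pi / 6)
```

In hardware the `atan` values come from a small lookup table and the gain is a constant, so each iteration needs only shifts, additions, and a sign test.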


Journal of Circuits, Systems, and Computers | 2005

A RECONFIGURABLE COARSE-GRAIN DATA-PATH FOR ACCELERATING COMPUTATIONAL INTENSIVE KERNELS

Michalis D. Galanis; George Theodoridis; Spyros Tragoudas; Constantinos E. Goutis

In this paper, a high-performance reconfigurable coarse-grain data-path, part of a hybrid reconfigurable platform, is introduced. The data-path consists of coarse-grain components whose flexibility and universality are shown to increase the system's performance due to significant reductions in latency. A methodology of unsophisticated but efficient algorithms for mapping computationally intensive applications onto the proposed data-path is also presented. Results on Digital Signal Processing and multimedia benchmarks show an average reduction in execution cycles of 20%, combined with a decrease in area consumption, when the proposed data-path is compared with a high-performance one. The average cycle reduction is even greater, 44%, when the comparison is made with a data-path that instantiates primitive computational resources on FPGA hardware.


Integration | 2005

A high-throughput, memory efficient architecture for computing the tile-based 2D discrete wavelet transform for the JPEG2000

Grigoris Dimitroulakos; Michalis D. Galanis; Athanasios Milidonis; Constantinos E. Goutis

In this paper, the design and implementation of a hardware architecture, optimized in terms of speed and memory requirements, for computing the tile-based 2D forward discrete wavelet transform for the JPEG2000 image compression standard are described. The proposed architecture is based on a well-known architecture template for calculating the 2D forward discrete wavelet transform. This architecture is derived by replacing the filtering units with our previously published throughput-optimized ones and by developing a scheduling algorithm suited to the special features of our filtering units. The architecture exhibits high-performance characteristics due to the throughput-optimized filters. Also, the extra clock cycles required by the tile-based version of the discrete wavelet transform are partially compensated by the proper scheduling of the filters. The developed scheduling algorithm results in reduced memory requirements compared with existing architectures.
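As background for the transform being accelerated, the sketch below performs one level of a 1-D forward wavelet transform using the reversible 5/3 lifting steps associated with JPEG2000, plus the matching inverse. The abstract does not say which filter bank the architecture implements, so the 5/3 choice is an assumption; the 2-D tile-based scheduling of the paper is not reproduced, and the function names are illustrative.

```python
def dwt53_forward(x):
    """One level of reversible 5/3 lifting on an even-length list of integers.

    Returns (low, high) subbands. Borders use whole-sample symmetric extension.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    high = []
    for i in range(half):  # predict step: odd samples minus average of neighbours
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        high.append(x[2 * i + 1] - ((x[2 * i] + right) >> 1))
    low = []
    for i in range(half):  # update step: even samples plus rounded detail average
        prev = high[i - 1] if i > 0 else high[0]
        low.append(x[2 * i] + ((prev + high[i] + 2) >> 2))
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly (integer-to-integer)."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for i in range(half):  # undo update step
        prev = high[i - 1] if i > 0 else high[0]
        x[2 * i] = low[i] - ((prev + high[i] + 2) >> 2)
    for i in range(half):  # undo predict step
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = high[i] + ((x[2 * i] + right) >> 1)
    return x


row = [3, 7, 1, 8, 2, 9, 4, 6]  # hypothetical row of pixel values
low_band, high_band = dwt53_forward(row)
restored = dwt53_inverse(low_band, high_band)
```

A 2-D transform applies this filter along the rows and then the columns of each tile, which is where filter scheduling and intermediate memory become the design issue.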


International Conference on Electronics, Circuits, and Systems | 2010

Exploration of cryptographic ASIP designs for wireless sensor nodes

Ioanna Tsekoura; Georgios N. Selimis; Jos Hulzink; Francky Catthoor; Jos Huisken; Harmke de Groot; Constantinos E. Goutis

We present the design of 4 Application Specific Instruction Set Processors (8-bit, 32-bit, 64-bit and 128-bit ASIPs) which provide typical 16-bit general instructions and accelerate a common cryptographic domain. The ASIPs support the following security services: data confidentiality, data authentication, data integrity and replay attack protection, and their design is appropriate for wireless sensor networks. The corresponding software for each ASIP has been optimized in terms of clock cycles and memory accesses. We evaluate the 4 ASIPs in terms of performance, power consumption, energy dissipation and area occupation. When our most energy-efficient design (128-bit ASIP) operates in AES-CCM-32 security mode at a clock frequency of 100 MHz, it dissipates 41.86 nJ, achieving a maximum throughput of 21.76 Mbps, while at a lower clock frequency of 4.61 MHz, it achieves a throughput of 1 Mbps, a typical value in WSNs, and dissipates 35.20 nJ. The corresponding area overhead, for 90 nm technology, excluding the memories, is 34.3K NAND2 equivalents. Comparisons with other works are given.


International Symposium on Circuits and Systems | 2000

A methodology for the behavioral-level event-driven power management of digital receivers

Nikolaos D. Zervas; Dimitrios Soudris; S. Theoharis; Constantinos E. Goutis; Adonios Thanailakis

Power management is a low-power technique applicable in almost all design levels. Event-driven power management has been applied at the system-level. The same concept can be applied for receiver design at the behavioral-level. Power management involves a trade-off according to which, on the one hand, power is decreased by shutting down parts of the circuit, but on the other hand, power is increased by the insertion of the required logic for the generation of the shutdown signals. In this paper, receiver context characteristics are exploited in order to develop a methodology for the behavioral-level exploration of this trade-off. The efficiency of the proposed methodology is proven by its application on a real-life digital DECT receiver.
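The trade-off described above reduces to simple arithmetic: shutting a block down saves its power while it is idle, but the logic that generates the shutdown signal consumes power all the time. The sketch below encodes that balance; the function name and every number in it are hypothetical and do not come from the paper.

```python
def net_power_saving(block_power_mw, idle_fraction, control_power_mw):
    """Average power saved by shutting a block down while it is idle.

    block_power_mw   -- power the block draws when it is left running
    idle_fraction    -- share of time (0..1) the block can be shut down
    control_power_mw -- power of the added shutdown-signal logic (always on)
    A negative result means the control logic costs more than it saves.
    """
    return block_power_mw * idle_fraction - control_power_mw


# Hypothetical candidates: a large, often-idle block and a small one.
worthwhile = net_power_saving(block_power_mw=10.0, idle_fraction=0.6, control_power_mw=2.0)
not_worthwhile = net_power_saving(block_power_mw=1.0, idle_fraction=0.5, control_power_mw=2.0)
```

An exploration at the behavioral level would evaluate this balance for each candidate block before committing to shutdown logic.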


IEEE Transactions on Very Large Scale Integration Systems | 1999

Novel techniques for bus power consumption reduction in realizations of sum-of-product computation

Kostas Masselos; Panagiotis Merakos; Thanos Stouraitis; Constantinos E. Goutis

Novel techniques for the power-efficient implementation of sum-of-products computation are presented. The proposed techniques aim at reducing the switching activity required for the successive evaluation of the partial products on the buses that connect the storage elements, where data and coefficients are stored, to the functional units. This is achieved by reordering the sequence in which the partial products are evaluated. Heuristics based on the traveling salesman problem are proposed to perform the reordering for different categories of algorithms. Information related to both data (dynamic) and coefficients (static) is used to drive the reordering. Experimental results from applying the proposed techniques to several signal-processing algorithms show that significant switching activity savings can be achieved.
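The reordering idea can be illustrated by treating each coefficient as a city and the Hamming distance between two bus words as the travel cost, then building a tour with a greedy nearest-neighbour rule. This is one simple traveling-salesman heuristic chosen for the example, not necessarily one of the heuristics in the paper; the coefficient values are hypothetical and only the static (coefficient) side is modelled.

```python
def hamming(a, b):
    """Number of bus lines that toggle between non-negative words a and b."""
    return bin(a ^ b).count("1")


def switching_activity(sequence):
    """Total bit toggles when the words are sent over the bus in order."""
    return sum(hamming(a, b) for a, b in zip(sequence, sequence[1:]))


def nearest_neighbour_order(words):
    """Greedy tour: always send next the remaining word closest to the last one."""
    remaining = list(words)
    order = [remaining.pop(0)]
    while remaining:
        last = order[-1]
        idx = min(range(len(remaining)), key=lambda i: hamming(last, remaining[i]))
        order.append(remaining.pop(idx))
    return order


# Hypothetical 4-bit coefficients of a sum of products.
coefficients = [0b0000, 0b1111, 0b0001, 0b1110, 0b0011, 0b1100]
reordered = nearest_neighbour_order(coefficients)

toggles_before = switching_activity(coefficients)
toggles_after = switching_activity(reordered)
```

Because addition is commutative, evaluating the partial products in the new order leaves the result of the sum unchanged while the bus toggles less.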

Collaboration


Dive into Constantinos E. Goutis's collaborations.

Top Co-Authors

Dimitrios Soudris

National Technical University of Athens

Kostas Masselos

University of Peloponnese

Spiridon Nikolaidis

Aristotle University of Thessaloniki
