Konstantinos Nakos
National and Kapodistrian University of Athens
Publication
Featured research published by Konstantinos Nakos.
International Conference on Electronics, Circuits, and Systems | 2006
Konstantinos Babionitakis; Konstantinos Manolopoulos; Konstantinos Nakos; Dionysios I. Reisis; Nikolaos Vlassopoulos; Vassilios A. Chouliaras
High-performance VLSI FFT architectures are key components of signal processing and telecommunication systems, since they meet hard real-time constraints with lower silicon area and lower power than CPU-based solutions. To meet these goals, this paper presents a novel VLSI FFT architecture that combines three consecutive radix-4 stages into a 64-point FFT engine. Cascading these 64-point FFT engines results in an improved architecture with three main characteristics. First, it can efficiently accommodate large input data sets in real time. Second, it simplifies the processing requirements because it relies on radix-4 calculations. Finally, it reduces memory requirements and latency to one third of those of the fully unfolded radix-4 architecture. Two implementations validate the efficiency of the architecture: an FPGA implementation of a 4096-point FFT achieving a throughput of 4096 points per 20.48 µs, and a VLSI implementation sustaining a throughput of 4096 points per 3.89 µs.
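A functional (software, not hardware) sketch of the two ideas in this abstract, under stated assumptions: a 64-point FFT expressed as three nested radix-4 stages (64 = 4^3), and a larger transform built by cascading such engines through the standard Cooley-Tukey N = N1·N2 split with twiddle factors between the two passes. The function names are hypothetical, and the abstract does not give the paper's dataflow, stage ordering or twiddle placement, so this only illustrates the arithmetic, not the authors' architecture.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4.

    For 64 points the recursion is 3 levels deep (64 = 4**3), i.e. three
    consecutive radix-4 stages.
    """
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    subs = [fft_radix4(x[r::4]) for r in range(4)]  # four q-point sub-FFTs
    out = [0j] * n
    for k in range(q):
        # Twiddle each sub-FFT output, then combine with a 4-point DFT butterfly.
        a = [subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(4)]
        for m in range(4):
            out[k + m * q] = sum(a[r] * (-1j) ** (r * m) for r in range(4))
    return out


def fft_cascade(x, n1, n2):
    """N = n1 * n2 point FFT: n2 FFTs of size n1, twiddles, then n1 FFTs of size n2."""
    n = n1 * n2
    assert len(x) == n
    # Pass 1: for each b, an n1-point FFT of x[b], x[b + n2], x[b + 2*n2], ...
    inner = [fft_radix4(x[b::n2]) for b in range(n2)]
    out = [0j] * n
    for k1 in range(n1):
        # Inter-pass twiddle factors W_N^(b*k1).
        col = [inner[b][k1] * cmath.exp(-2j * cmath.pi * b * k1 / n) for b in range(n2)]
        # Pass 2: an n2-point FFT; its output k2 lands at index k1 + n1*k2.
        for k2, v in enumerate(fft_radix4(col)):
            out[k1 + n1 * k2] = v
    return out
```

For the 4096-point case, `fft_cascade(x, 64, 64)` runs two passes of 64-point engines. The recursion here trades the streaming, buffered structure of a hardware pipeline for readability.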
International Conference on Electronics, Circuits, and Systems | 2006
Konstantinos Babionitakis; George Lentaris; Konstantinos Nakos; Dionysios I. Reisis; Nikolaos Vlassopoulos; Gregory Doumenis; George Georgakarakos; John Sifnaios
The evolution of video technology has increased the need for H.264/AVC encoders with real-time performance. To meet this need, this paper presents a VLSI H.264/AVC encoder architecture, together with the design and implementation details of its modules. The encoder design complies with the reference software encoder of the standard and follows the Baseline profile, level 3.0. The encoder is an IP core and/or stand-alone solution targeting low-area applications. The architecture achieves a maximum throughput of 30 frames/s at a frame size of 1024×768. Results and performance measurements of the entire encoder have been validated on FPGA and in 0.18 µm VLSI technology.
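The stated operating point translates into a per-macroblock time budget, which is what a hardware encoder pipeline has to meet. The sketch below works out that arithmetic; the abstract gives no clock frequency, so the 100 MHz value is purely hypothetical, and the helper names are illustrative.

```python
import math

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def macroblock_rate(width, height, fps):
    """Return (macroblocks per frame, macroblocks per second)."""
    per_frame = math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)
    return per_frame, per_frame * fps


def cycles_per_macroblock(clock_hz, mb_per_s):
    """Whole clock cycles available per macroblock at a given clock."""
    return int(clock_hz // mb_per_s)


per_frame, rate = macroblock_rate(1024, 768, 30)
print(per_frame, rate)                        # 3072 92160
print(cycles_per_macroblock(100e6, rate))     # 1085 (100 MHz is a hypothetical clock)
```

At 1024×768 and 30 frames/s, the encoder must therefore finish a macroblock roughly every 10.85 µs on average.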
International Conference on Electronics, Circuits, and Systems | 2008
Konstantinos Nakos; Dionysios I. Reisis; Nikolaos Vlassopoulos
This paper presents an efficient addressing technique for radix-2 FFT architectures. The novel addressing organization provides parallel load and store of the data involved in a radix-2 butterfly computation. The addressing scheme is based on a permutation of the FFT data, which minimizes the address-generating circuit and the butterfly processor control. The paper proves the correctness of the technique and includes an FPGA implementation.
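To illustrate why a data permutation can make radix-2 loads and stores conflict-free, here is a sketch of one well-known scheme: split the memory into two banks and place index i in the bank given by the parity of its set bits. Because both operands of an in-place radix-2 butterfly differ in exactly one bit, they always land in different banks. This parity scheme is an assumption for illustration and not necessarily the permutation used in the paper; all names are hypothetical.

```python
def radix2_bank(i):
    """Bank (0 or 1) of data index i: the parity of its set bits."""
    return bin(i).count("1") & 1


def radix2_location(i):
    """(bank, address within bank); dropping bit 0 keeps every location unique."""
    return radix2_bank(i), i >> 1


def radix2_pairs(n_bits, stage):
    """Operand pairs of one in-place radix-2 stage: indices differing only in bit `stage`."""
    step = 1 << stage
    return [(i, i | step) for i in range(1 << n_bits) if not i & step]


# Every butterfly of every stage of a 1024-point FFT reads one word from each bank.
for stage in range(10):
    for a, b in radix2_pairs(10, stage):
        assert radix2_bank(a) != radix2_bank(b)
```

The address generator then only needs a parity (XOR) tree and a shift, which hints at why such permutations keep the control logic small.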
Integration | 2008
Vassilios A. Chouliaras; Vincent M. Dwyer; Shahrukh Agha; Jose Luis Nunez-Yanez; Dionysios I. Reisis; Konstantinos Nakos; Konstantinos Manolopoulos
This work presents a detailed case study in customizing a configurable, extensible, 32-bit RISC processor with vector/SIMD instruction extensions for the efficient execution of block-based video-coding algorithms utilizing a proprietary co-design environment. In addition to the default Full-Search motion estimation of the MPEG-2 Test Model 5, fourteen fast ME algorithms were implemented in both scalar and vector form. Results demonstrate a reduction of up to 68% in the dynamic instruction count of the full search-based encoder whereas the fast motion estimation algorithms achieved a reduction in instruction count of nearly 90%, both accelerated via three 128-bit vector/SIMD instructions when compared to the scalar, reference implementation of the standard. We address in detail the profiling, vectorization and the development of these vector instruction set extensions, discuss in depth the implementation of a parametric vector accelerator that implements these instructions and show the introduction of that accelerator into a 32-bit RISC processor pipeline, in a closely-coupled configuration.
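For readers unfamiliar with full-search motion estimation, the scalar sketch below shows the kernel that such vector/SIMD extensions accelerate: every candidate displacement in a search window is scored with the sum of absolute differences (SAD). The 16×16 block, the ±7 search range and the function names are assumptions for illustration; the abstract does not describe the three 128-bit instructions, and the remark that one 16-pixel row fits a 128-bit register (16 × 8-bit) is only an observation about data width.

```python
def sad_row(cur_row, ref_row):
    """Sum of |a - b| over one row; 16 8-bit pixels fill one 128-bit register."""
    return sum(abs(a - b) for a, b in zip(cur_row, ref_row))


def full_search(cur, ref, bx, by, block=16, rng=7):
    """Exhaustive block matching of the block at (bx, by) in `cur` against `ref`.

    Returns (best_sad, (dx, dy)); candidates falling outside `ref` are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + block > w or y0 + block > h:
                continue
            sad = sum(sad_row(cur[by + r][bx:bx + block], ref[y0 + r][x0:x0 + block])
                      for r in range(block))
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    return best
```

A vectorized version would replace `sad_row` with a single wide operation per row, which is where the large reduction in dynamic instruction count comes from.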
International Conference on Electronics, Circuits, and Systems | 2007
Konstantinos Manolopoulos; Konstantinos Nakos; Dionysios I. Reisis; Nikolaos Vlassopoulos; Vassilios A. Chouliaras
Aiming to improve the efficiency of real-time Fourier transform computations on large input data sets, this paper presents the design and VLSI implementation of systolic fast Fourier transform (FFT) architectures for 16 K, 64 K and 256 K complex points. These organizations are deeply pipelined to maximize the operating frequency, and they decompose the transforms into 64-point FFT computations to minimize the buffer size between consecutive stages. The resulting organizations achieve real-time performance in testing and observation applications. They use simple processing elements and are scalable with respect to operating frequency and data width. Validation on FPGA showed operation at 250 MHz and 125 MHz for the 16 K and 64 K architectures, with throughputs of 1 Gs/s and 500 Ms/s, respectively. The VLSI implementations of the proposed 16 K, 64 K and 256 K architectures achieve post-route clock frequencies of 352, 256.5 and 188 MHz, respectively, and sustain throughputs of 1.4 Gs/s, 1 Gs/s and 188 Ms/s.
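Since 16 K, 64 K and 256 K are 2^14, 2^16 and 2^18 points, only the largest is an exact power of 64. The sketch below shows one plausible way to split each size into 64-point passes plus one smaller leftover radix; where the leftover pass sits in the actual pipeline is not stated in the abstract, so placing it last is an assumption, and the helper name is hypothetical.

```python
def plan_64_point_passes(n_points, engine=64):
    """Hypothetical plan: as many 64-point passes as fit, plus one smaller radix pass."""
    if n_points < 1 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    total_bits = n_points.bit_length() - 1
    engine_bits = engine.bit_length() - 1
    full, rest = divmod(total_bits, engine_bits)
    passes = [engine] * full
    if rest:
        passes.append(1 << rest)
    return passes


for label, n in (("16 K", 16 * 1024), ("64 K", 64 * 1024), ("256 K", 256 * 1024)):
    print(label, plan_64_point_passes(n))
# 16 K [64, 64, 4]
# 64 K [64, 64, 16]
# 256 K [64, 64, 64]
```

The product of the passes always equals the transform size, and larger engines mean fewer inter-stage reordering buffers.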
International Conference on Electronics, Circuits, and Systems | 2009
Vassilios A. Chouliaras; Panagiotis Galiatsatos; Konstantinos Nakos; Dionysios I. Reisis; Nikolaos Vlassopoulos
This paper presents a throughput-efficient cascaded FFT architecture suitable for OFDM telecommunication applications. The design uses a technique that parallelizes the radix-2 butterfly computations to double the throughput, while keeping the VLSI area complexity equal to that of single-path delay feedback (SDF) architectures. A 2048-complex-point radix-2 implementation in TSMC 0.13 µm technology validates the results.
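For reference, the baseline the paper compares against, a radix-2 decimation-in-frequency SDF pipeline, has log2(N) butterfly stages whose feedback buffers hold N/2, N/4, ..., 1 words, and it accepts one sample per clock. The sketch below computes that buffering for the 2048-point case; it models only the SDF baseline, since the abstract does not describe the buffering of the parallelized design, and the function name is hypothetical.

```python
def sdf_delays(n_points):
    """Feedback-buffer lengths of a radix-2 DIF single-path delay feedback pipeline."""
    stages = n_points.bit_length() - 1                    # log2(N) butterfly stages
    return [n_points >> (s + 1) for s in range(stages)]  # N/2, N/4, ..., 1


delays = sdf_delays(2048)
print(len(delays), delays[:3], sum(delays))   # 11 [1024, 512, 256] 2047
```

The total of N - 1 words is what "area equal to SDF" is measured against, while a two-samples-per-clock datapath doubles throughput at the same clock frequency.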
International Conference on Electronics, Circuits, and Systems | 2003
Konstantinos Babionitakis; Y. Dagres; Konstantinos Nakos; Dionysios I. Reisis
This paper presents a VLSI architecture for optimizing the transmission power required in turbo-coded Orthogonal Frequency Division Multiplexing (OFDM) modems. The technique adapts the transmission parameters according to the Quality of Service (QoS) requirements. CORDIC computations are used to reduce the VLSI area. The architecture operates at wire speed, uses minimal area, and has demonstrated a performance gain in an indoor wireless application. An implementation in Field Programmable Gate Array (FPGA) technology has validated the results.
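CORDIC saves area because it replaces multipliers with shifts, adds and a small table of arctangent constants. The abstract does not say which CORDIC mode the architecture uses; the sketch below assumes vectoring mode, which yields the magnitude and angle of a complex sample, and uses floating point for readability where hardware would use fixed point. The function name and iteration count are illustrative.

```python
import math


def cordic_vectoring(x, y, iterations=32):
    """Rotate (x, y) onto the x-axis with shift-and-add steps (x > 0 assumed).

    Returns (magnitude, angle); the accumulated CORDIC gain is divided out.
    """
    z = 0.0
    gain = 1.0
    for i in range(iterations):
        p = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        if y > 0:
            x, y, z = x + y * p, y - x * p, z + math.atan(p)
        else:
            x, y, z = x - y * p, y + x * p, z - math.atan(p)
        gain *= math.sqrt(1 + p * p)  # in hardware, a precomputed constant
    return x / gain, z


print(cordic_vectoring(3.0, 4.0))  # approximately (5.0, 0.9273)
```

Each iteration adds roughly one bit of precision, so the iteration count trades accuracy against pipeline length.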
Journal of Signal Processing Systems | 2018
V. Kitsakis; Konstantinos Nakos; Dionysios I. Reisis; Nikolaos Vlassopoulos
This paper introduces an efficient technique for parallel data addressing in FFT architectures that perform in-place computations. The novel addressing organization provides parallel load and store of the data involved in radix-r butterfly computations and leads to an efficient architecture when r is a power of 2. The addressing scheme is based on a permutation of the FFT data, which improves the address-generating circuit and the butterfly processor control. Moreover, the proposed technique is suitable for mixed-radix applications, especially with radices that are powers of 2, and it allows a straightforward continuous-flow implementation. The paper presents the technique and the resulting FFT architecture and shows its advantages over previously published results. Implementations of in-place radix-8 FFT architectures with input sizes of 64 and 512 complex points on a Xilinx Virtex-7 FPGA (VC707) validate the results.
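A common way to obtain conflict-free parallel access for radix-r in-place butterflies is to use r memory banks and assign index i to the bank given by the sum of its base-r digits modulo r (for r = 2 this is the parity of the index bits). The r operands of a butterfly at stage s differ only in digit s, so they fall into r distinct banks. This digit-sum scheme is an assumed illustration, not necessarily the paper's permutation, and the names are hypothetical; the check uses the 64- and 512-point radix-8 sizes mentioned above.

```python
def radix_r_digits(i, r, n_digits):
    """Base-r digits of i, least significant first."""
    out = []
    for _ in range(n_digits):
        i, d = divmod(i, r)
        out.append(d)
    return out


def radix_r_location(i, r, n_digits):
    """(bank, address): bank = digit sum mod r, address = i // r (unique per bank)."""
    return sum(radix_r_digits(i, r, n_digits)) % r, i // r


def radix_r_groups(r, n_digits, stage):
    """Operands of each radix-r butterfly: indices differing only in digit `stage`."""
    step = r ** stage
    n = r ** n_digits
    return [[base + k * step for k in range(r)]
            for base in range(n) if (base // step) % r == 0]


# Radix-8, 512 points (three digits): every butterfly touches all 8 banks once.
for stage in range(3):
    for group in radix_r_groups(8, 3, stage):
        assert len({radix_r_location(i, 8, 3)[0] for i in group}) == 8
```

When r is a power of 2, the digit sum modulo r reduces to small adders on bit fields, which is consistent with the abstract's remark that the architecture is efficient in that case.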
International Journal of Computers and Applications | 2007
Vassilios A. Chouliaras; Tr Jacobs; Jose Luis Nunez-Yanez; Konstantinos Manolopoulos; Konstantinos Nakos; Dionysios I. Reisis
This work focuses on speeding up MPEG-2 and MPEG-4 encoding by using thread parallelism on shared-memory System-on-Chip (SoC) multiprocessors. The performance improvement of the MPEG encoders is shown by reducing the dynamic instruction count across multiple processor contexts and then mapping the encoders onto a configurable SoC multiprocessor. Compared with the sequential encoders, the dynamic instruction count of the parallelized MPEG-2 TM5 encoder is reduced by up to 95% for 32 processor contexts, and that of the MPEG-4 XViD encoder by up to 83% for 16 processor contexts. To realize the parallelized encoders, we present a configurable, N-way, extensible, bus-based, cache-coherent SoC multiprocessor augmented with data-parallel coprocessors, and we give the VLSI implementation of the 2-way and 4-way configurations.
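The abstract does not say how work is divided among threads. A simple, hypothetical illustration is a static split of macroblock rows into contiguous slices, one per processor context. The second helper converts an instruction-count reduction into an equivalent speedup, assuming the reduction is measured on the busiest context and ignoring synchronization costs; that reading of the reported figures is an assumption.

```python
def partition_rows(mb_rows, contexts):
    """Split macroblock rows into contiguous, near-equal slices, one per context."""
    base, extra = divmod(mb_rows, contexts)
    slices, start = [], 0
    for c in range(contexts):
        size = base + (1 if c < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices


def equivalent_speedup(reduction):
    """Speedup if a context executes (1 - reduction) of the sequential instructions."""
    return 1.0 / (1.0 - reduction)


print([len(s) for s in partition_rows(36, 32)][:6])  # [2, 2, 2, 2, 1, 1]
print(round(equivalent_speedup(0.95), 2))            # 20.0
print(round(equivalent_speedup(0.83), 2))            # 5.88
```

The uneven slices in the first example show why load balance limits scaling once the number of contexts approaches the number of macroblock rows.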
Journal of Real-time Image Processing | 2008
Konstantinos Babionitakis; Gregory Doumenis; George Georgakarakos; George Lentaris; Konstantinos Nakos; Dionysios I. Reisis; Ioannis Sifnaios; Nikolaos Vlassopoulos