Konstantinos Manolopoulos

National and Kapodistrian University of Athens

Publication

Featured research published by Konstantinos Manolopoulos.

international conference on electronics, circuits, and systems | 2011

An efficient multiple precision floating-point multiplier

Konstantinos Manolopoulos; Dionisios I. Reisis; Vassilios A. Chouliaras

This paper presents a multi-mode floating-point multiplier that operates efficiently with every precision format specified by the IEEE 754-2008 standard. The design performs one quadruple precision multiplication, or two double precision multiplications in parallel, or four single precision multiplications in parallel. The proposed multiplier is pipelined to execute one quadruple precision multiplication in 3 cycles, and either two double precision operations in parallel or four single precision operations in parallel in only 2 cycles. The proposed design improves the throughput by a factor of two compared to a double precision multiplier and by a factor of four compared to a single precision multiplier. An example VLSI implementation verifies the design and achieves a maximum operating frequency of 505 MHz.

international conference on electronics, circuits, and systems | 2006

A High Performance VLSI FFT Architecture

Konstantinos Babionitakis; Konstantinos Manolopoulos; Konstantinos Nakos; Dionysios I. Reisis; Nikolaos Vlassopoulos; Vassilios A. Chouliaras

High performance VLSI-based FFT architectures are key to signal processing and telecommunication systems since they meet hard real-time constraints at low silicon area and low power compared to CPU-based solutions. To meet these goals, this paper presents a novel VLSI FFT architecture that combines three consecutive radix-4 stages into a 64-point FFT engine. Cascading these 64-point FFT engines yields an improved architecture with the following characteristics. First, it can efficiently accommodate large input data sets in real time. Second, it simplifies processing requirements due to the radix-4 calculations. Finally, it reduces memory requirements and latency to one third compared to the fully unfolded radix-4 architecture. Two implementations validate the architecture's efficiency: an FPGA implementation of a 4096-point FFT achieving a throughput of 4096 points per 20.48 μs, and a VLSI implementation sustaining a throughput of 4096 points per 3.89 μs.
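As a software illustration of the decomposition named in this abstract, the sketch below computes a 64-point FFT with a recursive radix-4 decimation-in-time scheme, which for 64 inputs passes through exactly three radix-4 stages (64, 16, 4). It is only a minimal sketch of the general radix-4 technique, not the paper's pipelined hardware; the function names `fft_radix4` and `dft_naive` are hypothetical and chosen for illustration.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    q = n // 4
    # Four interleaved sub-sequences, each handled by the next radix-4 stage.
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        # Twiddle factors W_n^(r*k) applied to the four sub-transform outputs.
        t = [subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(4)]
        # Radix-4 butterfly: a 4-point DFT whose weights are powers of -j.
        for m in range(4):
            out[k + m * q] = sum(t[r] * (-1j) ** (r * m) for r in range(4))
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used here only as a reference for checking."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# Arbitrary 64-sample test signal (illustrative data, not from the paper).
signal = [complex(i % 7, -(i % 3)) for i in range(64)]
spectrum = fft_radix4(signal)
reference = dft_naive(signal)
max_error = max(abs(a - b) for a, b in zip(spectrum, reference))
```

The butterfly weights are all powers of -j, so a radix-4 stage needs no general multiplications inside the butterfly itself; only the twiddle factors between stages require them.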

international conference on electronics, circuits, and systems | 2010

An efficient dual-mode floating-point Multiply-Add Fused Unit

Konstantinos Manolopoulos; Dionysios I. Reisis; Vassilios A. Chouliaras

Multiply-Add Fused (MAF) units play a key role in processor performance for a variety of applications. Aiming to improve MAF functionality, this paper presents a dual-mode MAF architecture that can perform either one double-precision operation or two single-precision operations in parallel. The design attains low latency by following a dual-path approach and by combining final addition with rounding. The organization performs a MAF instruction in three cycles and a single floating-point addition in two cycles. The design has been validated and implemented in TSMC 0.13 μm technology.
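For readers unfamiliar with fused multiply-add, the sketch below illustrates its defining numerical property: the product and sum are rounded once, rather than once after the multiply and again after the add. This is a software emulation using exact rational arithmetic, offered only as an illustration of the general operation and not as a model of the paper's dual-path architecture; the helper names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate a*b + c with a single final rounding, via exact rationals."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


def unfused_multiply_add(a, b, c):
    """Ordinary evaluation: the product is rounded, then the sum is rounded."""
    return a * b + c


# Inputs chosen so that the exact product 1 - 2**-60 is not a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = unfused_multiply_add(a, b, c)  # product rounds to 1.0, so the sum is 0.0
fused = fused_multiply_add(a, b, c)      # keeps the -2**-60 term
```

With these inputs the unfused result loses the entire answer to the intermediate rounding, while the fused result is exact.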

Integration | 2008

Customization of an embedded RISC CPU with SIMD extensions for video encoding: A case study

Vassilios A. Chouliaras; Vincent M. Dwyer; Shahrukh Agha; Jose Luis Nunez-Yanez; Dionysios I. Reisis; Konstantinos Nakos; Konstantinos Manolopoulos

This work presents a detailed case study in customizing a configurable, extensible, 32-bit RISC processor with vector/SIMD instruction extensions for the efficient execution of block-based video-coding algorithms utilizing a proprietary co-design environment. In addition to the default Full-Search motion estimation of the MPEG-2 Test Model 5, fourteen fast ME algorithms were implemented in both scalar and vector form. Results demonstrate a reduction of up to 68% in the dynamic instruction count of the full search-based encoder whereas the fast motion estimation algorithms achieved a reduction in instruction count of nearly 90%, both accelerated via three 128-bit vector/SIMD instructions when compared to the scalar, reference implementation of the standard. We address in detail the profiling, vectorization and the development of these vector instruction set extensions, discuss in depth the implementation of a parametric vector accelerator that implements these instructions and show the introduction of that accelerator into a 32-bit RISC processor pipeline, in a closely-coupled configuration.

international conference on electronics, circuits, and systems | 2007

High Performance 16K, 64K, 256K complex points VLSI Systolic FFT Architectures

Konstantinos Manolopoulos; Konstantinos Nakos; Dionysios I. Reisis; Nikolaos Vlassopoulos; Vassilios A. Chouliaras

Aiming to improve the efficiency of real-time Fourier transform computations with large input data sets, this paper presents the design and the VLSI implementation of 16 K, 64 K and 256 K complex-point fast Fourier transform (FFT) systolic architectures. These organizations are deeply pipelined to maximize the operating frequency and follow the approach of decomposing the transforms into 64-point FFT computations to minimize the buffer size between consecutive stages. The resulting organizations achieve real-time performance on testing and observation applications. They include simple processing elements and they are scalable with respect to the operating frequency and data width. Validation on FPGA showed operation at 250 MHz and 125 MHz for the 16 K and the 64 K architectures, with throughputs of 1 Gs/s and 500 Ms/s respectively. The VLSI implementations of the proposed 16 K, 64 K and 256 K architectures achieve post-route clock frequencies of 352, 256.5, and 188 MHz respectively and can sustain throughputs of 1.4 Gs/s, 1 Gs/s and 188 Ms/s.

Microelectronics Journal | 2016

An efficient multiple precision floating-point Multiply-Add Fused unit

Konstantinos Manolopoulos; Dionisios I. Reisis; Vassilios A. Chouliaras

Multiply-Add Fused (MAF) units play a key role in processor performance for a variety of applications. This paper presents a multi-functional, multiple precision floating-point MAF unit. The proposed MAF is reconfigurable and able to execute a quadruple precision MAF instruction, or two double precision instructions, or four single precision instructions in parallel. The MAF architecture features a dual-path organization that reduces the latency of the floating-point add (FADD) instruction and uses the minimum number of operating components to keep the area low. The proposed MAF design was implemented on a 65 nm silicon process, achieving a maximum operating frequency of 293.5 MHz at 381 mW power.

international conference on electronics, circuits, and systems | 2012

Signal processing for deep-sea observatories with reconfigurable hardware

Konstantinos Manolopoulos; Anastasios Belias; Georgios Georgis; Dionysios I. Reisis; E. G. Anasontzis

The recent evolution of deep-sea observatories has provided the infrastructure for studying rare phenomena in astroparticle physics, extended phenomena in physical oceanography and environmental monitoring for climate modeling and civic alert systems. The observatories involve sets of sensors distributed in the deep sea, which transmit data through Gbit electro-optical lines to a shore station for real-time processing. Each set of sensors communicates data and control with the shore station through a readout system. Targeting the improvement of the observatory, this paper proposes a readout system with enhanced functionality, which includes the ability to reconfigure the communication channels, provide statistical measurements of the communicated data, and filter data efficiently. The design of the architecture is suited for FPGA implementation and the instantiation on the Xilinx ML605 board validates the results.

symposium on cloud computing | 2009

A configurable length, Fused Multiply-Add floating point unit for a VLIW processor

Vassilios A. Chouliaras; Konstantinos Manolopoulos; Dionysios I. Reisis

The efficiency of Fused Multiply-Add (FMA) units plays a key role in processor performance for a variety of applications. The subject of this work is a design that keeps the advantages of the FMA regarding latency and hardware utilization while also improving the accuracy of results for both normalized and denormalized numbers. The FMA unit has configurable latency and is integrated in a VLIW processor. The VLSI TSMC 0.13 implementation achieved an operating frequency of 232.6 MHz and a final post-routed area of 121900.478 μm².

International Journal of Computers and Applications | 2007

Thread-parallel MPEG-2 and MPEG-4 encoders for shared-memory System-on-Chip multiprocessors

Vassilios A. Chouliaras; Tr Jacobs; Jose Luis Nunez-Yanez; Konstantinos Manolopoulos; Konstantinos Nakos; Dionysios I. Reisis

This work focuses on speeding up MPEG-2 and MPEG-4 encoding by using thread parallelism for shared-memory, System-on-Chip (SoC) multiprocessors. The performance improvement of the MPEG encoders is shown by reducing the dynamic instruction count at multiple processor contexts and then mapping onto a configurable SoC multiprocessor. The resulting reduction in the dynamic instruction count of the parallelized MPEG-2 TM5 encoder for 32 processor contexts reaches a maximum of 95% and that of the MPEG-4 XViD a maximum of 83% for 16 processor contexts, both compared to the sequential encoder. To realize the parallelized encoders we present a configurable, N-way, extensible, bus-based, cache-coherent SoC multiprocessor, augmented with data-parallel coprocessors, and we give the VLSI implementation for the 2-way and 4-way configurations.

signal processing systems | 2010