Kiamal Z. Pekmestzi
National Technical University of Athens
Publication
Featured research published by Kiamal Z. Pekmestzi.
IEEE Transactions on Computers | 1999
Kiamal Z. Pekmestzi
A new algorithm for the multiplication of two n-bit numbers, based on the synchronous computation of the partial sums of the two operands, is presented. The proposed algorithm permits an efficient realization of parallel multiplication using iterative arrays while also permitting high-speed operation. Multiplier arrays for positive numbers and for numbers in two's complement form, based on the proposed technique, are implemented. An efficient pipelined form of the proposed multiplication scheme is also introduced. All multipliers obtained have low circuit complexity, permitting high-speed operation, and the interconnections of the cells are regular and well suited to VLSI realization.
International Conference on Information Security | 2001
Kostas Marinis; Nikos K. Moshopoulos; Fotis Karoubalis; Kiamal Z. Pekmestzi
This paper discusses the design and implementation of the Confidentiality and Integrity algorithms, which have been standardized by the 3rd Generation Partnership Project. Both algorithms use a modified version of the MISTY scheme, named KASUMI, as the basic cryptographic engine. Various architectural approaches have been examined and implemented on different hardware platforms (FPGAs, ASICs), providing useful information regarding the required area and the design throughput.
IEEE Transactions on Circuits and Systems I: Regular Papers | 2005
Paul Bougas; Paraskevas Kalivas; Andreas Tsirikos; Kiamal Z. Pekmestzi
The elaborate design of folded finite-impulse response (FIR) filters based on pipelined multiplier arrays is presented in this paper. The design is considered at the bit level, and the internal delays of the pipelined multiplier array are fully exploited in order to reduce hardware complexity. Both direct and transposed FIR filter forms are considered. The carry-save and the carry-propagate multiplier arrays are studied for the filter implementations. Partially folded architectures, implemented by cascading a number of folded FIR filters, are also proposed. The proposed schemes are compared in terms of hardware complexity with a straightforward implementation of a folded FIR filter based on the pipelined Wallace tree multiplier. The comparison reveals that the proposed schemes require 20%-30% less hardware. Finally, an efficient implementation of partially folded FIR filter circuits is presented for given constraints on area, power consumption and clock frequency.
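As a rough illustration of what folding means for a direct-form FIR filter, the following word-level sketch reuses a single multiply-accumulate step for every tap, spending one modeled clock cycle per tap. It is a behavioural model only, not the bit-level pipelined array design of the paper; the function name `folded_fir` and the cycle counter are assumptions made for this example.

```python
def folded_fir(samples, coeffs):
    """Direct-form FIR where one multiply-accumulate unit serves all taps.

    Hypothetical word-level model: each inner-loop iteration stands for
    one clock cycle of the single shared multiplier.
    """
    n_taps = len(coeffs)
    delay_line = [0] * n_taps          # most recent sample at index 0
    outputs = []
    cycles = 0
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the new sample in
        acc = 0
        for k in range(n_taps):              # time-multiplexed MAC
            acc += coeffs[k] * delay_line[k]
            cycles += 1
        outputs.append(acc)
    return outputs, cycles


if __name__ == "__main__":
    # An impulse input reproduces the coefficients at the output.
    print(folded_fir([1, 0, 0, 0], [1, 2, 3]))
```

A fully folded filter of this kind needs as many cycles per output sample as it has taps, which is the area/throughput trade-off that partially folded architectures relax.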
IEEE Transactions on Circuits and Systems I: Regular Papers | 2014
Kostas Tsoumanis; Sotirios Xydis; Constantinos Efstathiou; Nikolaos Moschopoulos; Kiamal Z. Pekmestzi
Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications. In this work, we focus on optimizing the design of the fused Add-Multiply (FAM) operator for increasing performance. We investigate techniques to implement the direct recoding of the sum of two numbers in its Modified Booth (MB) form. We introduce a structured and efficient recoding technique and explore three different schemes by incorporating them in FAM designs. Comparing them with the FAM designs which use existing recoding schemes, the proposed technique yields considerable reductions in terms of critical delay, hardware complexity and power consumption of the FAM unit.
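For reference, the sketch below shows the standard radix-4 Modified Booth digit set and how a fused add-multiply X·(A+B) can be evaluated from those digits. It forms the sum A+B first and then recodes it, which is the conventional behaviour; the direct recoding schemes the abstract refers to aim to produce the digits without that intermediate carry-propagate addition, and are not reproduced here. The function names `mb_recode` and `fused_add_multiply` are assumptions made for this example.

```python
def mb_recode(y, n_bits):
    """Radix-4 Modified Booth digits of y, least significant digit first.

    y is interpreted as an n_bits two's complement value (n_bits even).
    Each digit lies in {-2, -1, 0, 1, 2}.
    """
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even")
    if not -(1 << (n_bits - 1)) <= y < (1 << (n_bits - 1)):
        raise ValueError("y does not fit in n_bits two's complement")
    # Python's >> is arithmetic, so this yields two's complement bits.
    bits = [(y >> i) & 1 for i in range(n_bits)]
    digits = []
    prev = 0                                   # implicit bit b_{-1} = 0
    for j in range(n_bits // 2):
        b0, b1 = bits[2 * j], bits[2 * j + 1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def fused_add_multiply(x, a, b, n_bits):
    """Compute x * (a + b) by summing MB-weighted partial products."""
    digits = mb_recode(a + b, n_bits)
    return sum(d * x * 4 ** j for j, d in enumerate(digits))


if __name__ == "__main__":
    print(mb_recode(6, 4))
    print(fused_add_multiply(7, 3, -5, 8))
```

Because each digit selects one of 0, ±X or ±2X, the number of partial products is halved relative to a bit-by-bit multiplier.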
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1996
Christos Gr. Caraiscos; Kiamal Z. Pekmestzi
A new scheme for a high-throughput and low-latency systolic implementation of FIR digital filters is proposed. The input and output sequences are in bit-parallel LSB-first bit-skewed form, and the throughput is limited by the propagation delay of a gated full adder and a latch. The bits of a full-bit output sample start coming out of the array three clock cycles after the bits of the corresponding input sample enter the array.
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2010
Sotirios Xydis; Alexandros Bartzas; Iraklis Anagnostopoulos; Dimitrios Soudris; Kiamal Z. Pekmestzi
We address the problem of custom Dynamic Memory Management (DMM) in Multi-Processor System-on-Chip (MPSoC) architectures. Customization is enabled through the definition of a design space that captures, in a global, modular and parameterized manner, the primitive building blocks of multi-threaded DMM. A systematic exploration methodology is proposed to efficiently traverse the design space. Customized Pareto DMM configurations are automatically generated by software tools that implement the proposed methodology. Experimental evaluation based on a real-life multi-threaded dynamic network application shows that the proposed methodology delivers higher-quality (application-specific) solutions than state-of-the-art dynamic memory managers, together with a 62% reduction in exploration runtime.
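The notion of a Pareto configuration can be made concrete with a small filter that keeps only the non-dominated points of an explored design space. This is a generic sketch, not the exploration tool of the paper; the function name `pareto_front` and the two-objective tuples (for instance memory footprint and number of memory accesses, both to be minimized) are assumptions made for this example.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Every objective is assumed to be minimized. A point p dominates q
    when p is no worse in every objective and strictly better in one.
    """
    def dominates(p, q):
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))

    return [p for p in points
            if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (footprint, accesses) pairs for four configurations.
    configs = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(pareto_front(configs))
```

The quadratic pairwise comparison is adequate for a few thousand configurations; larger spaces would call for a sort-based sweep.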
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1994
Kiamal Z. Pekmestzi; N. Thanasouras
The operation of systolic counters, based on the pipelining of the conventional binary counters, is examined. The application of these circuits in the implementation of frequency dividers/counters is also presented. The proposed systolic counters have small circuit complexity and permit very high speed operation.
Integration | 2009
Sotirios Xydis; George Economakos; Kiamal Z. Pekmestzi
This paper introduces a design technique for coarse-grained reconfigurable architectures targeting digital signal processing (DSP) applications. The design procedure is analyzed in detail and an area-time-power efficient reconfigurable kernel architecture is presented. The proposed technique inlines flexibility into custom carry-save (CS) arithmetic datapaths exploiting a stable and canonical interconnection scheme. The canonical interconnection is revealed by a transformation, called uniformity transformation, imposed on the basic architectures of CS-multipliers and CS-chain-adders/subtractors. Experimental results including quantitative and qualitative comparisons with existing reconfigurable arithmetic cores and exploration results of the proposed reconfigurable architecture are provided.
IEEE Transactions on Very Large Scale Integration Systems | 2011
Sotirios Xydis; George Economakos; Dimitrios Soudris; Kiamal Z. Pekmestzi
This paper presents a new methodology for the synthesis of high-performance flexible datapaths, targeting computationally intensive digital signal processing kernels of embedded applications. The proposed methodology is based on a novel coarse-grained reconfigurable/flexible architectural template, which enables the combined exploitation of horizontal and vertical parallelism along with the operation chaining opportunities found in the application's behavioral description. Efficient synthesis techniques exploiting these architectural optimization concepts from a higher level of abstraction are presented and analyzed. Extensive experimentation showed average latency and area reductions of up to 33.9% and 53.9%, respectively, and higher hardware area utilization, compared to previously published high-performance coarse-grained reconfigurable datapaths.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 2001
Kiamal Z. Pekmestzi; Paraskevas Kalivas; Nikos K. Moshopoulos
A systolic serial multiplier and a squarer for unsigned numbers are presented. Both operate without zero words inserted between successive data words, output the full product, and have a latency of only one clock cycle. The multiplier is based on a modified serial/parallel scheme that operates with 100% efficiency. The systolic form is obtained by merging two adjacent multiplier cells. The same technique is used for the design of a serial squarer. The systolicity and the continuous operation are achieved without an increase in hardware complexity. The proposed schemes are well suited for long-number multiplication and squaring.
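To show what the baseline looks like, here is a behavioural model of a conventional serial/parallel multiplier for unsigned numbers: one operand is held in parallel, the other arrives one bit per clock cycle, LSB first, and the product leaves one bit per cycle. This baseline needs n zero bits fed after the n data bits to flush the full 2n-bit product, which is exactly the idle time the abstract says the proposed design avoids. The sketch does not model the modified scheme or its merged cells, and the function name `serial_parallel_multiply` is an assumption made for this example.

```python
def serial_parallel_multiply(a, b, n_bits):
    """Unsigned serial/parallel multiplication, one product bit per cycle.

    a is the parallel operand; the bits of b are fed LSB first and are
    followed by n_bits zero bits so the full product can shift out.
    """
    if not (0 <= a < (1 << n_bits) and 0 <= b < (1 << n_bits)):
        raise ValueError("operands must be unsigned n_bits values")
    acc = 0                      # carry-in plus partial sum held in the array
    product_bits = []
    for cycle in range(2 * n_bits):
        b_bit = (b >> cycle) & 1 if cycle < n_bits else 0   # zero padding
        acc += a * b_bit         # add the gated parallel operand
        product_bits.append(acc & 1)   # the lowest bit is final: emit it
        acc >>= 1                # shift the remaining partial sum right
    return sum(bit << i for i, bit in enumerate(product_bits))


if __name__ == "__main__":
    print(serial_parallel_multiply(13, 11, 4))
```

In this model the array is busy for only n of every 2n cycles, that is, 50% efficiency, compared with the 100% the abstract reports for the modified scheme.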