M. P. Bekakos
Democritus University of Thrace
Publications
Featured research published by M. P. Bekakos.
The Journal of Supercomputing | 2004
George A. Gravvanis; Konstantinos M. Giannoutakis; M. P. Bekakos; O. B. Efremides
A new class of normalized approximate inverse matrix techniques, based on the concept of sparse normalized approximate factorization procedures, is introduced for solving sparse linear systems derived from the finite difference discretization of partial differential equations. Normalized explicit preconditioned conjugate gradient-type methods in conjunction with normalized approximate inverse matrix techniques are presented for the efficient solution of sparse linear systems. Theoretical results on the rate of convergence of the normalized explicit preconditioned conjugate gradient scheme and estimates of the required computational work are presented. The application of the proposed methods to two-dimensional initial/boundary value problems is discussed and numerical results are given. The parallel and systolic implementation of the dominant computational part is also investigated.
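The iteration at the heart of such schemes can be sketched as a generic preconditioned conjugate gradient loop. Note this is an illustrative sketch only: a Jacobi (diagonal) approximate inverse stands in for the paper's normalized approximate inverse, and the 1D Poisson finite-difference matrix is an assumed test problem.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient: solve A x = b,
    where M_inv is an approximate inverse of A (the preconditioner)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Illustrative SPD system: 1D Poisson finite-difference matrix (tridiagonal)
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
M_inv = np.diag(1.0 / np.diag(A))   # Jacobi approximate inverse (stand-in)
x = pcg(A, b, M_inv)
print(np.linalg.norm(A @ x - b))    # residual is tiny after convergence
```

The quality of the approximate inverse governs the convergence rate; the paper's normalized factorized inverses are far sharper preconditioners than the diagonal stand-in above.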
The Journal of Supercomputing | 2009
Emina I. Milovanovic; M. P. Bekakos; Igor Z. Milovanovic
In this paper, we consider the implementation of a product c=Ab, where A is an N1×N3 band matrix with bandwidth ω and b is a vector of size N3×1, on bidirectional and unidirectional linear systolic arrays (BLSA and ULSA, respectively). We distinguish the cases where the matrix bandwidth ω satisfies 1≤ω≤N3 and N3≤ω≤N1+N3−1. A modification of the systolic array synthesis procedure, based on data dependencies and space-time transformations of the data dependency graph, is proposed. The modification enables obtaining both a BLSA and a ULSA with an optimal number of processing elements (PEs) regardless of the matrix bandwidth. The execution time of the synthesized arrays has been minimized. We derive explicit formulas for the synthesis of these arrays. The performances of the designed arrays are discussed and compared to those of the arrays obtained by the standard design procedure.
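The computation the arrays implement can be sketched minimally as a banded matrix-vector product. For illustration, a band of half-width w is assumed (entries vanish when |i−j| ≥ w), which is a simplification of the paper's ω; the point is that each output needs only O(w) work, which is what the systolic arrays exploit.

```python
import numpy as np

def band_matvec(A, b, w):
    """c = A b for an N1 x N3 band matrix: only the entries
    within the band (|i - j| < w, an assumed convention) are
    touched, so each output costs O(w) rather than O(N3)."""
    N1, N3 = A.shape
    c = np.zeros(N1)
    for i in range(N1):
        lo = max(0, i - w + 1)
        hi = min(N3, i + w)
        c[i] = A[i, lo:hi] @ b[lo:hi]
    return c

# Illustrative band matrix: zero outside |i - j| < 2
rng = np.random.default_rng(0)
A = rng.random((6, 5))
for i in range(6):
    for j in range(5):
        if abs(i - j) >= 2:
            A[i, j] = 0.0
b = rng.random(5)
print(band_matvec(A, b, 2))
```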
international conference on signal processing | 2007
M.C. Karra; M. P. Bekakos; Igor Z. Milovanovic; Emina I. Milovanovic
Systolic arrays may prove to be ideal structures for the representation and mapping of many numerical and non-numerical scientific applications. In particular, some formulations of dynamic programming (DP), a commonly used technique for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management, can be solved in parallel on systolic arrays as matrix-vector products. Systolic arrays usually have a very high I/O rate and are well suited for intensive parallel operations. Herein, the FPGA hardware implementation of a matrix-vector multiplication algorithm designed to produce a unidirectional systolic array representation is described.
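The DP-as-matrix-vector-product connection can be illustrated in the (min,+) semiring: one such "multiplication" is exactly a DP relaxation step, so repeating it solves a shortest-path problem. The graph and weights below are purely illustrative.

```python
INF = float('inf')

def minplus_matvec(W, d):
    """One (min,+) 'matrix-vector product': the dynamic-programming
    relaxation d'[i] = min_j (W[i][j] + d[j])."""
    n = len(d)
    return [min(W[i][j] + d[j] for j in range(n)) for i in range(n)]

# Shortest distances to node 0, Bellman-Ford style: n-1 products suffice.
W = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
d = [0, INF, INF, INF]          # distance-to-node-0 estimates
for _ in range(len(W) - 1):
    d = minplus_matvec(W, d)
print(d)                        # → [0, 5, 3, 2]
```

Because each product has the same data-flow pattern as an ordinary matrix-vector product, the same systolic array structure applies with the multiply/add cells replaced by add/min cells.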
The Journal of Supercomputing | 2014
E. M. Karanikolaou; Emina I. Milovanovic; Igor Z. Milovanovic; M. P. Bekakos
In this paper, the performance of distributed and many-core computer complexes, in conjunction with their energy consumption, is investigated. The distributed execution of a specific problem on a platform of interconnected processors requires a larger amount of energy than the sequential execution. The primary reason is the inability to fully parallelize a problem, due to the unavoidable serial parts and the intercommunication among the utilized processors. Distributed and many-core platforms are evaluated for the power their processors demand in the idle and fully utilized states. For each parallelization percentage, the estimates of the theoretical model were compared to the experimental results on the basis of the performance/power and performance/energy ratio metrics. Analytical formulas for evaluating the experimental energy consumption have been developed for both platforms, while the experimental vehicle was a widely known algorithm with different parallelization percentages.
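A toy version of such an energy estimate, assuming a simple Amdahl-style split in which idle cores draw P_idle and busy cores P_full; the function and the numbers are hypothetical stand-ins, not the paper's analytical formulas.

```python
def energy(t_seq, f, p, P_idle, P_full):
    """Estimated energy for a job with parallel fraction f run on p
    processors (hypothetical model): during the serial part one core
    is busy and p-1 idle; during the parallel part all p are busy."""
    t_ser = (1 - f) * t_seq              # serial part, full duration
    t_par = f * t_seq / p                # parallel part, sped up by p
    E_ser = t_ser * (P_full + (p - 1) * P_idle)
    E_par = t_par * p * P_full
    return E_ser + E_par

# Higher parallel fraction -> less energy wasted idling in serial parts
print(energy(100, 0.5, 8, 10, 40), energy(100, 0.9, 8, 10, 40))
```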
International Journal of Computer Mathematics | 2010
Igor Z. Milovanovic; M. P. Bekakos; I. N. Tselepis; Emina I. Milovanovic
This paper investigates different ways of performing systolic matrix multiplication. We prove that in total there are 43 arrays for the multiplication of rectangular matrices. We also prove that, depending on the mutual relation between the dimensions of the rectangular matrices, there are either 1 or 21 arrays with a minimal number of processing elements. Explicit mathematical formulae for systolic array synthesis are derived. The methodology applied to obtain the 43 systolic designs is based on a modification of the synthesis procedure based on dependency vectors and space-time mapping of the dependency graph.
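One classical space-time mapping that such syntheses produce can be simulated in software: PE (i, j) performs its accumulation for index s at time t = i + j + s. This is an illustrative schedule only, not one of the paper's 43 designs specifically.

```python
import numpy as np

def systolic_matmul(A, B):
    """Software simulation of a 2D systolic array computing C = A B:
    at time step t, PE (i, j) consumes the operands with inner index
    s = t - i - j, mimicking data streaming through the array."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for t in range(m + n + k - 2):       # total schedule length
        for i in range(m):
            for j in range(n):
                s = t - i - j
                if 0 <= s < k:           # operand pair present at this PE
                    C[i, j] += A[i, s] * B[s, j]
    return C

rng = np.random.default_rng(1)
A = rng.random((3, 4))
B = rng.random((4, 2))
print(np.allclose(systolic_matmul(A, B), A @ B))  # → True
```

Each triple (i, j, s) is visited exactly once, at time t = i + j + s, so the simulation reproduces the ordinary product while exposing the wavefront ordering.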
The Journal of Supercomputing | 2007
Emina I. Milovanovic; Igor Z. Milovanovic; M. P. Bekakos; I. N. Tselepis
In this paper, a regular bidirectional linear systolic array (RBLSA) for computing all-pairs shortest paths of a given directed graph is designed. The obtained array is optimal with respect to the number of processing elements (PEs) for a given problem size. The execution time of the array has been minimized. To obtain an RBLSA with an optimal number of PEs, the inner computation space of the systolic algorithm is accommodated to the projection direction vector. Finally, FPGA-based reprogrammable systems are revolutionizing certain types of computation and digital logic, since as logic emulation systems they offer several orders of magnitude speedup over software simulation; herein, an FPGA realization of the RBLSA is investigated and the performance evaluation results are discussed.
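For reference, the underlying all-pairs computation that the array pipelines is the standard Floyd-Warshall recurrence; the plain sequential sketch below (with an illustrative weight matrix) is not the RBLSA schedule itself.

```python
INF = float('inf')

def floyd_warshall(W):
    """All-pairs shortest paths: D[i][j] is relaxed through every
    intermediate node k; the systolic array pipelines these same
    relaxations across its processing elements."""
    n = len(W)
    D = [row[:] for row in W]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D

W = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
print(floyd_warshall(W))
```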
The Journal of Supercomputing | 2006
M.Ch. Karra; M. P. Bekakos
Systolic processing offers the possibility of solving a large number of standard problems on multicellular computing devices with autonomous cells (processing elements, PEs). The resulting systolic arrays exploit the underlying parallelism of many computationally intensive problems and offer a vital and effective way of handling them. Advances in technology, especially in VLSI and FPGAs, make an ongoing contribution to the evolution of systolic arrays. Herein, an FPGA-based systolic array prototype implementing the factorization stage of the Quadrant Interlocking Factorization (QIF, or Butterfly) method is presented and the corresponding time complexities achieved are discussed.
international conference on information and communication technologies | 2006
M.Ch. Karra; M. P. Bekakos
Systolic processing offers the possibility of solving a large number of standard problems on multicellular computing devices with autonomous cells (processing elements, PEs). The resulting systolic arrays exploit the underlying parallelism of many computationally intensive problems and offer a vital and effective way of handling them. Advances in technology, especially in VLSI and FPGAs, make an ongoing contribution to the evolution of systolic and wavefront arrays. The concept of wavefront arrays differs from that of systolic arrays in the following: wavefront arrays consist of PEs with varying computation times allowed, and the overall control is achieved by handshaking signals amongst the cells (instead of a common clock). Herein, an FPGA-based DE (double-ended) wavefront array prototype implementing the factorization stage of the Quadrant Interlocking Factorization (QIF, or Butterfly) method is presented and the corresponding time complexities achieved are discussed.
International Journal of Computer Mathematics | 2002
O. B. Efremides; M. P. Bekakos; David J. Evans
In this work, the potential of a block generalized WZ factorization procedure for direct hardware implementation on a special-purpose network of Wavefront Array Processor Modules (WAPMs) is investigated. The definition of a wavefront module is given and the development of an Eigenvalue Evaluator Engine (E³) based on this module is discussed.
International Journal of Computer Mathematics | 2004
Theodore A. Tsiligiridis; M. P. Bekakos; David J. Evans
In this article, we analyze a linear feedback control algorithm particularly suited to the Available Bit Rate service class in Asynchronous Transfer Mode (ATM) networks. We envisage the development of a closed-loop, fluid approximation model, in which the propagation delay is reflected across the network, while the rate of transmission and the queue occupancy are modeled as fluids. Using a fluid model has the advantage of permitting a simplified study of the network behavior. The model is described by a continuous-time system of delay-differential equations, which is solved semi-analytically. The contribution of this work is to provide a sending-rate scheme based on both a rate control function and a suitable fuzzy function for network load and delay. It is shown that fuzzy set theory can prove beneficial in the analysis of network load and delay, whose uncertainty is an inherent characteristic. Finally, developments in the area of time-delay systems control allow the computation of exact stability bounds on the Round Trip Time (RTT) and thus indicate whether the connection is in a stable state.
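The flavor of such a delayed feedback loop can be shown with a toy Euler simulation: the source adjusts its rate toward the link capacity using feedback that is one RTT old. The control law and every parameter below are illustrative assumptions, not the paper's model; note that too large a gain-delay product would destabilize the loop, which is the kind of bound the stability analysis makes exact.

```python
def simulate(rtt_steps, gain, capacity, steps=2000, dt=0.01):
    """Euler integration of dr/dt = gain * (capacity - r(t - RTT)):
    a hypothetical rate controller reacting to delayed feedback."""
    rate = [0.0] * (rtt_steps + 1)   # rate history, oldest first
    queue = 0.0
    for _ in range(steps):
        delayed = rate[0]            # feedback reflects the rate one RTT ago
        new = rate[-1] + gain * (capacity - delayed) * dt
        rate = rate[1:] + [max(new, 0.0)]
        queue = max(queue + (rate[-1] - capacity) * dt, 0.0)
    return rate[-1], queue

r, q = simulate(rtt_steps=10, gain=1.0, capacity=100.0)
print(round(r, 1))   # rate settles near the link capacity
```

Here the gain-delay product is small (1.0 × 0.1), so the loop converges; pushing rtt_steps or gain up induces the oscillations the RTT stability bounds rule out.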