Emina I. Milovanovic
University of Niš
Publications
Featured research published by Emina I. Milovanovic.
Computers & Mathematics With Applications | 1997
Ivan Milentijevic; I.Z̆. Milovanović; Emina I. Milovanovic; M.K. Stojc̆ev
The objective of this paper is to provide a systematic methodology for the design of space-time optimal pure planar systolic arrays for matrix multiplication. The procedure is based on the data dependence approach. By the described procedure, we obtain ten different systolic arrays, denoted S1 to S10, classified into three classes according to the interconnection patterns between the processing elements. Common properties of all the systolic array designs are: each array consists of n^2 processing elements, uses only near-neighbour communications, and has an active execution time of 3n − 2 time units. Compared to designs found in the literature, our procedure always leads to systolic arrays with the optimal number of processing elements. The improvement in the space domain is not achieved at the cost of execution time or PE complexity. We present a mathematically rigorous procedure which gives the exact ordering of the input matrix elements at the beginning of the computation. Examples illustrating the methodology are shown.
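The timing claims above can be illustrated with a small cycle-by-cycle simulation: rows of A stream in from the left and columns of B from the top, each skewed by one cycle per row/column index, so PE (i, j) meets the matching pair a[i, k], b[k, j] at cycle t = i + j + k and the whole product finishes in 3n − 2 active cycles. This is a generic sketch of such a planar array, not a reconstruction of any particular design S1 to S10 from the paper.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an n x n planar systolic array computing C = A @ B
    in exactly 3n - 2 active cycles with near-neighbour links only."""
    n = A.shape[0]
    C = np.zeros((n, n))
    a_reg = np.zeros((n, n))   # value each PE forwards to its right neighbour
    b_reg = np.zeros((n, n))   # value each PE forwards downward
    for t in range(3 * n - 2):
        new_a = np.zeros((n, n))
        new_b = np.zeros((n, n))
        for i in range(n):
            k = t - i                       # a[i, k] enters row i at cycle i + k
            new_a[i, 0] = A[i, k] if 0 <= k < n else 0.0
            new_a[i, 1:] = a_reg[i, :-1]    # shift one PE to the right
        for j in range(n):
            k = t - j                       # b[k, j] enters column j at cycle j + k
            new_b[0, j] = B[k, j] if 0 <= k < n else 0.0
            new_b[1:, j] = b_reg[:-1, j]    # shift one PE downward
        a_reg, b_reg = new_a, new_b
        C += a_reg * b_reg                  # one multiply-accumulate per PE
    return C
```

Because both streams at PE (i, j) carry values indexed by the same k = t − i − j, out-of-window cycles contribute zero and every PE accumulates exactly its element of C.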
Mathematical and Computer Modelling | 2006
Igor Z. Milovanovic; Emina I. Milovanovic; M. P. Bekakos
In this paper we present a procedure, based on data dependencies and space-time transformations of index space, to design a unidirectional linear systolic array (ULSA) for computing a matrix-vector product. The obtained array is optimal with respect to the number of processing elements (PEs) for a given problem size. The execution time of the array is the minimal possible for that number of PEs. To achieve this, we first derive an appropriate systolic algorithm for ULSA synthesis. In order to design a ULSA with the optimal number of PEs we then perform an accommodation of the index space to the projection direction vector. The performance of the synthesized array is discussed and compared with the bidirectional linear SA. Finally, we demonstrate how this array can be used to compute the correlation of two given sequences.
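A minimal functional sketch of the pipelining such a ULSA performs: each of the n PEs accumulates one component of y = Ax while the x values enter PE 0 and move one PE per cycle in a single direction, so every link is unidirectional. The cycle schedule below is a generic illustration, not the paper's exact space-time transformation.

```python
import numpy as np

def ulsa_matvec(A, x):
    """Cycle-level sketch of a unidirectional linear systolic array
    for y = A @ x: PE i accumulates y[i]; x streams rightward."""
    n = len(x)
    y = np.zeros(n)
    pipe = [0.0] * n                 # x value currently held by each PE
    for t in range(2 * n - 1):
        # shift the stream one PE to the right, inject x[t] at PE 0
        pipe = [x[t] if t < n else 0.0] + pipe[:-1]
        for i in range(n):
            k = t - i                # PE i sees x[k] at cycle t = i + k
            if 0 <= k < n:
                y[i] += A[i, k] * pipe[i]
    return y
```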
Applied Mathematics and Computation | 2016
Igor Z. Milovanovic; Emina I. Milovanovic; Ivan Gutman
A general inequality for non-negative real numbers is proven. Based on it, upper bounds for the (ordinary) graph energy, minimum dominating energy, minimum covering energy, Laplacian-energy-like invariant, Laplacian energy, Randić energy, and incidence energy are obtained.
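For context, the (ordinary) graph energy is the sum of the absolute values of the adjacency-matrix eigenvalues. The snippet below computes it and checks it against the classical McClelland bound E(G) ≤ √(2mn); the bounds in the paper are sharper estimates of this kind, not reproduced here.

```python
import numpy as np

def graph_energy(adj):
    """Graph energy: sum of |eigenvalue| over the adjacency spectrum."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(adj))))

# 4-cycle C4: n = 4 vertices, m = 4 edges, spectrum {2, 0, 0, -2}, so E = 4,
# comfortably below the McClelland bound sqrt(2 * 4 * 4) ~ 5.66.
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)
```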
Computers & Mathematics With Applications | 2000
Emina I. Milovanovic; M.K. Stojc̆ev; N.M Novaković; I.Z̆. Milovanović; Teufik Tokic
This paper considers the multiplication of a matrix A = (a_ik) of order n × n by a vector b = (b_k) of order n × 1 on a bidirectional linear systolic array (BLSA) comprised of p ≤ ⌈n/2⌉ processing elements. To accomplish this, matrix A is partitioned into quasi-diagonal blocks, each containing p quasi-diagonals. To avoid inserting zero elements between successive iterations during the computation of the resulting vector c, we perform an index transformation on the block matrices and on vector c. The index transformation can be described as a perfect shuffle followed by a shift. In addition, we propose an efficient hardware interface, called the memory interface subsystem (MIS), located between the host and the BLSA, which optimizes memory access by eliminating extraneous main-memory operations. We then evaluate the speedup and efficiency of the proposed matrix-vector multiplication algorithm. To estimate the benefits obtained by introducing the MIS, we compare host occupation with data transfer during matrix-vector multiplication on the BLSA with and without the MIS.
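The perfect-shuffle half of the index transformation is the standard riffle permutation: split a sequence of even length into two halves and interleave them. A minimal sketch (the shift it is composed with in the paper is omitted here):

```python
def perfect_shuffle(seq):
    """Perfect (riffle) shuffle of a sequence of even length 2p:
    (x0..x_{p-1}, x_p..x_{2p-1}) -> (x0, x_p, x1, x_{p+1}, ...)."""
    p = len(seq) // 2
    out = []
    for i in range(p):
        out.append(seq[i])        # element from the first half
        out.append(seq[p + i])    # matching element from the second half
    return out

# e.g. perfect_shuffle([0, 1, 2, 3, 4, 5]) -> [0, 3, 1, 4, 2, 5]
```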
Applied Mathematics and Computation | 2016
Emina I. Milovanovic; Igor Z. Milovanovic; Edin Dolicanin; E. Glogic
Let G = (V, E), V = {1, 2, …, n}, E = {e_1, e_2, …, e_m}, be a simple graph with n vertices and m edges. Denote by d(e_i) (i = 1, 2, …, m) the degree of the edge e_i, and by EM_1 = Σ_{i=1}^{m} d(e_i)^2 the first reformulated Zagreb index of graph G. Upper and lower bounds for the graph invariant EM_1 are obtained.
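With the standard edge-degree convention d(e) = d(u) + d(v) − 2 for an edge e = uv, the invariant EM_1 is directly computable from the edge list:

```python
from collections import Counter

def first_reformulated_zagreb(edges):
    """EM1(G) = sum over edges e = uv of (d(u) + d(v) - 2)^2."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum((deg[u] + deg[v] - 2) ** 2 for u, v in edges)

# Path P3 (edges 1-2, 2-3): both edges have degree 1 + 2 - 2 = 1, so EM1 = 2.
```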
Microelectronics Reliability | 2010
Mile K. Stojcev; Igor Z. Milovanovic; Emina I. Milovanovic; Tatjana R. Nikolic
Systolic arrays (SAs) are very efficient architectures for multimedia processing, database management, and scientific computing applications that are characterized by a large number of data accesses. However, in these data-transfer- and storage-intensive applications, memory access is often the factor limiting computation speed. Since the memory subsystem dominates the cost (area), performance, and power consumption of the SA, special attention must be paid to how the memory subsystem can benefit from customization. In this paper we consider the memory organization of a linear systolic array with bidirectional links (BLSA) suitable for implementing a broad class of algorithms. We assume that memory is organized into smaller distributed physical memory modules. To provide high bandwidth in data access, we have designed special hardware called the address generator unit (AGU). The function of the AGU is threefold: first, during initialization, it transforms the host address space into the BLSA address space; second, it provides efficient memory data access during BLSA operation; third, it performs fast data transfer between the BLSA and the host at the end of the computation. We examine the impact on area and performance of the memory-access-related circuitry when the computationally intensive offset address calculations performed in software are replaced by address transformations implemented in the AGUs. By introducing hardware AGUs we achieved a speedup of approximately two, compared to the software implementation of address calculation, with a hardware overhead of only 7.6% in the worst case.
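As a toy analogue of the first AGU task, a linear host address can be translated into a (module, offset) pair for p distributed memory modules; low-order interleaving is used below so that consecutive addresses hit different modules, which is what gives the distributed memory its bandwidth. The mapping, the function name, and the interleaving scheme are illustrative assumptions, not the specific BLSA transformations the paper's AGU implements in hardware.

```python
def agu_map(addr, p):
    """Map a linear host address to (module, offset) across p memory
    modules using low-order interleaving."""
    return addr % p, addr // p

# With p = 4 modules, addresses 0, 1, 2, 3 land in modules 0, 1, 2, 3
# at offset 0; address 5 lands in module 1 at offset 1.
```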
The Journal of Supercomputing | 2009
Emina I. Milovanovic; M. P. Bekakos; Igor Z. Milovanovic
In this paper, we consider the implementation of the product c = Ab, where A is an N1 × N3 band matrix with bandwidth ω and b is a vector of size N3 × 1, on bidirectional and unidirectional linear systolic arrays (BLSA and ULSA, respectively). We distinguish the cases where the matrix bandwidth satisfies 1 ≤ ω ≤ N3 and where N3 ≤ ω ≤ N1 + N3 − 1. A modification of the systolic array synthesis procedure based on data dependencies and space-time transformations of the data dependency graph is proposed. The modification makes it possible to obtain both a BLSA and a ULSA with an optimal number of processing elements (PEs) regardless of the matrix bandwidth. The execution time of the synthesized arrays has been minimized. We derive explicit formulas for the synthesis of these arrays. The performance of the designed arrays is discussed and compared with that of arrays obtained by the standard design procedure.
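The access pattern such arrays pipeline is the band-restricted matrix-vector product, which touches only the nonzero diagonals of each row. Splitting the bandwidth as ω = wl + wu + 1 into lower and upper parts is a convention assumed here for the sketch; the paper works with a single ω.

```python
import numpy as np

def band_matvec(A, b, wl, wu):
    """c = A @ b touching only the band of A, i.e. the diagonals
    j - i in [-wl, wu] (total bandwidth omega = wl + wu + 1)."""
    n1, n3 = A.shape
    c = np.zeros(n1)
    for i in range(n1):
        lo = max(0, i - wl)            # first column inside the band
        hi = min(n3, i + wu + 1)       # one past the last column inside it
        c[i] = A[i, lo:hi] @ b[lo:hi]
    return c
```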
international conference on signal processing | 2007
M.C. Karra; M. P. Bekakos; Igor Z. Milovanovic; Emina I. Milovanovic
Systolic arrays may prove to be ideal structures for representing and mapping many numerical and non-numerical scientific applications. In particular, some formulations of dynamic programming (DP), a commonly used technique for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management, can be solved in parallel on systolic arrays as matrix-vector products. Systolic arrays usually have a very high I/O rate and are well suited to intensive parallel operations. Herein is a description of the FPGA hardware implementation of a matrix-vector multiplication algorithm designed to produce a unidirectional systolic array representation.
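One standard way a DP recurrence becomes a matrix-vector product is over the (min, +) semiring: a relaxation step d'[i] = min_j (W[i, j] + d[j]) has exactly the shape of a matvec with (+, ×) replaced by (min, +), so it maps onto a systolic array the same way. This is an illustrative example of the general idea, not the specific DP formulation of the paper's FPGA design.

```python
import numpy as np

def minplus_matvec(W, d):
    """One DP relaxation step as a (min, +) matrix-vector product:
    d'[i] = min over j of (W[i, j] + d[j])."""
    return np.min(W + d[None, :], axis=1)

# Bellman-Ford on the line graph 0 -> 1 -> 2 (both edge weights 1):
# W[i, j] is the weight of edge j -> i, with 0 on the diagonal.
INF = np.inf
W = np.array([[0.0, INF, INF],
              [1.0, 0.0, INF],
              [INF, 1.0, 0.0]])
```

Two applications of `minplus_matvec` starting from d = [0, inf, inf] yield the shortest distances [0, 1, 2].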
international conference on numerical analysis and its applications | 2004
Igor Z. Milovanovic; Emina I. Milovanovic; B. M. Randjelovic
We consider the problem of computing the transitive closure of a given directed graph on a regular bidirectional systolic array. The designed array has n PEs, where n is the number of nodes in the graph; this is the optimal number for a given problem size.
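A plain sequential reference for the computation being parallelized is Warshall's algorithm on a boolean adjacency matrix; the paper's contribution is mapping this onto an n-PE bidirectional systolic array, which the sketch below does not attempt.

```python
def transitive_closure(adj):
    """Warshall's algorithm: reach[i][j] becomes True iff node j is
    reachable from node i in the directed graph given by adj."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):                      # allow k as an intermediate node
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach
```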
The Journal of Supercomputing | 2014
E. M. Karanikolaou; Emina I. Milovanovic; Igor Z. Milovanovic; M. P. Bekakos
In this paper, the performance of distributed and many-core computer complexes is evaluated in conjunction with their energy consumption. The distributed execution of a given problem on a platform of interconnected processors requires a larger amount of energy than sequential execution. The primary reason is the inability to fully parallelize a problem, due to the unavoidable serial parts and the intercommunication among the processors used. The distributed and many-core platforms are evaluated for the power their processors demand in the idle and fully utilized states. For each parallelization percentage, the estimates of the theoretical model were compared with the experimental results on the basis of the performance/power and performance/energy ratio metrics. Analytical formulas for evaluating the experimental energy consumption have been developed for both platforms; the experimental vehicle was a widely known algorithm run with different parallelization percentages.
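The qualitative claim that serial parts make distributed execution cost more energy can be seen in a toy Amdahl-style model: the serial fraction s runs on one processor while the other p − 1 idle, and the remaining 1 − s of the work runs on all p processors in time (1 − s)/p. The model, its parameters, and their names are illustrative assumptions, not the paper's analytical formulas.

```python
def relative_energy(s, p, p_idle, p_busy):
    """E_parallel / E_sequential under the toy model above."""
    e_seq = p_busy * 1.0                                   # one busy processor, unit time
    e_par = s * (p_busy + (p - 1) * p_idle) + (1 - s) * p_busy
    return e_par / e_seq

# The ratio simplifies to 1 + s * (p - 1) * p_idle / p_busy >= 1: any
# nonzero serial fraction makes the distributed run consume more energy.
```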