Dušan B. Gajić | Researchain

Publication

Featured research published by Dušan B. Gajić.

computer aided systems theory | 2015

Remarks on Characterization of Bent Functions in Terms of Gibbs Dyadic Derivatives

Radomir S. Stankovic; Jaakko Astola; Claudio Moraga; Milena Stankovic; Dušan B. Gajić

The term dyadic derivative was coined by F. Pichler [9] for a differential operator introduced by J.E. Gibbs in 1967 [3], which was initially called the logic derivative because it acts on the set of binary n-tuples. Both names, the logic derivative and the dyadic derivative, are related to the property that this set, equipped with addition modulo 2 (EXOR), has the structure of a group \(C_{2}^{n}\) called the finite dyadic group, which is viewed as a natural domain for defining binary-valued switching functions.

international symposium on multiple valued logic | 2014

Constant Geometry Algorithms for Galois Field Expressions and Their Implementation on GPUs

Radomir S. Stankovic; Jaakko Astola; Claudio Moraga; Dušan B. Gajić

Galois field (GF) expressions are analytical representations of multiple-valued functions. For practical applications it is important to provide fast algorithms for computing coefficients in these expressions. From the FFT-theory point of view, these algorithms are Cooley-Tukey type algorithms based on the Good-Thomas factorization derived from the Kronecker product structure of the GF-transform matrices. These algorithms are good for reducing the number of operations in Central Processing Unit (CPU) implementations. When implemented over Graphics Processing Units (GPUs), the address arithmetic becomes an important factor determining the efficiency of the implementations, due to the differences between the CPU and GPU based architectures and the corresponding programming philosophies. In this paper, we define the constant geometry algorithms for computing the coefficients in GF-expressions by an analogy with the corresponding algorithms in Fourier analysis on finite Abelian groups. We performed an experimental verification of the proposed algorithms compared to the Cooley-Tukey algorithms over two GPU platforms (Nvidia and AMD) and two programming environments (CUDA and OpenCL) with the corresponding CPU implementations. The speedup achieved by constant geometry algorithms increases with the number of variables and, therefore, the constant geometry algorithms are more advantageous in the case of functions with a larger number of variables.

international symposium on multiple-valued logic | 2013

The Impact of Address Arithmetic on the GPU Implementation of Fast Algorithms for the Vilenkin-Chrestenson Transform

Dušan B. Gajić; Radomir S. Stankovic

This paper considers the impact of address arithmetic in the Cooley-Tukey and the constant geometry fast algorithms for the Vilenkin-Chrestenson transform on their implementation for the graphics processing unit (GPU). We consider issues such as using different transform radices and analyze the number of GPU instructions and register usage in the OpenCL implementations of the considered algorithms. Further, we compare the program running times on the GPU and on the central processing unit (CPU). Experiments show that the GPU implementations are from 10 to 22 times faster than the C/C++ CPU implementations, depending on the transform radix and the number of variables in the processed function. The OpenCL implementation of the constant geometry algorithm translates into a lower number of GPU arithmetic and fetch instructions and uses fewer registers. This implementation requires up to 21% shorter processing times than the corresponding Cooley-Tukey algorithm implementation.
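The constant geometry idea mentioned in the abstract can be illustrated with a minimal CPU sketch in Python. It is not the paper's OpenCL code. It assumes the common convention in which the radix-p Vilenkin-Chrestenson matrix is the n-fold Kronecker power of the p-point DFT matrix with kernel exp(-2πi/p); the paper may use the conjugate kernel or a different normalization. The function names `vc_constant_geometry` and `vc_direct` are hypothetical. Every step reads and writes with the same index pattern, which is what keeps the address arithmetic simple.

```python
import cmath


def vc_constant_geometry(f, p):
    """Radix-p Vilenkin-Chrestenson transform with identical addressing in every step.

    Assumes len(f) == p**n and the kernel exp(-2*pi*i/p) (a convention chosen here).
    """
    size = len(f)
    steps, m = 0, 1
    while m < size:
        m *= p
        steps += 1
    if m != size:
        raise ValueError("length must be a power of the radix")
    roots = [cmath.exp(-2j * cmath.pi * k / p) for k in range(p)]
    cur = [complex(v) for v in f]
    block = size // p
    for _ in range(steps):
        nxt = [0j] * size
        for j in range(block):
            # Read p neighbouring inputs, write outputs a fixed stride apart.
            for b in range(p):
                nxt[j + b * block] = sum(
                    roots[(a * b) % p] * cur[p * j + a] for a in range(p)
                )
        cur = nxt
    return cur


def vc_direct(f, p):
    """O(N^2) reference: kernel is roots[<x, y>], the digit-wise dot product mod p."""
    size = len(f)
    roots = [cmath.exp(-2j * cmath.pi * k / p) for k in range(p)]

    def digit_dot(x, y):
        s = 0
        while x and y:
            s += (x % p) * (y % p)
            x //= p
            y //= p
        return s % p

    return [
        sum(roots[digit_dot(x, y)] * f[x] for x in range(size))
        for y in range(size)
    ]
```

Each step moves the lowest p-ary digit of the index to the top position after transforming it, so after n steps every digit has been processed once and the result comes out in natural order without a separate permutation pass.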

international symposium on multiple valued logic | 2017

Fast Computation of the Discrete Pascal Transform

Dušan B. Gajić; Radomir S. Stankovic

The discrete Pascal transform (DPT) is a relatively recently introduced spectral transform based on the concept of the Pascal triangle, which has been known for centuries. It is used in digital image processing, digital filtering, pattern recognition, watermarking, and related areas. Its applicability is limited by the O(N^2) asymptotic time complexity of the best current algorithms for its computation, where N is the size of the function to be processed. In this paper, we propose a method for the efficient computation of the DPT in O(N log N) time, based on the factorization of its transform matrix into a product of three matrices with special structure: two diagonal matrices and a Toeplitz matrix. The Toeplitz matrix is further embedded into a circulant matrix of order 2N. The diagonalization of the circulant matrix by the Fourier matrix permits the use of the fast Fourier transform (FFT) for performing the computations, leading to an algorithm with the overall computational complexity of O(N log N). Since the entries in the Toeplitz matrix have very different magnitudes, the numerical stability of this algorithm is also discussed. We also consider the issues in implementing the proposed algorithm for highly parallel computation on graphics processing units (GPUs). The experiments show that computing the DPT using the proposed algorithm processed on GPUs is orders of magnitude faster than the best current approach. As a result, the proposed method can significantly extend the practical applicability of the discrete Pascal transform.
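The factorization described in the abstract can be sketched in a few lines of Python. The sketch assumes the usual DPT definition X[k] = Σ_{j≤k} (-1)^j C(k, j) x[j] and uses the identity C(k, j) = k! / (j! (k-j)!) to obtain one plausible diagonal-Toeplitz-diagonal split; the paper's exact matrices and scaling may differ. The names `fft`, `dpt_direct`, and `dpt_fft` are hypothetical, and the small recursive FFT only stands in for a library routine.

```python
import cmath
from math import comb, factorial


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def dpt_direct(x):
    """O(N^2) reference computed straight from the assumed definition."""
    return [
        sum((-1) ** j * comb(k, j) * x[j] for j in range(k + 1))
        for k in range(len(x))
    ]


def dpt_fft(x):
    """DPT as D1 * T * D2 with the Toeplitz product done by a length-2N FFT.

    Assumes len(x) is a power of two.
    """
    n = len(x)
    # D2: scale the input by (-1)^j / j!, then zero-pad to length 2N.
    u = [(-1) ** j * x[j] / factorial(j) for j in range(n)] + [0.0] * n
    # First column of the lower-triangular Toeplitz matrix, t[m] = 1/m!,
    # zero-padded so it defines a circulant matrix of order 2N.
    t = [1.0 / factorial(m) for m in range(n)] + [0.0] * n
    spectrum = [a * b for a, b in zip(fft(u), fft(t))]
    conv = fft(spectrum, invert=True)
    # D1: scale output k by k!; the division by 2N normalizes the inverse FFT.
    return [factorial(k) * conv[k].real / (2 * n) for k in range(n)]
```

The factorials grow quickly, which is the magnitude spread the abstract refers to when it raises numerical stability; in floating point this sketch is only suitable for small N.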

intelligent distributed computing | 2017

Binary Classification of Images for Applications in Intelligent 3D Scanning

Branislav Vezilić; Dušan B. Gajić; Dinu Dragan; Veljko B. Petrović; Srđan Mihić; Zoran Anišić; Vladimir Puhalac

Three-dimensional (3D) scanning techniques based on photogrammetry, also known as Structure-from-Motion (SfM), require many two-dimensional (2D) images of an object, obtained from different viewpoints, in order to create its 3D reconstruction. When these images are acquired using closed-space 3D scanning rigs, which are composed of a large number of cameras fitted on multiple pods, flash photography is required and image acquisition must be well synchronized to avoid the problem of ‘misfired’ cameras. This paper presents an approach to binary classification (as ‘good’ or ‘misfired’) of images obtained during the 3D scanning process, using four machine learning methods: support vector machines, artificial neural networks, the k-nearest neighbors algorithm, and random forests. The inputs to the algorithms are histograms of regions determined to be of interest in the detection of image misfires. The considered algorithms are evaluated based on the prediction accuracy that they achieved on our dataset. An average prediction accuracy of 94.19% is obtained using the random forests approach under cross-validation. Therefore, the application of the proposed approach allows the development of an ‘intelligent’ 3D scanning system which can automatically detect camera misfiring and repeat the scanning process without the need for human intervention.

International Journal of Reasoning-based Intelligent Systems | 2017

A performance analysis of computing the LU and the QR matrix decompositions on the CPU and the GPU

Dušan B. Gajić; Radomir S. Stankovic; Miloš Radmanović

We present an analysis of the time efficiency of five different implementations of the LU and the QR decompositions of matrices performed on central processing units (CPUs) and graphics processing units (GPUs). Three of the considered implementations, developed using the Eigen C++ library, Intel MKL, and MATLAB, are executed on a multi-core CPU. The remaining two implementations are processed on a GPU and employ MATLAB's Parallel Computing Toolbox and Nvidia CUDA augmented with the cuSolver library. Computation times are compared using randomly generated single- and double-precision floating-point matrices. The experiments for the LU decomposition show that the two GPU implementations offer the best performance for matrices that can fit into the GPU global memory. For larger LU decomposition problem instances, Intel MKL on the CPU is found to be the fastest approach. Furthermore, Intel MKL also proves to be the fastest method for computing the QR decomposition for all considered sizes of matrices.

Archive | 2015

Efficient Computation of Gibbs Derivatives on Finite Abelian Groups

Radomir S. Stankovic; Dušan B. Gajić

From the signal processing point of view, the derivative for functions of real variables, which is a notion originated by Newton and converted into a strong mathematical concept by Leibniz, can be viewed as an operator intended to estimate the rate of change and the direction of change of a signal for an infinitesimally small change of its argument. The same idea of having such an operator in the context of Walsh analysis can be mentioned as a motivation for the introduction of the Gibbs derivative. Since the Gibbs derivative is an operator for functions defined on the dyadic group, the shift of the argument is defined in terms of the componentwise addition modulo 2 of the binary representations of the argument and the increment. For the application of a mathematical operator in engineering practice, it is often required to provide efficient computation methods. In this chapter, we discuss fast algorithms to compute Gibbs derivatives on finite Abelian groups and their implementation on graphics processing units (GPUs).
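As a small illustration of the dyadic case, the Python sketch below computes the Gibbs derivative in two ways. It assumes the commonly quoted definition (Df)(x) = -1/2 Σ_r 2^r (f(x ⊕ 2^r) - f(x)), under which the Walsh function of index w (Hadamard ordering) is an eigenfunction with eigenvalue w; the chapter may use a different ordering or normalization. The names `fwht`, `gibbs_direct`, and `gibbs_spectral` are hypothetical, and this is CPU code, not the GPU implementation discussed in the chapter.

```python
def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform (Hadamard ordering)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def gibbs_direct(f):
    """Gibbs derivative from the assumed definition; the shift is XOR with 2^r."""
    size = len(f)
    n = size.bit_length() - 1  # number of binary variables, size == 2**n
    return [
        -0.5 * sum((1 << r) * (f[x ^ (1 << r)] - f[x]) for r in range(n))
        for x in range(size)
    ]


def gibbs_spectral(f):
    """Same operator via the Walsh spectrum: transform, multiply by w, invert."""
    size = len(f)
    spectrum = fwht(f)
    scaled = [w * s for w, s in enumerate(spectrum)]
    return [value / size for value in fwht(scaled)]
```

The spectral route costs two fast transforms and one pointwise multiplication, which is the kind of structure that maps naturally onto fast algorithms of the sort the chapter discusses.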

International Journal of Reasoning-based Intelligent Systems | 2012

Implementation of dyadic correlation and autocorrelation on graphics processors

Dušan B. Gajić; Radomir S. Stankovic; Miloš Radmanović

The convolution and related operators of correlation and autocorrelation are essential and powerful mathematical tools in machine learning, signal processing, systems theory, and related areas. In particular, representation and handling of systems with binary encoded input and output signals requires intensive computation of the correlation and autocorrelation functions which are defined on the finite dyadic groups as the underlying algebraic structure. This paper presents methods for computing the dyadic correlation and autocorrelation functions on graphics processing units (GPUs). The proposed algorithms are based on the convolution and the Wiener–Khinchin theorems and implemented using the Open Computing Language (OpenCL). We address several key issues in developing an efficient mapping of the computations to the GPU architecture. The experimental results confirm that the application of the proposed method leads to significant computational speedups over traditional C/C++ implementations processed on central processing units (CPUs).
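The Wiener–Khinchin route mentioned in the abstract can be sketched on the CPU in Python. The sketch assumes the dyadic autocorrelation B(τ) = Σ_x f(x) f(x ⊕ τ) and an unnormalized Walsh-Hadamard transform, so that B equals the transform of the squared spectrum divided by N. The names `fwht`, `dyadic_autocorr_direct`, and `dyadic_autocorr_wk` are hypothetical; this is not the paper's OpenCL implementation.

```python
def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform (Hadamard ordering)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def dyadic_autocorr_direct(f):
    """O(N^2) reference: the shift of the argument is a bitwise XOR."""
    size = len(f)
    return [sum(f[x] * f[x ^ t] for x in range(size)) for t in range(size)]


def dyadic_autocorr_wk(f):
    """Wiener-Khinchin route for integer-valued f.

    Transform, square the spectrum pointwise, transform again, divide by N.
    The second transform is always an exact multiple of N for integer inputs.
    """
    size = len(f)
    power = [s * s for s in fwht(f)]
    return [c // size for c in fwht(power)]
```

Both functions return the same list for integer inputs whose length is a power of two; the fast version replaces the quadratic double loop with two O(N log N) transforms.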

soft computing | 2016