Marián Vajteršic | Researchain

Publication

Featured research published by Marián Vajteršic.

Parallel Computing | 2002

Dynamic ordering for a parallel block-Jacobi SVD algorithm

Martin Bečka; Gabriel Okša; Marián Vajteršic

A new approach for the parallel computation of the singular value decomposition (SVD) of a matrix A ∈ C^{m×n} is proposed. Known algorithms use a static cyclic ordering of the subproblems that are solved simultaneously in one iteration step. The proposed implementation of the two-sided block-Jacobi method instead uses a dynamic ordering of subproblems, which takes the current state of matrix A into account. In each iteration step, it selects a set of off-diagonal blocks that meets two conditions: the blocks can be annihilated concurrently, and annihilating them reduces the Frobenius norm of the off-diagonal elements of A as much as possible. This task is equivalent to the maximum-weight perfect matching problem, and a greedy algorithm for solving it efficiently is presented. Computational experiments with both types of ordering, incorporated into the two-sided block-Jacobi method, were performed on an SGI-Cray Origin 2000 parallel computer using the Message Passing Interface (MPI). The results confirm that, for computing the SVD to a given accuracy, the dynamic ordering requires much less work than the static cyclic ordering.
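Below is a minimal sketch of the dynamic ordering step. It assumes a square real A partitioned into an even number l of block rows and columns. The weight of a block pair is the squared Frobenius norm of the two off-diagonal blocks it would annihilate. A greedy pass then picks disjoint pairs in order of decreasing weight. The function names are illustrative, and the paper's exact greedy procedure may differ in detail.

```python
import numpy as np

def block_pair_weights(A, l):
    """W[i, j] = ||A_ij||_F^2 + ||A_ji||_F^2 for an l x l block partition of square A.

    This is the off-diagonal mass that solving subproblem (i, j) would annihilate.
    """
    idx = np.array_split(np.arange(A.shape[0]), l)  # row/column index set of each block
    W = np.zeros((l, l))
    for i in range(l):
        for j in range(i + 1, l):
            w = (np.linalg.norm(A[np.ix_(idx[i], idx[j])]) ** 2
                 + np.linalg.norm(A[np.ix_(idx[j], idx[i])]) ** 2)
            W[i, j] = W[j, i] = w
    return W

def greedy_perfect_matching(W):
    """Repeatedly take the heaviest pair whose two blocks are both still free.

    On the complete graph with an even number l of blocks this always ends in a
    perfect matching: l / 2 disjoint pairs that can be processed concurrently.
    """
    l = W.shape[0]
    candidates = sorted(((W[i, j], i, j) for i in range(l) for j in range(i + 1, l)),
                        reverse=True)
    used, matching = set(), []
    for _, i, j in candidates:
        if i not in used and j not in used:
            matching.append((i, j))
            used.update((i, j))
    return matching

# Usage: 8 x 8 matrix, 4 block rows/columns of size 2.
A = np.eye(8)
A[0, 7], A[2, 5], A[0, 2] = 10.0, 5.0, 1.0
print(greedy_perfect_matching(block_pair_weights(A, 4)))  # [(0, 3), (1, 2)]
```

The greedy choice does not guarantee a maximum-weight matching. It is cheap, however, and a full matching solver per iteration would add serial overhead to every step.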

Parallel Algorithms and Applications | 1999

Block-Jacobi SVD algorithms for distributed memory systems I: hypercubes and rings

Martin Bečka; Marián Vajteršic

The paper presents parallel algorithms for the efficient solution of the singular value decomposition (SVD) problem by the block two-sided Jacobi method. In this part of the work, we show how the method can be used on MIMD computers with hypercube and ring topologies. We analyse three types of orderings for solving the SVD on block-structured submatrices with respect to their communication requirements and their suitability for parallel execution of the computational process. The algorithms map well onto the hypercube topology, and two of the ordering schemes can also be implemented directly on rings. Results obtained on an Intel Paragon are shown and discussed for all three types of orderings.

Parallel Algorithms and Applications | 1999

Block-Jacobi SVD algorithms for distributed memory systems II: meshes

Martin Bečka; Marián Vajteršic

This paper deals with the parallelization of the two-sided Jacobi algorithm for computing the singular value decomposition (SVD) on a computer with p processors organized into a two-dimensional √p × √p mesh. This work continues our paper (Part I, to appear in J. Parallel Algorithms and Applications), which described a parallelization by columns and efficient ordering strategies for the hypercube and ring topologies. Here, the matrices are sliced by both rows and columns. The orderings developed for rings and hypercubes are adopted, and we show an assignment of submatrices to processors that enables their efficient parallel execution. The complexity is compared with that of the column-based algorithm. Parallel computational experiments on a Paragon system are presented and discussed for two test matrices.
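The sketch below is a hypothetical illustration of slicing a matrix by both rows and columns. It partitions A into q × q submatrices (q = √p) and assigns block (r, c) to mesh processor (r, c). The abstract does not give the paper's actual assignment, which is tailored to the ring and hypercube orderings. `mesh_blocks` and `assemble` are illustrative names.

```python
import numpy as np

def mesh_blocks(A, q):
    """Slice A by rows and columns into q x q submatrices.

    Hypothetical assignment: processor (r, c) of a q x q mesh (p = q * q)
    holds the submatrix in block row r and block column c.
    """
    rows = np.array_split(np.arange(A.shape[0]), q)
    cols = np.array_split(np.arange(A.shape[1]), q)
    return {(r, c): A[np.ix_(rows[r], cols[c])] for r in range(q) for c in range(q)}

def assemble(blocks, q):
    """Gather the distributed submatrices back into one matrix."""
    return np.block([[blocks[(r, c)] for c in range(q)] for r in range(q)])

A = np.arange(35).reshape(7, 5)
blocks = mesh_blocks(A, 2)  # p = 4 processors on a 2 x 2 mesh
print(blocks[(0, 0)].shape)  # (4, 3)
```

Compared with slicing by block columns only, each processor holds a smaller submatrix. In exchange, a rotation of a block column pair must be coordinated along a whole mesh column.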

Computing | 1979

A fast algorithm for solving the first biharmonic boundary value problem

Marián Vajteršic

This paper presents a fast iterative algorithm for solving a finite difference approximation of the first biharmonic boundary value problem on a rectangular region. The matrix decomposition algorithm is applied efficiently within the semi-direct method, which treats the biharmonic equation as a coupled system of Poisson equations. Assume an N × N grid of mesh points. One iteration then requires O(N^2) operations, and a solution with accuracy O(N^{-2}) requires O(N^{5/2} log_2 N) operations. With N^2 processors, the parallel version of this algorithm requires 14 log_2 N steps per iteration. Both results improve on those previously known. A numerical experiment with a serial computation is also given.
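The semi-direct method repeatedly solves Poisson equations, so its inner kernel is a fast Poisson solver. The sketch below shows the matrix decomposition idea for the 5-point discretization of -Δu = f with zero Dirichlet boundary values. It diagonalizes the 1D second-difference matrix with the discrete sine basis, solves decoupled scalar equations and transforms back. For clarity it uses dense matrix products (O(N^3)) rather than the fast sine transforms a production solver would use. It does not implement the biharmonic coupling itself, and all names are illustrative.

```python
import numpy as np

def poisson_matrix_decomposition(F, h):
    """Solve T U + U T = F, the 5-point form of -Δu = f on an N x N interior
    grid with zero Dirichlet boundary values, where T = tridiag(-1, 2, -1) / h^2."""
    N = F.shape[0]
    k = np.arange(1, N + 1)
    # Orthonormal eigenvectors (discrete sine basis) and eigenvalues of T.
    Q = np.sqrt(2.0 / (N + 1)) * np.sin(np.outer(k, k) * np.pi / (N + 1))
    lam = (2.0 - 2.0 * np.cos(k * np.pi / (N + 1))) / h**2
    Ft = Q.T @ F @ Q                          # transform the right-hand side
    Ut = Ft / (lam[:, None] + lam[None, :])   # N^2 decoupled scalar equations
    return Q @ Ut @ Q.T                       # transform back

def apply_laplacian(U, h):
    """5-point discrete -Δ applied to interior values U (zero boundary)."""
    N = U.shape[0]
    T = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2
    return T @ U + U @ T
```

Applying `poisson_matrix_decomposition` to `apply_laplacian(U, h)` should recover U up to rounding, which makes a convenient self-check.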

Parallel Computing | 1992

The application of VLSI Poisson solvers to the biharmonic problem

G. Lotti; Marián Vajteršic

A VLSI implementation of the semidirect algorithm for solving the biharmonic problem on an n × n grid is proposed. VLSI Poisson solvers are used to solve the coupled pair of Poisson equations in each iteration of this algorithm. Solvers based on the matrix decomposition algorithm and on a modified cyclic reduction method are considered for the VLSI biharmonic design. The analysis of the corresponding AT^2 complexity estimates (A: area, T: time) gives AT^2 = O(n^4 log^5 n), the best result achieved so far for these algorithms.

The Journal of Supercomputing | 2013

Concurrent programming constructs for parallel MPI applications

Tobias Berka; Giorgios Kollias; Helge Hagenauer; Marián Vajteršic

Concurrency and parallelism have long been viewed as important, but somewhat distinct concepts. While concurrency is extensively used to amortize latency (for example, in web- and database-servers, user interfaces, etc.), parallelism is traditionally used to enhance performance through execution on multiple functional units. Motivated by an evolving application mix and trends in hardware architecture, there has been a push toward integrating traditional programming models for concurrency and parallelism. Use of conventional threads APIs (POSIX, OpenMP) with messaging libraries (MPI), however, leads to significant programmability concerns, owing primarily to their disparate programming models. In this paper, we describe a novel API and associated runtime for concurrent programming, called MPI Threads (MPIT), which provides a portable and reliable abstraction of low-level threading facilities. We describe various design decisions in MPIT, their underlying motivation, and associated semantics. We provide performance measurements for our prototype implementation to quantify overheads associated with various operations. Finally, we discuss two real-world use cases: an asynchronous message queue and a parallel information retrieval system. We demonstrate that MPIT provides a versatile, low overhead programming model that can be leveraged to program large parallel ensembles.

Journal of Parallel and Distributed Computing | 2013

Parallel rare term vector replacement: Fast and effective dimensionality reduction for text

Tobias Berka; Marián Vajteršic

Dimensionality reduction is an established area in text mining and information retrieval. These methods convert highly sparse corpus matrices into a dense format while preserving or improving classification accuracy or retrieval performance. In this paper, we describe a novel approach to dimensionality reduction for text, along with a parallel algorithm suitable for private-memory parallel computer systems. According to Zipf's law, the majority of indexing terms occur in only a small number of documents. Our algorithm replaces rare terms by computing a vector that expresses their semantics in terms of common terms. This process produces a projection matrix, which can be applied to a corpus matrix and to individual document and query vectors. We give an accurate mathematical and algorithmic description of our algorithms and present an experimental evaluation on two benchmark corpora. These experiments indicate that our algorithm can substantially reduce the number of features, from 47,236 to 392 on the Reuters corpus, with a clear improvement in retrieval performance. We evaluated our parallel implementation using the Message Passing Interface with up to 32 processes on a Nehalem Xeon cluster; it computed the projection matrix for the dimensionality reduction of over 800,000 documents in just under 100 s.
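The abstract does not give the exact formula for the replacement vectors. The minimal sketch below assumes one plausible choice. Terms that appear in at least `min_df` documents are kept as common terms. Each rare term is replaced by the normalized sum of the common-term counts of the documents that contain it. The result is a projection matrix P that maps any term vector x (document or query) to the reduced space via P @ x. The threshold and the weighting are assumptions, not the authors' method.

```python
import numpy as np

def rare_term_projection(X, min_df):
    """X: term-by-document count matrix (terms x documents).

    Returns (P, common): P has one row per common term and one column per term.
    Common terms map to themselves; each rare term maps to a unit vector over
    the common terms (assumed weighting: summed counts of its documents).
    """
    df = (X > 0).sum(axis=1)                 # document frequency of each term
    common = np.flatnonzero(df >= min_df)
    P = np.zeros((len(common), X.shape[0]))
    P[np.arange(len(common)), common] = 1.0  # identity on common terms
    Xc = X[common, :]
    for t in np.flatnonzero(df < min_df):
        docs = np.flatnonzero(X[t] > 0)      # documents containing rare term t
        v = Xc[:, docs].sum(axis=1).astype(float)
        norm = np.linalg.norm(v)
        if norm > 0:                         # terms that never occur stay zero
            P[:, t] = v / norm
    return P, common

X = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]])                    # term 2 is rare (df = 1)
P, common = rare_term_projection(X, min_df=2)
reduced = P @ X                              # dense 2 x 3 reduced corpus matrix
```

The loop over rare terms is independent per term. This makes it a natural candidate for distributing over processes, although the paper's actual parallel decomposition is not described in the abstract.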

Parallel Computing | 1984

Short communication: Parallel marching Poisson solvers

Marián Vajteršic

The paper presents parallel algorithms for solving the Poisson equation at N^2 mesh points. The methods, based on marching techniques, are structured for efficient parallel realization. Using the orthogonal decomposition properties of the arising matrices, the algorithms can be formulated in terms of transformed vectors. On a MIMD computer with at most N processors, the computations can be performed in horizontal slices with minimal synchronization requirements. On an SIMD machine with N^2 processors, the complexity bound O(log N) is achieved, with a single march requiring only 10 log N steps.
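The following is a minimal serial sketch of the marching idea, assuming zero Dirichlet boundary values and the 5-point stencil. Transforming each grid row with the discrete sine basis decouples the problem into N independent three-term recurrences, one per mode, and this decoupling is what allows per-mode or per-slice parallel distribution. Each recurrence is solved by marching upward from the bottom boundary, then correcting with a homogeneous solution so that the top boundary condition holds. Plain marching amplifies rounding errors roughly geometrically in N, so this is only safe for small grids. The names and the shooting-style correction are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def march_modes(G, mu):
    """Solve -u[i-1] + (2 + mu) u[i] - u[i+1] = G[i-1] for i = 1..N, for all
    modes (columns) at once, with u[0] = u[N+1] = 0."""
    N, M = G.shape
    v = np.zeros((N + 2, M))  # particular march: v[0] = v[1] = 0
    w = np.zeros((N + 2, M))  # homogeneous march: w[0] = 0, w[1] = 1
    w[1] = 1.0
    for i in range(1, N + 1):
        v[i + 1] = (2.0 + mu) * v[i] - v[i - 1] - G[i - 1]
        w[i + 1] = (2.0 + mu) * w[i] - w[i - 1]
    alpha = -v[N + 1] / w[N + 1]  # enforce the top boundary u[N+1] = 0
    return (v + alpha * w)[1:N + 1]

def poisson_marching(F, h):
    """5-point -Δu = f on an N x N interior grid (row i, column j), zero boundary."""
    N = F.shape[0]
    k = np.arange(1, N + 1)
    Q = np.sqrt(2.0 / (N + 1)) * np.sin(np.outer(k, k) * np.pi / (N + 1))
    mu = 2.0 - 2.0 * np.cos(k * np.pi / (N + 1))  # eigenvalues of tridiag(-1, 2, -1)
    Gt = (h**2 * F) @ Q       # transform every row into the sine basis
    Ut = march_modes(Gt, mu)  # independent recurrence per mode
    return Ut @ Q.T           # transform rows back
```

For N = 8 the rounding amplification is around 10^7, which is still acceptable. Larger grids would need marching over short strips instead of across the whole domain.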

Euromicro Workshop on Parallel and Distributed Processing | 2001

Multi-level parallelism in the block-Jacobi SVD algorithm

Gabriel Okša; Marián Vajteršic

We analyse the fine-grained parallelism of the two-sided block-Jacobi algorithm for the singular value decomposition (SVD) of a matrix A ∈ R^{m×n}, m ≥ n. The algorithm involves the class CO of parallel orderings on a two-dimensional toroidal mesh with p processors. The mathematical background is based on the QR decomposition (QRD) of local data matrices and on the triangular Kogbetliantz algorithm (TKA) for local SVDs in the diagonal mesh processors. Subsequent updates of local matrices are required in both the diagonal and the nondiagonal mesh processors. We show that all updates can be realized by orthogonal modified Givens rotations. These rotations can be efficiently pipelined in parallel through the toroidal mesh, along its horizontal and vertical rings of √p processors. For one mesh processor, our solution requires O((m+n)^2/p) systolic processing elements (PEs), O(m^2/p) local memory registers and O((m+n)^2/p) additional delay elements. The time complexity of our solution is O(((m+n)^{3/2}/p^{3/4}) Δ) time steps per global iteration. Here Δ is the length of the global synchronization time step, given by the evaluation and application of two modified Givens rotations in TKA.
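The local QR decompositions mentioned above can be carried out entirely with plane rotations. The sketch below uses standard Givens rotations, not the square-root-free modified rotations of the paper, and it ignores the systolic pipelining. `givens` and `givens_qr` are illustrative names.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def givens_qr(A):
    """QR decomposition of an m x n matrix by standard Givens rotations.

    Each rotation combines two adjacent rows to zero one subdiagonal entry;
    Q accumulates the transposed rotations so that Q @ R == A throughout.
    """
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):  # annihilate R[i, j] using row i - 1
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, s], [-s, c]])
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R
```

Rotations acting on disjoint row pairs commute, so many of them can be applied at the same time. Pipelined systolic schemes rely on this property.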

Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis | 1997

Block-SVD algorithms and their adaptation to hypercubes and rings

Marián Vajteršic; Martin Bečka

The paper presents parallel algorithms for the efficient solution of the SVD (singular value decomposition) problem by the block two-sided Jacobi method. It is shown how the method can be applied to MIMD computers with hypercube and ring topologies. Three types of orderings for solving the SVD on block-structured submatrices are analysed with respect to their communication requirements and their suitability for parallel execution of the computational process, which is carried out on block columns of the matrix. All three orderings fit the hypercube topology well, and two of them can also be implemented directly on rings. For these two, optimal parallelization of the method and of the data transfers is achieved within each sweep. For the third scheme, an efficient numbering of processor nodes is discussed. Results obtained on an Intel Paragon system are shown for a selected ordering.
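The abstract does not name the three orderings. One standard example of a static cyclic ordering is the round-robin (circle) scheme below. It splits a sweep over an even number of block columns into nblocks - 1 steps of disjoint pairs, so every pair of block columns meets exactly once per sweep. The sketch does not show how the pairs are mapped onto hypercube or ring processors, and the function name is illustrative.

```python
def round_robin_ordering(nblocks):
    """Static cyclic (round-robin) ordering of block-column pairs.

    Returns nblocks - 1 steps; each step is a list of nblocks / 2 disjoint
    pairs, and every pair (i, j) with i < j appears exactly once per sweep.
    """
    if nblocks % 2:
        raise ValueError("number of blocks must be even")
    players = list(range(nblocks))
    steps = []
    for _ in range(nblocks - 1):
        steps.append([tuple(sorted((players[k], players[-1 - k])))
                      for k in range(nblocks // 2)])
        # Keep players[0] fixed and rotate the remaining positions by one.
        players = [players[0], players[-1]] + players[1:-1]
    return steps

print(round_robin_ordering(4))
# [[(0, 3), (1, 2)], [(0, 2), (1, 3)], [(0, 1), (2, 3)]]
```

Within each step the pairs are disjoint, so nblocks / 2 processors can solve their subproblems concurrently. Between steps, block columns only need to move to a neighbouring position, which suits ring-like communication.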