Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Georgios I. Goumas is active.

Publications


Featured research published by Georgios I. Goumas.


Nucleic Acids Research | 2009

DIANA-microT web server: elucidating microRNA functions through target prediction

Manolis Maragkakis; Martin Reczko; Victor A. Simossis; Panagiotis Alexiou; Giorgos L. Papadopoulos; Theodore Dalamagas; Giorgos Giannopoulos; Georgios I. Goumas; Evangelos Koukis; Kornilios Kourtis; Thanasis Vergoulis; Nectarios Koziris; Timos K. Sellis; Panayotis Tsanakas; Artemis G. Hatzigeorgiou

Computational microRNA (miRNA) target prediction is one of the key means for deciphering the role of miRNAs in development and disease. Here, we present the DIANA-microT web server as the user interface to the DIANA-microT 3.0 miRNA target prediction algorithm. The web server provides extensive information for predicted miRNA:target gene interactions through a user-friendly interface with rich connectivity to online biological resources. Target gene and miRNA functions may be elucidated through automated bibliographic searches, and functional information is accessible through Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The web server offers links to nomenclature, sequence and protein databases, and users can search for targeted genes using different nomenclatures or functional features, such as a gene's possible involvement in biological pathways. The target prediction algorithm supports parameters calculated individually for each miRNA:target gene interaction and provides a signal-to-noise ratio and a precision score that help in evaluating the significance of the predicted results. Using a set of miRNA targets recently identified through the pSILAC method, the performance of several computational target prediction programs was assessed. DIANA-microT 3.0 achieved the highest ratio of correctly predicted targets over all predicted targets (66%). The DIANA-microT web server is freely available at www.microrna.gr/microT.


The Journal of Supercomputing | 2009

Performance evaluation of the sparse matrix-vector multiplication on modern architectures

Georgios I. Goumas; Kornilios Kourtis; Nikos Anastopoulos; Vasileios Karakasis; Nectarios Koziris

In this paper, we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, a fact that may lead to misguided, and thus unsuccessful, attempts at optimization. In order to gain insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. In addition, we investigate the parallel version of the kernel and report on the corresponding performance results and their relation to each architecture's specific multithreaded configuration. Based on our experiments, we extract useful conclusions that can serve as guidelines for the optimization process of both single- and multithreaded versions of the kernel.
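
The kernel under study is the standard SpMxV loop over a matrix stored in Compressed Sparse Row (CSR) format. As a point of reference, a minimal sketch of that loop in C is shown below; the array names (row_ptr, col_ind, values) follow common CSR conventions rather than the paper itself, and the OpenMP pragma illustrates the straightforward row-parallel multithreaded variant.

/* y = A*x for a sparse matrix A stored in CSR:
 *   row_ptr[i] .. row_ptr[i+1]-1 index the nonzeros of row i,
 *   col_ind[k] and values[k] hold their column and value.        */
void spmv_csr(int nrows, const int *row_ptr, const int *col_ind,
              const double *values, const double *x, double *y)
{
    /* Rows are independent, so a row-parallel OpenMP loop is the
       straightforward multithreaded variant.                      */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += values[k] * x[col_ind[k]];
        y[i] = sum;
    }
}

The irregular, indirect accesses to x and the streaming of row_ptr, col_ind and values are what make the kernel sensitive to the memory subsystem of each platform.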


Computing Frontiers | 2008

Optimizing sparse matrix-vector multiplication using index and value compression

Kornilios Kourtis; Georgios I. Goumas; Nectarios Koziris

Previous research has identified memory bandwidth as the main bottleneck of the ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at reducing the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and employ additional indexing information to properly iterate over these elements. In this paper we propose two distinct compression methods, targeting the index data and the numerical values, respectively. We perform a set of experiments on a large real-world matrix set and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method achieves impressive speedups in a more limited yet important class of sparse matrices that contain a small number of distinct values.
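
To give a flavor of the index-compression idea, the sketch below delta-encodes the column indices of one CSR row; because the indices within a row are sorted, most gaps are small and fit in a single byte. This is an illustrative sketch of the general principle, not the paper's actual encoding, and delta_encode_row is a hypothetical helper.

#include <stdint.h>
#include <stddef.h>

/* Store a row's column indices as deltas from the previous index.
 * Small gaps take 1 byte; rare large gaps use an escape byte (0xFF)
 * followed by the full 32-bit delta.  This shrinks the index stream
 * the SpMxV kernel must fetch from memory.                          */
size_t delta_encode_row(const int32_t *col_ind, size_t nnz, uint8_t *out)
{
    size_t bytes = 0;
    int32_t prev = 0;
    for (size_t k = 0; k < nnz; k++) {
        int32_t delta = col_ind[k] - prev;
        if (delta < 255) {                    /* small gap: 1 byte      */
            out[bytes++] = (uint8_t)delta;
        } else {                              /* large gap: escape + 4B */
            out[bytes++] = 0xFF;
            for (int b = 0; b < 4; b++)
                out[bytes++] = (uint8_t)(delta >> (8 * b));
        }
        prev = col_ind[k];
    }
    return bytes;                             /* compressed size        */
}

The kernel then decodes the deltas on the fly while multiplying, trading a few extra integer operations for less memory traffic.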


Parallel, Distributed and Network-Based Processing | 2008

Understanding the Performance of Sparse Matrix-Vector Multiplication

Georgios I. Goumas; Kornilios Kourtis; Nikos Anastopoulos; Vasileios Karakasis; Nectarios Koziris

In this paper we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, a fact that may lead to misguided, and thus unsuccessful, attempts at optimization. In order to gain insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. Based on our experiments we extract useful conclusions that can serve as guidelines for the subsequent optimization process of the kernel.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2011

CSX: an extended compression format for spmv on shared memory systems

Kornilios Kourtis; Vasileios Karakasis; Georgios I. Goumas; Nectarios Koziris

The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory systems with multiple processing units, due to the streaming nature of its data access pattern. Previous research has demonstrated that an effective strategy to improve the kernel's performance is to drastically reduce the data volume involved in the computations. Since the storage formats for sparse matrices include metadata describing the structure of non-zero elements within the matrix, we propose a generalized approach to compress this metadata by exploiting substructures within the matrix. We call the proposed storage format Compressed Sparse eXtended (CSX). In our implementation we employ runtime code generation to construct specialized SpMV routines for each matrix. Experimental evaluation on two shared memory systems for 15 sparse matrices demonstrates significant performance gains as the number of participating cores increases. Regarding the cost of CSX construction, we propose several strategies that trade performance for preprocessing cost, making CSX applicable to both online and offline preprocessing.
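
As a rough illustration of the substructure idea (a sketch of the general principle only; the actual CSX detects several substructure types and generates specialized code per matrix at runtime): a run of nonzeros in consecutive columns can be stored as a starting column plus a length, replacing one column index per nonzero with two integers per run. The struct and routine names below are hypothetical.

/* A "horizontal run": len nonzeros of one row in consecutive columns. */
struct hrun {
    int row;        /* row of the run                  */
    int col_start;  /* column of the first nonzero     */
    int len;        /* number of consecutive nonzeros  */
};

void spmv_hruns(int nruns, const struct hrun *runs,
                const double *values, const double *x, double *y)
{
    int v = 0;  /* position in the shared values array, run after run */
    for (int r = 0; r < nruns; r++) {
        double sum = 0.0;
        for (int j = 0; j < runs[r].len; j++)
            sum += values[v++] * x[runs[r].col_start + j];
        y[runs[r].row] += sum;
    }
}

Encoding dense runs, diagonals and similar patterns this way is what lets the format cut the metadata volume that limits SpMV scalability.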


International Parallel and Distributed Processing Symposium | 2001

Minimizing completion time for loop tiling with computation and communication overlapping

Georgios I. Goumas; Aristidis Sotiropoulos; Nectarios Koziris

This paper proposes a new method for the problem of minimizing the execution time of nested for-loops using a tiling transformation. In our approach, we are interested not only in tile size and shape according to the required communication-to-computation ratio, but also in overall completion time. We select a time hyperplane that executes different tiles much more efficiently by exploiting the inherent overlap between the communication and computation phases of successive, atomic tile executions. We assign tiles to processors according to the tile space boundaries, thus taking the iteration space bounds into account. Our schedule considerably reduces overall completion time under the assumption that some part of every communication phase can be efficiently overlapped with atomic, pure tile computations. The overall schedule resembles a pipelined datapath, where computations are no longer interleaved with sends and receives to non-local processors. Experimental results on a cluster of Pentium machines using various MPI send primitives show that the total completion time is significantly reduced.
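
The overlapping idea can be sketched with non-blocking MPI primitives: the boundary data of the current tile is sent, and the data needed later is received, while the purely local tile computation proceeds. This is a generic sketch of computation/communication overlap under the paper's assumption, not the actual scheduling code; execute_tile, compute_tile and the buffer arguments are hypothetical names.

#include <mpi.h>

extern void compute_tile(int t);   /* hypothetical atomic, pure tile computation */

/* Overlap the exchange of tile t's boundary data with its local
 * computation by using non-blocking sends/receives and deferring
 * the wait until the data is actually needed.                      */
void execute_tile(int t, int prev_rank, int next_rank,
                  double *send_buf, double *recv_buf, int bufsize,
                  MPI_Comm comm)
{
    MPI_Request reqs[2];

    /* Start the communication phase without blocking. */
    MPI_Isend(send_buf, bufsize, MPI_DOUBLE, next_rank, t, comm, &reqs[0]);
    MPI_Irecv(recv_buf, bufsize, MPI_DOUBLE, prev_rank, t, comm, &reqs[1]);

    /* Compute the tile while the messages are in flight. */
    compute_tile(t);

    /* Block only when the incoming data is needed by the next tile. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}

When the per-tile computation is long enough to hide the message latency, the schedule behaves like a pipeline in which communication cost largely disappears from the critical path.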


IEEE Transactions on Parallel and Distributed Systems | 2013

An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

Vasileios Karakasis; Theodoros Gkountouvas; Kornilios Kourtis; Georgios I. Goumas; Nectarios Koziris

Sparse matrix-vector multiplication (SpM×V) has been characterized as one of the most significant computational scientific kernels. The key algorithmic characteristic of the SpM×V kernel that inhibits it from achieving high performance is its very low flop:byte ratio. In this paper, we present a compressed storage format, called Compressed Sparse eXtended (CSX), that is able to detect and encode simultaneously multiple commonly encountered substructures inside a sparse matrix. Relying on aggressive compression techniques for the sparse matrix's indexing structure, CSX is able to considerably reduce the memory footprint of a sparse matrix, alleviating the pressure on the memory subsystem. On a diverse set of sparse matrices, CSX provided a more than 40 percent average performance improvement over the standard CSR format on SMP architectures and surpassed 20 percent improvement on NUMA systems, significantly outperforming other CSR alternatives. Additionally, it adapted successfully to the nonzero element structure of the considered matrices, exhibiting very stable performance. Finally, in the context of a “real-life” multiphysics simulation software, CSX accelerated the SpM×V component by nearly 40 percent and the total solver time by approximately 15 percent.
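
The low flop:byte ratio can be made concrete with a rough, illustrative estimate (assuming 8-byte floating-point values and 4-byte column indices, and counting only the matrix data traffic; the figures are not from the paper): each nonzero contributes one multiplication and one addition but at least 12 bytes of traffic, so

\[ \frac{\text{flops}}{\text{bytes}} \approx \frac{2\,\mathrm{nnz}}{(8+4)\,\mathrm{nnz}} = \frac{1}{6} \approx 0.17, \]

far below the flops-per-byte balance of modern memory hierarchies, which is why compressing the indexing structure translates directly into performance.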


International Conference on Parallel Processing | 2008

Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression

Kornilios Kourtis; Georgios I. Goumas; Nectarios Koziris

The sparse matrix-vector multiplication kernel exhibits limited potential for taking advantage of modern shared memory architectures due to its large memory bandwidth requirements. To decrease memory contention and improve the performance of the kernel, we propose two compression schemes. The first, called CSR-DU, targets the reduction of the matrix structural data by applying coarse-grain delta encoding to the column indices. The second scheme, called CSR-VI, targets the reduction of the numerical values using indirect indexing and can only be applied to matrices that contain a small number of unique values. Evaluation of both methods on a rich matrix set showed that they can significantly improve the performance of the multithreaded version of the kernel and achieve good scalability for large matrices.
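
The value-indexing idea can be sketched as follows (an illustration in the spirit of CSR-VI, not the exact implementation; the names below are generic): the distinct numerical values are gathered into a small table, and the 8-byte values array is replaced by narrow indices into that table.

#include <stdint.h>

/* Illustrative CSR-VI-style kernel: val_idx[k] is a small (here 8-bit)
 * index into a table of the matrix's distinct values, so per-nonzero
 * value traffic drops from 8 bytes to 1 byte.  Only matrices with few
 * distinct values (<= 256 in this sketch) can use this variant.       */
void spmv_csr_vi(int nrows, const int *row_ptr, const int *col_ind,
                 const uint8_t *val_idx, const double *unique_vals,
                 const double *x, double *y)
{
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += unique_vals[val_idx[k]] * x[col_ind[k]];
        y[i] = sum;
    }
}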


Computational Science and Engineering | 2009

A Comparative Study of Blocking Storage Methods for Sparse Matrices on Multicore Architectures

Vasileios Karakasis; Georgios I. Goumas; Nectarios Koziris

Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demands on memory bandwidth, which cannot yet be abundantly offered by modern commodity architectures. One of the most promising optimization techniques for SpMV is blocking, which can reduce the indexing structures for storing a sparse matrix, and therefore alleviate the pressure on the memory subsystem. In this paper, we study and evaluate a number of representative blocking storage formats on a set of modern microarchitectures that can provide up to 64 hardware contexts. The purpose of this paper is to present the merits and drawbacks of each method in relation to the underlying microarchitecture and to provide a consistent overview of the most promising blocking storage methods for sparse matrices that have been presented in the literature.
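
A typical representative of blocking is Blocked CSR (BCSR) with small fixed-size dense blocks. The sketch below (generic BCSR with 2x2 blocks, not a specific format from the paper) shows how a single column index per block replaces one index per nonzero, at the cost of storing explicit zeros inside partially filled blocks.

/* Illustrative BCSR SpMV with fixed 2x2 blocks: brow_ptr/bcol_ind index
 * blocks rather than individual nonzeros, and each block stores its
 * 4 entries contiguously (zero-filled where the matrix has no entry). */
void spmv_bcsr_2x2(int nbrows, const int *brow_ptr, const int *bcol_ind,
                   const double *bvalues, const double *x, double *y)
{
    for (int bi = 0; bi < nbrows; bi++) {
        double y0 = 0.0, y1 = 0.0;
        for (int k = brow_ptr[bi]; k < brow_ptr[bi + 1]; k++) {
            const double *b = &bvalues[4 * k];
            int j = 2 * bcol_ind[k];
            y0 += b[0] * x[j] + b[1] * x[j + 1];
            y1 += b[2] * x[j] + b[3] * x[j + 1];
        }
        y[2 * bi]     += y0;
        y[2 * bi + 1] += y1;
    }
}

Whether a given block shape pays off depends on how much padding it introduces relative to the index savings, which is exactly what varies across matrices and microarchitectures.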


International Conference on Parallel Processing | 2009

Performance Models for Blocked Sparse Matrix-Vector Multiplication Kernels

Vasileios Karakasis; Georgios I. Goumas; Nectarios Koziris

Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demands on memory bandwidth, which cannot yet be abundantly offered by modern commodity architectures. One of the most promising optimization techniques for SpMV is blocking, which can reduce the indexing structures for storing a sparse matrix, and therefore alleviate the pressure on the memory subsystem. However, blocking methods can severely degrade performance if not used properly. In this paper, we study and evaluate a number of representative blocking storage formats and present a performance model that can accurately select the most suitable blocking storage format and the corresponding block shape and size for a specific sparse matrix. Our model considers both the memory and the computational part of the kernel, which can be non-negligible when applying blocking, and also assumes an overlapping of memory accesses and computations that modern commodity architectures can offer through hardware prefetching mechanisms.
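
The overall flavor of such a model can be sketched as follows (a simplified illustration under the stated overlap assumption, not the paper's actual model; all names and parameters below are hypothetical): for each candidate block shape, estimate the memory time from the format's footprint and the compute time from its flop count, take the larger of the two because memory accesses and computation are assumed to overlap, and pick the candidate with the smallest predicted time.

#include <stddef.h>

/* Hypothetical candidate description: footprint in bytes (including any
 * zero padding the blocking introduces) and floating-point operations. */
struct candidate {
    const char *name;   /* e.g. a block shape such as 2x2 or 1x4        */
    double bytes;       /* matrix + vector traffic for one SpMV         */
    double flops;       /* useful plus padding-induced operations       */
};

/* Pick the format whose predicted time is smallest, assuming memory
 * transfers and computation overlap, so the slower of the two paths
 * dominates.  bw is sustained memory bandwidth (bytes/s) and peak is
 * sustained compute throughput (flops/s); both are machine inputs.    */
size_t select_format(const struct candidate *c, size_t n,
                     double bw, double peak)
{
    size_t best = 0;
    double best_t = 1e300;
    for (size_t i = 0; i < n; i++) {
        double t_mem  = c[i].bytes / bw;
        double t_comp = c[i].flops / peak;
        double t = (t_mem > t_comp) ? t_mem : t_comp;   /* overlap model */
        if (t < best_t) { best_t = t; best = i; }
    }
    return best;    /* index of the predicted-fastest candidate */
}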

Collaboration


Dive into Georgios I. Goumas's collaborations.

Top Co-Authors

Nectarios Koziris
National Technical University of Athens

Konstantinos Nikas
National Technical University of Athens

Vasileios Karakasis
National and Kapodistrian University of Athens

Kornilios Kourtis
National Technical University of Athens

Nikolaos Drosinos
National Technical University of Athens

Nikos Anastopoulos
National Technical University of Athens

Maria Athanasaki
National Technical University of Athens

Alexandros-Herodotos Haritatos
National Technical University of Athens

Aristidis Sotiropoulos
National Technical University of Athens

Athena Elafrou
National Technical University of Athens