Network


Latest external collaborations at the country level. Click a dot for details.

Hotspot


Dive into the research topics where Vasileios Karakasis is active.

Publication


Featured research published by Vasileios Karakasis.


The Journal of Supercomputing | 2009

Performance evaluation of the sparse matrix-vector multiplication on modern architectures

Georgios I. Goumas; Kornilios Kourtis; Nikos Anastopoulos; Vasileios Karakasis; Nectarios Koziris

In this paper, we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, a fact that may lead to misguided, and thus unsuccessful, attempts at optimization. To gain insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. In addition, we investigate the parallel version of the kernel and report on the corresponding performance results and their relation to each architecture's specific multithreaded configuration. Based on our experiments, we extract useful conclusions that can serve as guidelines for the optimization process of both single- and multithreaded versions of the kernel.
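The kernel the paper evaluates is commonly implemented over the standard CSR (Compressed Sparse Row) format. A minimal sketch, with illustrative names not taken from the paper's code:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A*x for a CSR matrix with len(row_ptr)-1 rows."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # Each nonzero contributes 2 flops (multiply + add) but streams a
        # value plus a column index from memory -- the low flop:byte ratio
        # that makes the kernel memory-bandwidth bound.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# 2x2 example: A = [[1, 2], [0, 3]], x = [1, 1]
y = spmv_csr([1.0, 2.0, 3.0], [0, 1, 1], [0, 2, 3], [1.0, 1.0])
# y == [3.0, 3.0]
```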


Parallel, Distributed and Network-Based Processing | 2008

Understanding the Performance of Sparse Matrix-Vector Multiplication

Georgios I. Goumas; Kornilios Kourtis; Nikos Anastopoulos; Vasileios Karakasis; Nectarios Koziris

In this paper we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, a fact that may lead to misguided, and thus unsuccessful, attempts at optimization. To gain insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. Based on our experiments we extract useful conclusions that can serve as guidelines for the subsequent optimization process of the kernel.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2011

CSX: an extended compression format for SpMV on shared memory systems

Kornilios Kourtis; Vasileios Karakasis; Georgios I. Goumas; Nectarios Koziris

The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory systems with multiple processing units due to the streaming nature of its data access pattern. Previous research has demonstrated that an effective strategy to improve the kernel's performance is to drastically reduce the data volume involved in the computations. Since the storage formats for sparse matrices include metadata describing the structure of non-zero elements within the matrix, we propose a generalized approach to compress metadata by exploiting substructures within the matrix. We call the proposed storage format Compressed Sparse eXtended (CSX). In our implementation we employ runtime code generation to construct specialized SpMV routines for each matrix. Experimental evaluation on two shared memory systems for 15 sparse matrices demonstrates significant performance gains as the number of participating cores increases. Regarding the cost of CSX construction, we propose several strategies that trade performance for preprocessing cost, making CSX applicable both to online and offline preprocessing.
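The core idea can be illustrated with a toy encoder: instead of storing one column index per nonzero (as CSR does), detect substructures, here only horizontal runs of consecutive columns, and encode each run as a single (start_column, length) unit, shrinking the indexing metadata. The real CSX detects many more pattern types and generates specialized SpMV code at runtime; this sketch is not from the paper's implementation.

```python
def encode_row_runs(cols):
    """Compress a sorted list of column indices into (start, length) runs."""
    units = []
    i = 0
    while i < len(cols):
        # extend the run while column indices stay consecutive
        j = i
        while j + 1 < len(cols) and cols[j + 1] == cols[j] + 1:
            j += 1
        units.append((cols[i], j - i + 1))
        i = j + 1
    return units

# A row with nonzeros in columns 2,3,4,5 and 9 needs 5 indices in CSR
# but only 2 units here:
print(encode_row_runs([2, 3, 4, 5, 9]))   # [(2, 4), (9, 1)]
```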


IEEE Transactions on Evolutionary Computation | 2008

Efficient Evolution of Accurate Classification Rules Using a Combination of Gene Expression Programming and Clonal Selection

Vasileios Karakasis; Andreas Stafylopatis

A hybrid evolutionary technique is proposed for data mining tasks, which combines a principle inspired by the immune system, namely the clonal selection principle, with a more common, though very efficient, evolutionary technique, gene expression programming (GEP). The clonal selection principle regulates the immune response in order to successfully recognize and confront any foreign antigen, and at the same time allows the amelioration of the immune response across successive appearances of the same antigen. On the other hand, gene expression programming is the descendant of genetic algorithms and genetic programming and eliminates their main disadvantages, such as the genotype-phenotype coincidence, though it preserves their advantageous features. In order to perform the data mining task, the proposed algorithm introduces the notion of a data class antigen, which is used to represent a class of data. The produced rules are evolved by our clonal selection algorithm (CSA), which extends the recently proposed CLONALG algorithm. In CSA, among other new features, a receptor editing step has been incorporated. Moreover, the rules themselves are represented as antibodies that are coded as GEP chromosomes in order to exploit the flexibility and the expressiveness of such encoding. The proposed hybrid technique is tested on a set of benchmark problems in comparison to GEP. In almost all of the problems considered, the results are very satisfactory and outperform conventional GEP both in terms of prediction accuracy and computational efficiency.
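A minimal clonal-selection loop in the spirit of CLONALG, which the paper's CSA extends (with, e.g., the receptor-editing step not shown here). Antibodies are bit strings, affinity is a toy fitness, and clones are mutated at a rate inversely related to their parent's rank; all parameters are illustrative:

```python
import random

random.seed(0)
N_BITS = 16

def affinity(ab):
    """Toy affinity: number of set bits (the real CSA scores rule quality)."""
    return sum(ab)

def mutate(ab, rate):
    """Flip each bit independently with the given probability."""
    return [b ^ (random.random() < rate) for b in ab]

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(10)]
for gen in range(30):
    pop.sort(key=affinity, reverse=True)
    clones = []
    for rank, ab in enumerate(pop[:5]):
        n_clones = 5 - rank              # more clones for higher-affinity parents
        rate = 0.05 * (rank + 1)         # less mutation for higher-affinity parents
        clones += [mutate(ab, rate) for _ in range(n_clones)]
    # elitist selection among parents and clones keeps the best antibodies
    pop = sorted(pop + clones, key=affinity, reverse=True)[:10]

best = max(pop, key=affinity)
```

The inverse coupling of clone count and mutation rate to affinity is the defining trait of clonal selection: good solutions are exploited conservatively, weaker ones explored aggressively.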


IEEE Transactions on Parallel and Distributed Systems | 2013

An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

Vasileios Karakasis; Theodoros Gkountouvas; Kornilios Kourtis; Georgios I. Goumas; Nectarios Koziris

Sparse matrix-vector multiplication (SpM × V) has been characterized as one of the most significant computational scientific kernels. The key algorithmic characteristic of the SpM × V kernel that inhibits it from achieving high performance is its very low flop:byte ratio. In this paper, we present a compressed storage format, called Compressed Sparse eXtended (CSX), that is able to detect and encode simultaneously multiple commonly encountered substructures inside a sparse matrix. Relying on aggressive compression techniques of the sparse matrix's indexing structure, CSX is able to considerably reduce the memory footprint of a sparse matrix, alleviating the pressure on the memory subsystem. On a diverse set of sparse matrices, CSX was able to provide a more than 40 percent average performance improvement over the standard CSR format in SMP architectures and surpassed 20 percent improvement in NUMA systems, significantly outperforming other CSR alternatives. Additionally, it was able to adapt successfully to the nonzero element structure of the considered matrices, exhibiting very stable performance. Finally, in the context of a "real-life" multiphysics simulation software, CSX accelerated the SpM × V component nearly 40 percent and the total solver time approximately 15 percent.
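A back-of-envelope calculation makes the low flop:byte ratio concrete. Assuming 8-byte double values and 4-byte integer column indices (a common, but not universal, CSR layout) and ignoring row pointers and vector traffic for simplicity:

```python
# Per nonzero, CSR SpMV performs one multiply and one add, but must
# stream the value plus its column index from memory.
flops_per_nnz = 2          # multiply + add
bytes_per_nnz = 8 + 4      # 8-byte value + 4-byte column index
ratio = flops_per_nnz / bytes_per_nnz
print(ratio)               # roughly 0.167 flops per byte
```

Compressing the indexing metadata, as CSX does, raises this ratio by shrinking the denominator.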


Computational Science and Engineering | 2009

A Comparative Study of Blocking Storage Methods for Sparse Matrices on Multicore Architectures

Vasileios Karakasis; Georgios I. Goumas; Nectarios Koziris

Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demand on memory bandwidth, which cannot yet be abundantly offered by modern commodity architectures. One of the most promising optimization techniques for SpMV is blocking, which can reduce the indexing structures for storing a sparse matrix, and therefore alleviate the pressure on the memory subsystem. In this paper, we study and evaluate a number of representative blocking storage formats on a set of modern microarchitectures that can provide up to 64 hardware contexts. The purpose of this paper is to present the merits and drawbacks of each method in relation to the underlying microarchitecture and to provide a consistent overview of the most promising blocking storage methods for sparse matrices that have been presented in the literature.
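The most widespread blocking format is BCSR (Block Compressed Sparse Row), where one column index is stored per dense block instead of per nonzero, at the cost of storing explicit zeros inside partially filled blocks. A minimal sketch with fixed 2x2 blocks, using illustrative names:

```python
R, C = 2, 2   # block dimensions

def spmv_bcsr(block_vals, block_col, block_row_ptr, x):
    """y = A*x; block_vals[b] is an RxC dense block in row-major order."""
    n_block_rows = len(block_row_ptr) - 1
    y = [0.0] * (n_block_rows * R)
    for bi in range(n_block_rows):
        for b in range(block_row_ptr[bi], block_row_ptr[bi + 1]):
            j0 = block_col[b] * C          # one index covers R*C elements
            for r in range(R):
                acc = 0.0
                for c in range(C):
                    acc += block_vals[b][r * C + c] * x[j0 + c]
                y[bi * R + r] += acc
    return y

# One 2x2 block [[1,2],[3,4]] at block-column 0 of block-row 0, x = [1, 1]:
y = spmv_bcsr([[1.0, 2.0, 3.0, 4.0]], [0], [0, 1], [1.0, 1.0])
# y == [3.0, 7.0]
```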


International Conference on Parallel Processing | 2009

Performance Models for Blocked Sparse Matrix-Vector Multiplication Kernels

Vasileios Karakasis; Georgios I. Goumas; Nectarios Koziris

Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demand on memory bandwidth, which cannot yet be abundantly offered by modern commodity architectures. One of the most promising optimization techniques for SpMV is blocking, which can reduce the indexing structures for storing a sparse matrix, and therefore alleviate the pressure on the memory subsystem. However, blocking methods can severely degrade performance if not used properly. In this paper, we study and evaluate a number of representative blocking storage formats and present a performance model that can accurately select the most suitable blocking storage format and the corresponding block shape and size for a specific sparse matrix. Our model considers both the memory and computational parts of the kernel, which can be non-negligible when applying blocking, and also assumes an overlapping of memory accesses and computations that modern commodity architectures can offer through hardware prefetching mechanisms.
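A toy version of this kind of model: execution time is bounded by the slower of the memory and compute parts, assuming the two overlap (e.g. via hardware prefetching). All numbers below are made-up illustrative parameters, not the paper's measurements:

```python
def blocked_spmv_time(nnz, fill_ratio, bw_gbs=10.0, gflops=4.0,
                      bytes_per_elem=12):
    """Estimate seconds for one SpMV with a given block fill ratio.

    fill_ratio >= 1.0 gives stored elements per true nonzero: padding
    zeros inside blocks inflate both memory traffic and flop count.
    """
    stored = nnz * fill_ratio
    t_mem = stored * bytes_per_elem / (bw_gbs * 1e9)   # streaming time
    t_cpu = stored * 2 / (gflops * 1e9)                # 2 flops per element
    return max(t_mem, t_cpu)   # memory and compute overlap

# A block shape that pads 20% zeros pays off only if its index savings
# (lower bytes per stored element) outweigh the extra elements:
t_csr  = blocked_spmv_time(1_000_000, 1.0, bytes_per_elem=12)
t_bcsr = blocked_spmv_time(1_000_000, 1.2, bytes_per_elem=9)
```

With these parameters the blocked variant wins despite the padding, which is exactly the kind of trade-off such a model is meant to decide per matrix and block shape.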


International Parallel and Distributed Processing Symposium | 2009

Exploring the effect of block shapes on the performance of sparse kernels

Vasileios Karakasis; Georgios I. Goumas; Nectarios Koziris

In this paper we explore the impact of the block shape on blocked and vectorized versions of the Sparse Matrix-Vector Multiplication (SpMV) kernel and build upon previous work by performing an extensive experimental evaluation of the most widespread blocking storage format, namely Block Compressed Sparse Row (BCSR) format, on a set of modern commodity microarchitectures. We evaluate the merit of vectorization on the memory-bound blocked SpMV kernel and report the results for single- and multithreaded (both SMP and NUMA) configurations. The performance of blocked SpMV can significantly vary with the block shape, despite similar memory bandwidth demands for different blocks. This is further accentuated when vectorizing the kernel. When moving to multiple cores, the memory wall problem becomes even more evident and may overwhelm any benefit from optimizations targeting the computational part of the kernel. In this paper we explore and discuss the architectural characteristics of modern commodity architectures that are responsible for these performance variations between block shapes.


International Conference on Hybrid Intelligent Systems | 2008

Clonal Selection-Based Neural Classifier

Aris Lanaridis; Vasileios Karakasis; Andreas Stafylopatis

Artificial immune systems (AIS) constitute an emerging and promising field, and have been applied to pattern recognition and classification tasks to a limited extent so far. This work is a first attempt of applying the clonal selection principle to the training of multi-layer perceptrons (MLPs). The clonal selection based neural classifier (CSNC) uses the basic concepts of clonal selection to evolve MLPs, which are represented as real-valued linear antibodies. The proposed system is actually a multi-classifier, consisting of multiple sets of MLPs, each one devoted to the recognition of a different class of the input data. The final trained classifier is comprised of the best MLPs from each set. The proposed classifier is tested against a set of benchmark problems and yields promising results.


International Parallel and Distributed Processing Symposium | 2013

Improving the Performance of the Symmetric Sparse Matrix-Vector Multiplication in Multicore

Theodoros Gkountouvas; Vasileios Karakasis; Kornilios Kourtis; Georgios I. Goumas; Nectarios Koziris

Symmetric sparse matrices arise often in the solution of sparse linear systems. Exploiting the non-zero element symmetry in order to reduce the overall matrix size is very tempting for optimizing the symmetric Sparse Matrix-Vector Multiplication kernel (SpMxV) for multicore architectures. Despite being very beneficial for single-threaded execution, not storing the upper or lower triangular part of a symmetric sparse matrix complicates the multithreaded SpMxV version, since it introduces an undesirable dependency on the output vector elements. The most common approach for overcoming this problem is to use local, per-thread vectors, which are reduced to the output vector at the end of the computation. However, this reduction leads to considerable memory traffic, limiting the scalability of the symmetric SpMxV. In this paper, we take a two-step approach in optimizing the symmetric SpMxV kernel. First, we introduce the CSX-Sym variant of the highly compressed CSX format, which exploits the non-zero element symmetry to further compress the input matrix. Second, we minimize the memory traffic produced by the local vectors reduction phase by implementing a non-zero indexing compression scheme that minimizes the local data to be reduced. Our indexing scheme allowed symmetric SpMxV to scale and provided a more than 2x performance improvement over the baseline CSR implementation and 83.9% over the typical symmetric SpMxV kernel. The CSX-Sym variant further increased the symmetric SpMxV performance by 43.4%. Finally, we evaluate the effect of our optimizations in the context of the CG iterative method, where we achieve a 77.8% acceleration of the overall solver.
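A sketch of the baseline scheme the paper improves on: only the lower triangle (plus diagonal) is stored, each off-diagonal element a_ij contributes to both y_i and y_j, so parallel threads write into private local vectors that are reduced at the end. This sequential toy version only simulates the "threads"; the reduction loop is the memory-traffic bottleneck that the paper's indexing compression targets:

```python
def spmv_symmetric(lower, n, x, n_threads=2):
    """lower: list of (i, j, a_ij) with j <= i; returns y = A*x."""
    locals_ = [[0.0] * n for _ in range(n_threads)]
    for t in range(n_threads):
        # each "thread" handles a strided slice of the stored triangle
        for (i, j, a) in lower[t::n_threads]:
            locals_[t][i] += a * x[j]
            if i != j:
                locals_[t][j] += a * x[i]   # mirrored upper-triangle term
    # reduction phase: every thread's full-length local vector is summed
    return [sum(loc[i] for loc in locals_) for i in range(n)]

# Symmetric A = [[2, 1], [1, 3]] stored as its lower triangle, x = [1, 1]:
y = spmv_symmetric([(0, 0, 2.0), (1, 0, 1.0), (1, 1, 3.0)], 2, [1.0, 1.0])
# y == [3.0, 4.0]
```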

Collaboration


Dive into Vasileios Karakasis's collaboration network.

Top Co-Authors

- Georgios I. Goumas (National Technical University of Athens)
- Nectarios Koziris (National Technical University of Athens)
- Kornilios Kourtis (National Technical University of Athens)
- Andreas Stafylopatis (National Technical University of Athens)
- Konstantinos Nikas (National Technical University of Athens)
- Nikos Anastopoulos (National Technical University of Athens)
- Jan Christian Meyer (Norwegian University of Science and Technology)
- Juan M. Cebrian (Norwegian University of Science and Technology)