Artur Mariano

Technische Universität Darmstadt

Publication

Featured research published by Artur Mariano.

international conference on parallel processing | 2015

Parallel (Probable) Lock-Free Hash Sieve: A Practical Sieving Algorithm for the SVP

Artur Mariano; Christian H. Bischof; Thijs Laarhoven

In this paper, we assess the practicability of Hash Sieve, a recently proposed sieving algorithm for the Shortest Vector Problem (SVP) on lattices, on multi-core shared memory systems. To this end, we devised a parallel implementation that scales well, and is based on a probable lock-free system to handle concurrency. The probable lock-free system, implemented with spin-locks, in turn implemented with CAS operations, is likely to behave as a lock-free mechanism, since threads block only when strictly required and chances are that they are not required to block. With our implementation, we were able to solve the SVP on an arbitrary lattice in dimension 96, in less than 17.5 hours, using 16 physical cores. The least squares fit of the execution times of our implementation, in seconds, lies between $2^{0.32n - 15}$ and $2^{0.33n - 16}$. These results are of paramount importance for the selection of parameters in lattice-based cryptography, as they indicate that sieving algorithms are far more practical for solving the SVP than previously believed.
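As a rough sanity check on the fitted running times, the short sketch below evaluates the two bounds at dimension 96. It assumes the bounds read as 2^(0.32n - 15) and 2^(0.33n - 16) seconds, with n the lattice dimension; the function names are hypothetical and are not taken from the paper.

```python
def predicted_seconds(n: int, slope: float, offset: float) -> float:
    """Evaluate the assumed fit 2^(slope * n - offset), in seconds."""
    return 2.0 ** (slope * n - offset)


def predicted_hours(n: int, slope: float, offset: float) -> float:
    """Same fitted model, converted from seconds to hours."""
    return predicted_seconds(n, slope, offset) / 3600.0


if __name__ == "__main__":
    # The two fitted curves quoted in the abstract (assumed reading).
    for slope, offset in [(0.32, 15), (0.33, 16)]:
        hours = predicted_hours(96, slope, offset)
        print(f"2^({slope}n - {offset}) at n = 96: {hours:.1f} hours")
```

Both curves give roughly 14.5 to 15 hours at n = 96, which is in line with the reported time of less than 17.5 hours for that dimension.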

symposium on computer architecture and high performance computing | 2014

Lock-Free GaussSieve for Linear Speedups in Parallel High Performance SVP Calculation

Artur Mariano; Shahar Timnat; Christian H. Bischof

Lattice-based cryptography has become a hot topic in recent years because it seems to be quantum immune, i.e., resistant to attacks operated with quantum computers. The security of lattice-based cryptosystems is determined by the hardness of certain lattice problems, such as the Shortest Vector Problem (SVP). Thus, it is of prime importance to study how efficiently SVP-solvers can be implemented. This paper presents a parallel shared-memory implementation of the GaussSieve algorithm, a well-known SVP-solver. Our implementation achieves almost linear and linear speedups with up to 64 cores, depending on the tested scenario, and delivers better sequential performance than any other disclosed GaussSieve implementation. In this paper, we show that it is possible to implement a highly scalable version of GaussSieve on multi-core CPU-chips. The key features of our implementation are a lock-free singly linked list, and hand-tuned, vectorized code. Additionally, we propose an algorithmic optimization that leads to faster convergence.

international conference on progress in cryptology | 2014

Tuning GaussSieve for Speed

Robert Fitzpatrick; Christian H. Bischof; Johannes A. Buchmann; Özgür Dagdelen; Florian Göpfert; Artur Mariano; Bo-Yin Yang

The area of lattice-based cryptography is growing ever more prominent as a paradigm for quantum-resistant cryptography. One of the most important hard problems underpinning the security of lattice-based cryptosystems is the shortest vector problem (SVP). At present, two approaches dominate methods for solving instances of this problem in practice: enumeration and sieving. In 2010, Micciancio and Voulgaris presented a heuristic member of the sieving family, known as GaussSieve, demonstrating it to be comparable to enumeration methods in practice. With contemporary lattice-based cryptographic proposals relying largely on the hardness of solving the shortest and closest vector problems in ideal lattices, examining possible improvements to sieving algorithms becomes highly pertinent since, at present, only sieving algorithms have been successfully adapted to solve such instances more efficiently than in the random lattice case. In this paper, we propose a number of heuristic improvements to GaussSieve, which can also be applied to other sieving algorithms for SVP.

european conference on parallel processing | 2014

A Comprehensive Empirical Comparison of Parallel ListSieve and GaussSieve

Artur Mariano; Özgür Dagdelen; Christian H. Bischof

The security of lattice-based cryptosystems is determined by the performance of practical implementations of, among others, algorithms for the Shortest Vector Problem (SVP). In this paper, we conduct a comprehensive, empirical comparison of two SVP-solvers: ListSieve and GaussSieve. We also propose a practical parallel implementation of ListSieve, which achieves super-linear speedups on multi-core CPUs, with efficiency levels as high as 183%. By comparing our implementation with a parallel implementation of GaussSieve, we show that ListSieve can, in fact, outperform GaussSieve for a large number of threads, thus answering a question that was still open to this day.

international embedded systems symposium | 2013

Hardware and Software Implementations of Prim’s Algorithm for Efficient Minimum Spanning Tree Computation

Artur Mariano; Dongwook Lee; Andreas Gerstlauer; Derek Chiou

Minimum spanning tree (MST) problems play an important role in many networking applications, such as routing and network planning. In many cases, such as wireless ad-hoc networks, this requires efficient high-performance and low-power implementations that can run at regular intervals in real time on embedded platforms. In this paper, we study custom software and hardware realizations of one common algorithm for MST computations, Prim’s algorithm. We specifically investigate a performance-optimized realization of this algorithm on reconfigurable hardware, which is increasingly present in such platforms.
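For readers unfamiliar with Prim's algorithm, the following is a minimal software sketch of the textbook heap-based version. It is not the paper's optimized software or hardware realization; the function name, the adjacency-list layout, and the example graph are assumptions made for illustration, and the input graph is assumed to be connected and undirected.

```python
import heapq


def prim_mst(n, adj):
    """Textbook Prim's algorithm on a connected undirected graph.

    n   : number of vertices, labelled 0..n-1
    adj : adj[u] is a list of (weight, v) pairs for the edges at u
    Returns (total_weight, list of (parent, child, weight) tree edges).
    """
    visited = [False] * n
    heap = [(0, 0, -1)]  # (edge weight, vertex, parent); start at vertex 0
    total = 0
    tree = []
    while heap:
        w, u, parent = heapq.heappop(heap)
        if visited[u]:
            continue  # stale entry: u was already reached by a cheaper edge
        visited[u] = True
        total += w
        if parent >= 0:
            tree.append((parent, u, w))
        for wv, v in adj[u]:
            if not visited[v]:
                heapq.heappush(heap, (wv, v, u))
    return total, tree


def build_adj(n, edges):
    """Build an undirected adjacency list from (u, v, weight) triples."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    return adj


if __name__ == "__main__":
    edges = [(0, 1, 1), (1, 2, 2), (0, 2, 4), (2, 3, 3), (1, 3, 5)]
    print(prim_mst(4, build_adj(4, edges)))
```

The binary heap keeps the cheapest edge leaving the growing tree at the top, so each step costs logarithmic time in the number of queued edges.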

parallel, distributed and network-based processing | 2016

Enhancing the Scalability and Memory Usage of Hashsieve on Multi-core CPUs

Artur Mariano; Christian H. Bischof

The Shortest Vector Problem (SVP) is a key problem in lattice-based cryptography and cryptanalysis. While the cryptography community has accumulated a vast knowledge of SVP-solvers from a theoretical standpoint, the practical performance of these algorithms is commonly not well understood. This gap in knowledge poses many challenges to cryptographers, who are oftentimes confronted with algorithms that perform worse in practice than expected from theory. This is a problem because the asymptotic complexity of the best algorithms plays a key role in the construction of cryptosystems, but only practically appealing, validated algorithms are accounted for in this process. Thus, if one cannot extract the full potential of theoretically strong algorithms in practice, efficient algorithms might be ruled out and wrong assumptions are made when constructing cryptosystems. In this paper, we take a step forward to fill this gap, by providing a computational analysis of HashSieve, the most practical sieving SVP-solver to date, and showing how its performance can be enhanced in practice. To this end, we revisit the parallel generation of random numbers, memory allocation and memory access patterns. Employing scalable random sampling, object memory pools, scalable memory allocators and aggressive memory prefetching, we were able to improve the best current implementation of HashSieve by factors of 3x and 4x, depending on the lattice dimension, and set new records for the HashSieve algorithm, thereby shrinking the gap between its theoretical complexity and its performance in practice.

parallel, distributed and network-based processing | 2015

A Generic and Highly Efficient Parallel Variant of Boruvka's Algorithm

Cristiano Da Silva Sousa; Artur Mariano; Alberto José Proença

This paper presents (i) a parallel, platform-independent variant of Boruvka's algorithm, an efficient Minimum Spanning Tree (MST) solver, and (ii) a comprehensive comparison of MST-solver implementations, both on multi-core CPU-chips and GPUs. The core of our variant is an effective and explicit contraction of the graph. Our multi-core CPU implementation scales linearly up to 8 threads, whereas the GPU implementation performs considerably better than the optimal number of threads running on the CPU. We also show that our implementations outperform all other parallel MST-solver implementations in (ii), for a broad set of publicly available road network graphs.
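To make the structure of Boruvka's algorithm concrete, here is a minimal sequential sketch. Note that it tracks components with a union-find structure instead of the explicit graph contraction described in the abstract, and it is not parallel; the function name and the example graph are assumptions for illustration only.

```python
def boruvka_mst(n, edges):
    """Sequential Boruvka sketch using union-find (not explicit contraction).

    n     : number of vertices, labelled 0..n-1
    edges : list of (u, v, weight) triples of an undirected graph
    Returns (total_weight, list of chosen (u, v, weight) edges).
    """
    parent = list(range(n))

    def find(x):
        # Find the component representative, halving paths on the way up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen, components = 0, [], n
    while components > 1:
        # Round step 1: each component picks its cheapest outgoing edge.
        cheapest = {}
        for idx, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                best = cheapest.get(r)
                # Ties are broken by edge index so no cycle can form.
                if best is None or (w, idx) < (edges[best][2], best):
                    cheapest[r] = idx
        if not cheapest:
            break  # graph is disconnected; result is a spanning forest
        # Round step 2: merge components along the selected edges.
        for idx in sorted(set(cheapest.values())):
            u, v, w = edges[idx]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += w
                chosen.append((u, v, w))
                components -= 1
    return total, chosen


if __name__ == "__main__":
    edges = [(0, 1, 1), (1, 2, 2), (0, 2, 4), (2, 3, 3), (1, 3, 5)]
    print(boruvka_mst(4, edges))
```

Each round at least halves the number of components, and the per-component edge selection in step 1 is independent across components, which is what makes the algorithm attractive for parallel implementations.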

parallel, distributed and network-based processing | 2017

A Parallel Variant of LDSieve for the SVP on Lattices

Artur Mariano; Thijs Laarhoven; Christian H. Bischof

In this paper, we propose a parallel implementation of LDSieve, a recently published sieving algorithm for the SVP, which achieves the best theoretical complexity to this day, on parallel shared-memory systems. In particular, we propose a scalable parallel variant of LDSieve that is probabilistically lock-free and relaxes the properties of the algorithm to favour parallelism. We use our parallel variant of LDSieve to answer a number of important questions pertaining to the algorithm. In particular, we show that LDSieve scales fairly well on shared-memory systems and uses much less memory than HashSieve on random lattices, for the same or even less execution time.

parallel, distributed and network-based processing | 2016

Parallel Improved Schnorr-Euchner Enumeration SE++ for the CVP and SVP

Fábio José Gonçalves Correia; Artur Mariano; Alberto José Proença; Christian H. Bischof; Erik Agrell

The Closest Vector Problem (CVP) and the Shortest Vector Problem (SVP) are prime problems in lattice-based cryptanalysis, since they underpin the security of many lattice-based cryptosystems. Despite the importance of these problems, there are only a few CVP-solvers publicly available, and their scalability was never studied. This paper presents a scalable implementation of an enumeration-based CVP-solver for multi-cores, which can be easily adapted to solve the SVP. In particular, it achieves super-linear speedups in some instances on up to 8 cores and almost linear speedups on 16 cores when solving the CVP on a 50-dimensional lattice. Our results show that enumeration-based CVP-solvers can be parallelized as effectively as enumeration-based solvers for the SVP, based on a comparison with a state-of-the-art SVP-solver. In addition, we show that we can optimize the SVP variant of our solver in such a way that it becomes 35%-60% faster than the fastest enumeration-based SVP-solver to date.

parallel, distributed and network-based processing | 2016

Analyzing and Improving Memory Access Patterns of Large Irregular Applications on NUMA Machines

Artur Mariano; Matthias Diener; Christian H. Bischof; Philippe Olivier Alexandre Navaux

Improving the memory access behavior of parallel applications is one of the most important challenges in high-performance computing. Non-Uniform Memory Access (NUMA) architectures pose particular challenges in this context: they contain multiple memory controllers and the selection of a controller to serve a page request influences the overall locality and balance of memory accesses, which in turn affect performance. In this paper, we analyze and improve the memory access pattern and overall memory usage of large-scale irregular applications on NUMA machines. We selected HashSieve, a very important algorithm in the context of lattice-based cryptography, as a representative example, due to (1) its extremely irregular memory pattern, (2) its large memory requirements and (3) its unsuitability for other computer architectures, such as GPUs. We optimize HashSieve with a variety of techniques, focusing both on the algorithm itself and on the mapping of memory pages to NUMA nodes, achieving a speedup of over 2x.