Is this you? Create Your Porfile

Håkan Sundell

Chalmers University of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Håkan Sundell is active.

Explore More

Publication

Featured researches published by Håkan Sundell.

acm symposium on applied computing | 2004

Scalable and lock-free concurrent dictionaries

Håkan Sundell; Philippas Tsigas

We present an efficient and practical lock-free implementation of a concurrent dictionary that is suitable for both fully concurrent (large multi-processor) systems as well as pre-emptive (multi-process) systems. Many algorithms for concurrent dictionaries are based on mutual exclusion. However, mutual exclusion causes blocking which has several drawbacks and degrades the systems overall performance. Non-blocking algorithms avoid blocking, and are either lockfree or wait-free. Our algorithm is based on the randomized sequential list structure called Skiplist, and implements the full set of operations on a dictionary that is suitable for practical settings. In our performance evaluation we compare our algorithm with the most efficient non-blocking implementation of dictionaries known. The experimental results clearly show that our algorithm outperforms the other lockfree algorithm for dictionaries with realistic sizes, both on fully concurrent as well as pre-emptive systems.

Journal of Parallel and Distributed Computing | 2008

Lock-free deques and doubly linked lists

Håkan Sundell; Philippas Tsigas

We present a practical lock-free shared data structure that efficiently implements the operations of a concurrent deque as well as a general doubly linked list. The implementation supports parallelism for disjoint accesses and uses atomic primitives which are available in modern computer systems. Previously known lock-free algorithms of doubly linked lists are either based on non-available atomic synchronization primitives, only implement a subset of the functionality, or are not designed for disjoint accesses. Our algorithm only requires single-word compare-and-swap atomic primitives, supports fully dynamic list sizes, and allows traversal also through deleted nodes and thus avoids unnecessary operation retries. We have performed an empirical study of our new algorithm on two different multiprocessor platforms. Results of the experiments performed under high contention show that the performance of our implementation scales linearly with increasing number of processors. Considering deque implementations and systems with low concurrency, the algorithm by Michael shows the best performance. However, as our algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and non-uniform memory architecture.

international parallel and distributed processing symposium | 2003

Fast and lock-free concurrent priority queues for multi-thread systems

Håkan Sundell; Philippas Tsigas

We present an efficient and practical lock-free implementation of a concurrent priority queue that is suitable for both fully concurrent (large multi-processor) systems as well as pre-emptive (multi-process) systems. Many algorithms for concurrent priority queues are based on mutual exclusion. However, mutual exclusion causes blocking which has several drawbacks and degrades the systems overall performance. Non-blocking algorithms avoid blocking, and are either lock-free or wait-free. Previously known non-blocking algorithms of priority queues did not perform well in practice because of their complexity, and they are often based on non-available atomic synchronization primitives. Our algorithm is based on the randomized sequential list structure called Skiplist, and a real-time extension of our algorithm is also described. In our performance evaluation we compare our algorithm with some of the most efficient implementations of priority queues known. The experimental results clearly show that our lock-free implementation outperforms the other lock-based implementations in all cases for 3 threads and more, both on fully concurrent as well as on pre-emptive systems.

IEEE Transactions on Parallel and Distributed Systems | 2009

Efficient and Reliable Lock-Free Memory Reclamation Based on Reference Counting

Anders Gidenstam; Marina Papatriantafilou; Håkan Sundell; Philippas Tsigas

We present an efficient and practical lock-free method for semiautomatic (application-guided) memory reclamation based on reference counting, aimed for use with arbitrary lock-free dynamic data structures. The method guarantees the safety of local as well as global references, supports arbitrary memory reuse, uses atomic primitives that are available in modern computer systems, and provides an upper bound on the amount of memory waiting to be reclaimed. To the best of our knowledge, this is the first lock-free method that provides all of these properties. We provide analytical and experimental study of the method. The experiments conducted have shown that the method can also provide significant performance improvements for lock-free algorithms of dynamic data structures that require strong memory management.

international conference on principles of distributed systems | 2010

Cache-aware lock-free queues for multiple producers/consumers and weak memory consistency

Anders Gidenstam; Håkan Sundell; Philippas Tsigas

A lock-free FIFO queue data structure is presented in this paper. The algorithm supports multiple producers and multiple consumers and weak memory models. It has been designed to be cache-aware and work directly on weak memory models. It utilizes the cache behavior in concert with lazy updates of shared data, and a dynamic lock-free memory management scheme to decrease unnecessary synchronization and increase performance. Experiments on an 8- way multi-core platform show significantly better performance for the new algorithm compared to previous fast lock-free algorithms.

acm symposium on parallel algorithms and architectures | 2011

A lock-free algorithm for concurrent bags

Håkan Sundell; Anders Gidenstam; Marina Papatriantafilou; Philippas Tsigas

A lock-free bag data structure supporting unordered buffering is presented in this paper. The algorithm supports multiple producers and multiple consumers, as well as dynamic collection sizes. To handle concurrency efficiently, the algorithm was designed to thrive for disjoint-access-parallelism for the supported semantics. Therefore, the algorithm exploits a distributed design combined with novel techniques for handling concurrent modifications of linked lists using double marks, detection of total emptiness, and efficient memory management with hazard pointer handover. Experiments on a 24-way multi-core platform show significantly better performance for the new algorithm compared to previous algorithms of relevance.

international symposium on parallel architectures algorithms and networks | 2005

Efficient and reliable lock-free memory reclamation based on reference counting

Anders Gidenstam; Marina Papatriantafilou; Håkan Sundell; Philippas Tsigas

ACM Sigarch Computer Architecture News | 2008

NOBLE: non-blocking programming support via lock-free shared abstract data types

Håkan Sundell; Philippas Tsigas

An essential part of programming for multi-core and multi-processor includes ef cient and reliable means for sharing data. Lock-free data structures are known as very suitable for this purpose, although experienced to be very complex to design. In this paper, we present a software library of non-blocking abstract data types that have been designed to facilitate lock-free programming for non-experts. The system provides: i) ef cient implementations of the most commonly used data types in concurrent and sequential software design, ii) a lock-free memory management system, and iii) a run time-system. The library provides clear semantics that are at least as strong as those of corresponding lock-based implementations of the respective data types. Our software library can be used for facilitating lockfree programming; its design enables the programmer to: i) replace lock-based components of sequential or parallel code easily and ef ciently , ii) use well-tuned concurrent algorithms inside a software or hardware transactional system. In the paper we describe the design and functionality of the system. We also provide experimental results that show that the library can considerably improve the performance of software systems.

international parallel and distributed processing symposium | 2014

gpuRF and gpuERT: Efficient and Scalable GPU Algorithms for Decision Tree Ensembles

Karl Jansson; Håkan Sundell; Henrik Boström

We present two new parallel implementations of the ensemble learning methods Random Forests (RF) and Extremely Randomized Trees (ERT), called gpuRF and gpuERT, for emerging many-core platforms, e.g., contemporary graphics cards suitable for general-purpose computing (GPGPU). RF and ERT are two ensemble methods for generating predictive models that are of high importance within machine learning. They operate by constructing a multitude of decision trees at training time and outputting a prediction by comparing the outputs of the individual trees. Thanks to the inherent parallelism of the task, an obvious platform for its computation is to employ contemporary GPUs with a large number of processing cores. Previous parallel algorithms for RF in the literature are either designed for traditional multi-core CPU platforms or early history GPUs with simpler architecture and relatively few cores. For ERT, only briefly sketched parallelization attempts exist in the literature. The new parallel algorithms are designed for contemporary GPUs with a large number of cores and take into account aspects of the newer hardware architectures, such as memory hierarchy and thread scheduling. They are implemented using the C/C++ language and the CUDA interface to attain the best possible performance on NVidia-based GPUs. An experimental study comparing the most important previous solutions for CPU and GPU platforms to the novel implementations shows significant advantages in the aspect of efficiency for the latter, often with several orders of magnitude.

Proceedings of the 6th Workshop on Languages Compilers and Run-time Systems for Scalable Computers | 2002