Philippas Tsigas | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Philippas Tsigas is active.

Explore More

Publication

Featured researches published by Philippas Tsigas.

ACM Journal of Experimental Algorithms | 2009

GPU-Quicksort: A practical Quicksort algorithm for graphics processors

Daniel Cederman; Philippas Tsigas

In this article, we describe GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multicore graphics processors. Quicksort has previously been considered an inefficient sorting solution for graphics processors, but we show that in CUDA, NVIDIAs programing platform for general-purpose computations on graphical processors, GPU-Quicksort performs better than the fastest-known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.

international conference on computer graphics and interactive techniques | 2008

On dynamic load balancing on graphics processors

Daniel Cederman; Philippas Tsigas

To get maximum performance on the many-core graphics processors it is important to have an even balance of the workload so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically during execution. With the recent advent of scatter operations and atomic hardware primitives it is now possible to bring some of the more elaborate dynamic load balancing schemes from the conventional SMP systems domain to the graphics processor domain. We have compared four different dynamic load balancing methods to see which one is most suited to the highly parallel world of graphics processors. Three of these methods were lock-free and one was lock-based. We evaluated them on the task of creating an octree partitioning of a set of particles. The experiments showed that synchronization can be very expensive and that new methods that take more advantage of the graphics processors features and capabilities might be required. They also showed that lock-free methods achieves better performance than blocking and that they can be made to scale with increased numbers of processing units.

acm symposium on parallel algorithms and architectures | 2001

A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems

Philippas Tsigas; Yi Zhang

A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper. The algorithm is very simple, fast and scales very well in both symmetric and non-symmetric multiprocessor shared memory systems. Experiments on a 64-node SUN Enterprise 10000 — a symmetric multiprocessorsystem — and on a 64-node SGI Origin 2000 — a cache coherent non uniform memory access multiprocessorsystem — indicate that our algorithm considerably outperforms the best of the known alternatives in both multiprocessors in any level of multiprogramming. This work introduces two new, simple algorithmic mechanisms. The first lowers the contention to key variables used by the concurrent enqueue and/or dequeue operations which consequently results in the good performance of the algorithm, the second deals with the pointer recycling problem, an inconsistency problem that all non-blocking algorithms based on the compare-and-swap synchronisation primitive have to address. In our construction we selected to use compare-and-swap since compare-and-swap is an atomic primitive that scales well under contention and either is supported by modern multiprocessors or can be implemented efficiently on them.

european symposium on algorithms | 2008

A Practical Quicksort Algorithm for Graphics Processors

Daniel Cederman; Philippas Tsigas

In this paper we present GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multi-core graphics processors. Quicksort has previously been considered as an inefficient sorting solution for graphics processors, but we show that GPU-Quicksort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.

visual analytics science and technology | 2007

DataMeadow: A Visual Canvas for Analysis of Large-Scale Multivariate Data

Niklas Elmqvist; John T. Stasko; Philippas Tsigas

Supporting visual analytics of multiple large-scale multidimensional datasets requires a high degree of interactivity and user control beyond the conventional challenges of visualizing such datasets. We present the DataMeadow, a visual canvas providing rich interaction for constructing visual queries using graphical set representations called DataRoses. A DataRose is essentially a starplot of selected columns in a dataset displayed as multivariate visualizations with dynamic query sliders integrated into each axis. The purpose of the DataMeadow is to allow users to create advanced visual queries by iteratively selecting and filtering into the multidimensional data. Furthermore, the canvas provides a clear history of the analysis that can be annotated to facilitate dissemination of analytical results to outsiders. Towards this end, the DataMeadow has a direct manipulation interface for selection, filtering, and creation of sets, subsets, and data dependencies using both simple and complex mouse gestures. We have evaluated our system using a qualitative expert review involving two researchers working in the area. Results from this review are favorable for our new method.

nordic conference on secure it systems | 2009

ContikiSec: A Secure Network Layer for Wireless Sensor Networks under the Contiki Operating System

Lander Casado; Philippas Tsigas

In this paper we introduce ContikiSec, a secure network layer for wireless sensor networks, designed for the Contiki Operating System. ContikiSec has a configurable design, providing three security modes starting from confidentiality and integrity, and expanding to confidentiality, authentication, and integrity. ContikiSec has been designed to balance low energy consumption and security while conforming to a small memory footprint. Our design was based on performance evaluation of existing security primitives and is part of the contribution of this paper. Our evaluation was performed in the Modular Sensor Board hardware platform for wireless sensor networks, running Contiki. Contiki is an open source, highly portable operating system for wireless sensor networks (WSN) that is widely used in WSNs.

international symposium on microarchitecture | 2011

PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems

Siegfried Benkner; Sabri Pllana; Jesper Larsson Träff; Philippas Tsigas; Uwe Dolinsky; Cédric Augonnet; Beverly Bachmayer; Christoph W. Kessler; David Moloney; Vitaly Osipov

PEPPHER, a three-year European FP7 project, addresses efficient utilization of hybrid (heterogeneous) computer systems consisting of multicore CPUs with GPU-type accelerators. This article outlines the PEPPHER performance-aware component model, performance prediction means, runtime system, and other aspects of the project. A larger example demonstrates performance portability with the PEPPHER approach across hybrid systems with one to four GPUs.

parallel, distributed and network-based processing | 2003

A simple, fast parallel implementation of Quicksort and its performance evaluation on SUN Enterprise 10000

Philippas Tsigas; Yi Zhang

We have implemented sample sort and a parallel version of Quicksort on a cache-coherent shared address space multiprocessor: the SUN ENTERPRISE 10000. Our computational experiments show that parallel Quicksort outperforms sample sort. Sample sort has been long thought to be the best, general parallel sorting algorithm, especially for larger data sets. On 32 processors of the ENTERPRISE 10000 the speedup of parallel Quicksort is more than six units higher than the speedup of sample sort, resulting in execution times that were more than 50% faster than sample sort. On one processor, parallel quicksort achieved 15% percent faster execution times than sample sorting. Moreover, because of its low memory requirements, parallel Quicksort could sort data sets at twice the size that sample sort could under the same system memory restrictions.

IEEE Transactions on Parallel and Distributed Systems | 1994

Reading many variables in one atomic operation: solutions with linear or sublinear complexity

Lefteris M. Kirousis; Paul G. Spirakis; Philippas Tsigas

We address the problem of reading several variables (components) X/sub 1/,...,X/sub c/, all in one atomic operation, by only one process, called the reader, while each of these variables are being written by a set of writers. All operations (i.e., both reads and writes) are assumed to be totally asynchronous and wait-free. For this problem, only algorithms that require at best quadratic time and space complexity can be derived from the existing literature. (The time complexity of a construction is the number of suboperations of a high-level operation and its space complexity is the number of atomic shared variables it needs) In this paper, we provide a deterministic protocol that has linear (in the number of processes) space complexity, linear time complexity for a read operation, and constant time complexity for a write. Our solution does not make use of time-stamps. Rather, it is the memory location where a write writes that differentiates it from the other writes. Also, introducing randomness in the location where the reader gets the value that it returns, we get a conceptually very simple probabilistic algorithm. This algorithm has an overwhelmingly small, controllable probability of error. Its space complexity, and also the time complexity of a read operation, are sublinear. The time complexity of a write is constant. On the other hand, under the Archimedean time assumption, we get a protocol whose time and space complexity do not depend on the number of writers, but are linear in the number of components only. (The time complexity of a write operation is still constant.). >

acm symposium on applied computing | 2004

Scalable and lock-free concurrent dictionaries

Håkan Sundell; Philippas Tsigas

We present an efficient and practical lock-free implementation of a concurrent dictionary that is suitable for both fully concurrent (large multi-processor) systems as well as pre-emptive (multi-process) systems. Many algorithms for concurrent dictionaries are based on mutual exclusion. However, mutual exclusion causes blocking which has several drawbacks and degrades the systems overall performance. Non-blocking algorithms avoid blocking, and are either lockfree or wait-free. Our algorithm is based on the randomized sequential list structure called Skiplist, and implements the full set of operations on a dictionary that is suitable for practical settings. In our performance evaluation we compare our algorithm with the most efficient non-blocking implementation of dictionaries known. The experimental results clearly show that our algorithm outperforms the other lockfree algorithm for dictionaries with realistic sizes, both on fully concurrent as well as pre-emptive systems.

Explore More