
Publications


Featured research published by Karl-Filip Faxén.


ACM SIGARCH Computer Architecture News | 2008

Wool - A work stealing library

Karl-Filip Faxén

This paper presents preliminary results on a small, lightweight, user-level task management library called Wool. The Wool task scheduler is based on work stealing. The objective of the library is to provide a reasonably convenient programming interface in ordinary C (in particular, one that does not force the programmer to write in continuation-passing style) while still keeping task creation overhead very low. Several task scheduling systems based on work stealing exist, but they are typically either separate programming languages, like Cilk-5, or built on C++, like the Intel TBB, or on C#, as in the Microsoft TPL. Our main conclusion is that such a direct-style interface is indeed possible and yields performance comparable to that of the Intel TBB.
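To make the contrast with continuation-passing style concrete, the sketch below shows what a direct-style fork-join computation looks like in ordinary C. The SPAWN, SYNC, and CALL macro names are illustrative assumptions, stubbed out serially so the example compiles and runs; they are not presented as Wool's actual interface.

```c
/* A minimal sketch of a direct-style task interface in plain C, in the
 * spirit of the description above. SPAWN/SYNC/CALL are illustrative
 * assumptions, not necessarily the library's real macros; here they are
 * serial stand-ins so the example compiles. A real work-stealing runtime
 * would push SPAWNed tasks onto the worker's task pool for idle workers
 * to steal. */
#include <stdio.h>

#define SPAWN(f, ...)  int __spawned = f(__VA_ARGS__)
#define SYNC()         (__spawned)
#define CALL(f, ...)   f(__VA_ARGS__)

static int fib(int n)
{
    if (n < 2)
        return n;
    SPAWN(fib, n - 1);           /* potentially parallel child task */
    int y = CALL(fib, n - 2);    /* executed directly, like an ordinary call */
    int x = SYNC();              /* join the spawned child */
    return x + y;
}

int main(void)
{
    printf("fib(30) = %d\n", fib(30));
    return 0;
}
```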


International Conference on Parallel Processing | 2010

Efficient Work Stealing for Fine Grained Parallelism

Karl-Filip Faxén

This paper deals with improving the performance of fine-grained task parallelism. It is often either cumbersome or impossible to increase the grain size of such programs, and growing core counts exacerbate the problem: a program that appears coarse-grained on eight cores may look a lot more fine-grained on sixty-four. In this paper we present the direct task stack, a novel work-stealing algorithm with unusually low overheads, both for creating tasks and for stealing. We compare the performance of our scheduler to Cilk++, the icc implementation of OpenMP 3.0, and the Intel TBB library on an eight-core, dual-socket Opteron machine. We also analyze why our techniques achieve consistent speed-ups over the other systems, ranging from 2-3x on many fine-grained workloads to over 50x in extreme cases, and show quantitatively how each of the techniques we use contributes to the improved performance.
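As background for the scheduler comparison above, the following is a generic, single-threaded sketch of the task-pool mechanics that work-stealing schedulers build on: the owning worker pushes and pops at one end, and idle workers steal from the other. The names and structure are invented for illustration and omit all synchronization; this is not the paper's direct task stack.

```c
/* Generic work-stealing task pool sketch (not the paper's direct task
 * stack): the owner pushes/pops at the bottom, thieves take from the top.
 * Single-threaded illustration only; a real scheduler needs atomics or
 * fences on the shared indices. */
#include <stdio.h>

#define POOL_SIZE 1024

typedef struct { void (*fn)(void *); void *arg; } task_t;

typedef struct {
    task_t tasks[POOL_SIZE];
    int top;      /* next slot a thief would steal from */
    int bottom;   /* next free slot for the owner */
} task_pool;

/* Owner side: cheap push/pop at the bottom end. */
static void push(task_pool *p, task_t t) { p->tasks[p->bottom++] = t; }

static int pop(task_pool *p, task_t *out)
{
    if (p->bottom == p->top) return 0;   /* empty */
    *out = p->tasks[--p->bottom];
    return 1;
}

/* Thief side: steal the oldest (typically largest) task from the top. */
static int steal(task_pool *victim, task_t *out)
{
    if (victim->top == victim->bottom) return 0;   /* nothing to steal */
    *out = victim->tasks[victim->top++];
    return 1;
}

static void hello(void *arg) { printf("task %ld ran\n", (long)arg); }

int main(void)
{
    task_pool p = { .top = 0, .bottom = 0 };
    for (long i = 0; i < 3; i++)
        push(&p, (task_t){ hello, (void *)i });

    task_t t;
    if (steal(&p, &t)) t.fn(t.arg);    /* a thief takes the oldest task */
    while (pop(&p, &t)) t.fn(t.arg);   /* the owner drains the rest */
    return 0;
}
```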


Complex, Intelligent and Software Intensive Systems | 2008

Embla - Data Dependence Profiling for Parallel Programming

Karl-Filip Faxén; Konstantin Popov; Lars Albertsson; Sverker Janson

With the proliferation of multicore processors, there is an urgent need for tools and methodologies supporting parallelization of existing applications. In this paper, we present a novel tool for aiding programmers in parallelizing programs. The tool, Embla, is based on the Valgrind framework and allows the user to discover the data dependences in a sequential program, thereby exposing opportunities for parallelization. Embla performs an off-line dynamic analysis, recording dependences as they arise during program execution. It reports an optimistic view of parallelizable sequences, ignoring dependences that do not arise during execution. Moreover, since the tool instruments the machine code of the program, it is largely language independent. Because Embla finds the dependences that occur in particular executions, the confidence one can assign to its results depends on whether different executions yield different (bad) or largely the same (good) dependences. We present a preliminary investigation into this issue using 84 different inputs to the SPEC CPU2006 benchmark 403.gcc. The results indicate a strong correlation between coverage and the dependences found; executing the entire program is likely to reveal all dependences.
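As a concrete illustration of what a data dependence profiler like Embla reports, the toy program below (an assumed example, not taken from the paper) contains one loop with a loop-carried flow dependence, which blocks direct parallelization, and one loop whose iterations are independent and could run as parallel tasks.

```c
#include <stdio.h>

#define N 8

int main(void)
{
    int a[N] = {1, 1, 1, 1, 1, 1, 1, 1};
    int b[N];

    /* Loop-carried flow dependence: iteration i reads a[i-1], which
     * iteration i-1 wrote. A dependence profiler would report this
     * cross-iteration dependence, so the loop cannot be parallelized
     * as written. */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + 1;

    /* No cross-iteration dependences: each iteration touches only its
     * own elements, so the iterations could run as independent tasks. */
    for (int i = 0; i < N; i++)
        b[i] = a[i] * 2;

    for (int i = 0; i < N; i++)
        printf("%d ", b[i]);
    printf("\n");
    return 0;
}
```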


European Conference on Parallel Processing | 2010

Estimating and exploiting potential parallelism by source-level dependence profiling

Jonathan Chee Heng Mak; Karl-Filip Faxén; Sverker Janson; Alan Mycroft

Manual parallelization of programs is known to be difficult and error-prone, and there are currently few ways to measure the amount of potential parallelism in the original sequential code. We present an extension of Embla, a Valgrind-based dependence profiler that links dynamic dependences back to source code. This new tool estimates potential task-level parallelism in a sequential program and helps programmers exploit it at the source level. Using the popular fork-join model, our tool provides a realistic estimate of the potential speed-up from parallelization with frameworks like Cilk, TBB or OpenMP 3.0. Estimates can be given for several different parallelization models, varying in programmer effort and in the capabilities required of the underlying implementation. Our tool also outputs source-level dependence information to aid the parallelization of programs with plenty of inherent parallelism, as well as critical paths to suggest algorithmic rewrites of programs with little of it. We validate our claims by running our tool over serial elisions of sample Cilk programs, finding additional inherent parallelism not exploited by the Cilk code, and over serial C benchmarks where the profiling results suggest parallelism-enhancing algorithmic rewrites.
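Under the fork-join model, the usual way to turn profile data into a speed-up estimate is the work/span bound: with total work T1 and critical-path length T_inf, no schedule can finish faster than T_inf, so the speed-up is at most T1 / T_inf. A minimal sketch with invented numbers, not figures from the paper:

```c
/* Work/span speed-up bound: with total work T1 and critical path T_inf,
 * parallel execution time is at least T_inf, so speed-up <= T1 / T_inf.
 * The profile numbers below are invented for illustration. */
#include <stdio.h>

int main(void)
{
    double work = 4.2e9;          /* T1: total cost observed in the profile */
    double critical_path = 6.0e7; /* T_inf: longest dependence chain */

    printf("potential speed-up <= %.1f\n", work / critical_path);
    return 0;
}
```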


Concurrency and Computation: Practice and Experience | 2015

A comparative performance study of common and popular task-centric programming frameworks

Artur Podobas; Mats Brorsson; Karl-Filip Faxén

Programmers today face a bewildering array of parallel programming models and tools, making it difficult to choose an appropriate one for each application. An increasingly popular programming model supporting structured parallel programming patterns in a portable and composable manner is the task-centric programming model. In this study, we compare several popular task-centric programming frameworks, including Cilk Plus, Threading Building Blocks, and various implementations of OpenMP 3.0. We have analyzed their performance on the Barcelona OpenMP Tasks Suite (BOTS) benchmarks, both on a 48-core AMD Opteron 6172 server and on a 64-core TILEPro64 embedded many-core processor. Our results show that OpenMP offers the highest flexibility for programmers, but this flexibility comes at a cost. Frameworks supporting only a specific and more restrictive model, such as Cilk Plus and Threading Building Blocks, are generally more efficient in terms of both performance and energy consumption. However, Intel's implementation of OpenMP tasks performs best and comes closest to the specialized run-time systems.
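For readers unfamiliar with the task-centric model being compared, the fragment below expresses the standard recursive fork-join pattern using OpenMP 3.0 tasks; Cilk Plus and Threading Building Blocks express the same structure with cilk_spawn/cilk_sync and task groups, respectively. This is a generic illustration, not one of the BOTS benchmarks used in the study.

```c
/* Recursive fork-join with OpenMP 3.0 tasks: each recursive call becomes a
 * task the runtime may execute on any worker thread. Compile with
 * `cc -fopenmp`. Generic illustration only. */
#include <stdio.h>
#include <omp.h>

static long fib(int n)
{
    if (n < 2)
        return n;
    long x, y;
    #pragma omp task shared(x)
    x = fib(n - 1);              /* child task, may run on another core */
    y = fib(n - 2);              /* computed by the current task */
    #pragma omp taskwait         /* join the spawned child */
    return x + y;
}

int main(void)
{
    long r;
    #pragma omp parallel
    #pragma omp single           /* one thread creates the root task tree */
    r = fib(30);
    printf("fib(30) = %ld\n", r);
    return 0;
}
```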


Computing Frontiers | 2011

Manycore work stealing

Karl-Filip Faxén; John Ardelius

This paper investigates executing task-based programs on a 64-core Tilera processor under the high-performance work-stealing scheduler Wool. We measure the performance of several programs from the BOTS benchmark suite, observing excellent scalability whenever sufficient parallelism exists. We also explore alternatives to random victim selection: we use sampling to try to find a large task to steal, and set-based stealing to improve cache and TLB locality.
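The sampling idea mentioned above can be sketched as follows: instead of stealing from one uniformly random victim, the thief inspects a few random candidates and steals from the one with the most queued work, which tends to yield larger tasks. The sketch below uses invented names and data structures and leaves out all synchronization; it is not the Wool implementation.

```c
/* Sampling victim selection sketch: inspect a few random workers and steal
 * from the one with the most queued tasks. Names and structures are
 * invented; a real scheduler reads these counts with proper
 * synchronization. */
#include <stdio.h>
#include <stdlib.h>

#define NWORKERS 8
#define SAMPLES  4

typedef struct { int ntasks; } worker_t;   /* only the queue length matters here */

static worker_t workers[NWORKERS];

/* Sample a few random workers and pick the one with the most queued
 * tasks; returns -1 if every sampled worker was empty. */
static int pick_victim(int self)
{
    int best = -1, best_tasks = 0;
    for (int i = 0; i < SAMPLES; i++) {
        int v = rand() % NWORKERS;
        if (v == self)
            continue;
        if (workers[v].ntasks > best_tasks) {
            best = v;
            best_tasks = workers[v].ntasks;
        }
    }
    return best;
}

int main(void)
{
    for (int i = 0; i < NWORKERS; i++)
        workers[i].ntasks = rand() % 10;    /* fake queue lengths */
    printf("worker 0 would steal from worker %d\n", pick_victim(0));
    return 0;
}
```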


3rd Workshop on Programmability Issues for Multi-Core Computers | 2010

A Comparison of some recent Task-based Parallel Programming Models

Artur Podobas; Mats Brorsson; Karl-Filip Faxén


Archive | 2008

Multicore computing--the state of the art

Karl-Filip Faxén; Christer Bengtsson; Mats Brorsson; Håkan Grahn; Erik Hagersten; Bengt Jonsson; Christoph W. Kessler; Björn Lisper; Per Stenström; Bertil Svensson


RESoLVE '12, Second Workshop on Runtime Environments, Systems, Layering and Virtualized Environments, London, UK | 2012

Resource management for task-based parallel programs over a multi-kernel: BIAS: Barrelfish Inter-core Adaptive Scheduling

Georgios Varisteas; Mats Brorsson; Karl-Filip Faxén


Archive | 2012

A Quantitative Evaluation of popular Task-Centric Programming Models and Libraries

Artur Podobas; Mats Brorsson; Karl-Filip Faxén

Collaboration


Top co-authors of Karl-Filip Faxén.

Mats Brorsson, Royal Institute of Technology
Artur Podobas, Royal Institute of Technology
Björn Lisper, Mälardalen University College
Georgios Varisteas, Royal Institute of Technology
Håkan Grahn, Blekinge Institute of Technology
Sverker Janson, Swedish Institute of Computer Science
John Ardelius, Swedish Institute of Computer Science