Carl Leonardsson
Uppsala University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Carl Leonardsson.
tools and algorithms for construction and analysis of systems | 2015
Parosh Aziz Abdulla; Stavros Aronis; Mohamed Faouzi Atig; Bengt Jonsson; Carl Leonardsson; Konstantinos F. Sagonas
We present a technique for efficient stateless model checking of programs that execute under the relaxed memory models TSO and PSO. The basis for our technique is a novel representation of executions under TSO and PSO, called chronological traces. Chronological traces induce a partial order relation on relaxed memory executions, capturing dependencies that are needed to represent the interaction via shared variables. They are optimal in the sense that they only distinguish computations that are inequivalent under the widely-used representation by Shasha and Snir. This allows an optimal dynamic partial order reduction algorithm to explore a minimal number of executions while still guaranteeing full coverage. We apply our techniques to check, under the TSO and PSO memory models, LLVM assembly produced for C/pthreads programs. Our experiments show that our technique reduces the verification effort for relaxed memory models to be almost that for the standard model of sequential consistency. In many cases, our implementation significantly outperforms other comparable tools.
Acta Informatica | 2017
Parosh Aziz Abdulla; Stavros Aronis; Mohamed Faouzi Atig; Bengt Jonsson; Carl Leonardsson; Konstantinos F. Sagonas
We present a technique for efficient stateless model checking of programs that execute under the relaxed memory models TSO and PSO. The basis for our technique is a novel representation of executions under TSO and PSO, called chronological traces. Chronological traces induce a partial order relation on relaxed memory executions, capturing dependencies that are needed to represent the interaction via shared variables. They are optimal in the sense that they only distinguish computations that are inequivalent under the widely-used representation by Shasha and Snir. This allows an optimal dynamic partial order reduction algorithm to explore a minimal number of executions while still guaranteeing full coverage. We apply our techniques to check, under the TSO and PSO memory models, LLVM assembly produced for C/pthreads programs. Our experiments show that our technique reduces the verification effort for relaxed memory models to be almost that for the standard model of sequential consistency. This article is an extended version of Abdulla et al. (Tools and algorithms for the construction and analysis of systems, Springer, New York, pp 353–367, 2015), appearing in TACAS 2015.
tools and algorithms for construction and analysis of systems | 2013
Parosh Aziz Abdulla; Mohamed Faouzi Atig; Yu-Fang Chen; Carl Leonardsson; Ahmed Rezine
We introduce Memorax, a tool for the verification of control state reachability (i.e., safety properties) of concurrent programs manipulating finite range and integer variables and running on top of weak memory models. The verification task is non-trivial as it involves exploring state spaces of arbitrary or even infinite sizes. Even for programs that only manipulate finite range variables, the sizes of the store buffers could grow unboundedly, and hence the state spaces that need to be explored could be of infinite size. In addition, Memorax incorporates an interpolation based CEGAR loop to make possible the verification of control state reachability for concurrent programs involving integer variables. The reachability procedure is used to automatically compute possible memory fence placements that guarantee the unreachability of bad control states under TSO. In fact, for programs only involving finite range variables and running on TSO, the fence insertion functionality is complete, i.e., it will find all minimal sets of memory fence placements (minimal in the sense that removing any fence would result in the reachability of the bad control states). This makes Memorax the first freely available, open source, push-button verification and fence insertion tool for programs running under TSO with integer variables.
static analysis symposium | 2012
Parosh Aziz Abdulla; Mohamed Faouzi Atig; Yu-Fang Chen; Carl Leonardsson; Ahmed Rezine
We propose an automatic fence insertion and verification framework for concurrent programs running under relaxed memory. Unlike previous approaches to this problem, which allow only variables of finite domain, we target programs with (unbounded) integer variables. The problem is difficult because it has two different sources of infiniteness: unbounded store buffers and unbounded integer variables. Our framework consists of three main components: (1) a finite abstraction technique for the store buffers, (2) a finite abstraction technique for the integer variables, and (3) a counterexample guided abstraction refinement loop of the model obtained from the combination of the two abstraction techniques. We have implemented a prototype based on the framework and run it successfully on all standard benchmarks together with several challenging examples that are beyond the applicability of existing methods.
computer aided verification | 2016
Parosh Aziz Abdulla; Mohamed Faouzi Atig; Bengt Jonsson; Carl Leonardsson
We present the first framework for efficient application of stateless model checking (SMC) to programs running under the relaxed memory model of POWER. The framework combines several contributions. The first contribution is that we develop a scheme for systematically deriving operational execution models from existing axiomatic ones. The scheme is such that the derived execution models are well suited for efficient SMC. We apply our scheme to the axiomatic model of POWER from [8]. Our main contribution is a technique for efficient SMC, called Relaxed Stateless Model Checking (RSMC), which systematically explores the possible inequivalent executions of a program. RSMC is suitable for execution models obtained using our scheme. We prove that RSMC is sound and optimal for the POWER memory model, in the sense that each complete program behavior is explored exactly once. We show the feasibility of our technique by providing an implementation for programs written in C/pthreads.
international symposium on performance analysis of systems and software | 2016
Christos Sakalis; Carl Leonardsson; Stefanos Kaxiras; Alberto Ros
Benchmarks are indispensable in evaluating the performance implications of new research ideas. However, their usefulness is compromised if they do not work correctly on a system under evaluation or, in general, if they cannot be used consistently to compare different systems. A well-known benchmark suite of parallel applications is the Splash-2 suite. Since its creation in the context of the DASH project, Splash-2 benchmarks have been widely used in research. However, Splash-2 was released over two decades ago and does not adhere to the recent C memory consistency model. This leads to unexpected and often incorrect behavior when some Splash-2 benchmarks are used in conjunction with contemporary compilers and hardware (simulated or real). Most importantly, we discovered critical performance bugs that may question some of the reported benchmark results. In this work, we analyze the Splash-2 benchmarks and expose data races and related performance bugs. We rectify the problematic benchmarks and evaluate the resulting performance. Our work contributes to the community a new sanitized version of the Splash-2 benchmarks, called the Splash-3 benchmark suite.
international conference on parallel architectures and compilation techniques | 2016
Alberto Ros; Carl Leonardsson; Christos Sakalis; Stefanos Kaxiras
Cache coherence protocols based on self-invalidation allow simpler hardware implementation compared to traditional write-invalidation protocols, by relying on data-race-free semantics and applying self-invalidation on synchronization points. Their simplicity lies in the absence of invalidation traffic. This eliminates the need to track readers in a directory, and reduces the number of transient protocol states. Similarly, the use of self-downgrade on synchronization eliminates directory indirection, and hence the need to track writers in a directory. These protocols, effectively without a directory, have the potential to reduce area, energy consumption, and complexity, without sacrificing performance—provided, that self-invalidation and self-downgrade are performed prudently. In this work we examine how self-invalidation and self-downgrade are performed in relation to atomicity and ordering. We show that self-invalidation and self-downgrade do not need to be applied conservatively, as so far implemented. Our key observation is that, often, critical sections which are not ordered in time, are intended to provide only atomicity and not thread synchronization. We thus propose a new type of self-invalidation, forwardself-invalidation (FSI), which invalidates solely data that are going to be accessed inside a critical section. Based on the same reasoning, we propose a new type of self-downgrade, forward self-downgrade (FSD), also restricted to writes in critical sections. Finally, we define the semantics of locks using FSI and FSD, which resemble the semantics of relaxed atomic operations in C++. Our evaluation for 64-core multiprocessors shows significant improvements using the proposed FSI and FSD—where applicable—in Splash-3 and PARSEC benchmarks, over a directory-based protocol (17.1 percent in execution time and 33.9 percent in energy consumption) and also over a state-of-the-art self-invalidation/self-downgrade protocol (7.6 percent in execution time and 9.1 percent in energy consumption), while still retaining the design simplicity of the protocol.
IEEE Transactions on Parallel and Distributed Systems | 2017
Alberto Ros; Carl Leonardsson; Christos Sakalis; Stefanos Kaxiras
Cache coherence protocols based on self-invalidation allow simpler hardware implementation compared to traditional write-invalidation protocols, by relying on data-race-free semantics and applying self-invalidation on synchronization points. Their simplicity lies in the absence of invalidation traffic. This eliminates the need to track readers in a directory, and reduces the number of transient protocol states. Similarly, the use of self-downgrade on synchronization eliminates directory indirection, and hence the need to track writers in a directory. These protocols, effectively without a directory, have the potential to reduce area, energy consumption, and complexity, without sacrificing performance— provided , that self-invalidation and self-downgrade are performed prudently. In this work we examine how self-invalidation and self-downgrade are performed in relation to atomicity and ordering. We show that self-invalidation and self-downgrade do not need to be applied conservatively, as so far implemented. Our key observation is that, often, critical sections which are not ordered in time, are intended to provide only atomicity and not thread synchronization. We thus propose a new type of self-invalidation, forward self-invalidation (FSI), which invalidates solely data that are going to be accessed inside a critical section. Based on the same reasoning, we propose a new type of self-downgrade, forward self-downgrade (FSD), also restricted to writes in critical sections. Finally, we define the semantics of locks using FSI and FSD, which resemble the semantics of relaxed atomic operations in C++. Our evaluation for 64-core multiprocessors shows significant improvements using the proposed FSI and FSD—where applicable—in Splash-3 and PARSEC benchmarks, over a directory-based protocol (17.1 percent in execution time and 33.9 percent in energy consumption) and also over a state-of-the-art self-invalidation/self-downgrade protocol (7.6 percent in execution time and 9.1 percent in energy consumption), while still retaining the design simplicity of the protocol.
formal techniques for (networked and) distributed systems | 2016
Parosh Aziz Abdulla; Mohamed Faouzi Atig; Stefanos Kaxiras; Carl Leonardsson; Alberto Ros; Yunyun Zhu
Cache coherence protocols using self-invalidation and self-downgrade have recently seen increased popularity due to their simplicity, potential performance efficiency, and low energy consumption. However, such protocols result in memory instruction reordering, thus causing extra program behaviors that are often not intended by the programmer. We propose a novel formal model that captures the semantics of programs running under such protocols, and employs a set of fences that interact with the coherence layer. Using the model, we perfform a reachability analysis that can check whether a program satisfies a given safety property with the current set of fences. Based on an algorithm in [19], we describe a method for insertion of optimal sets of fences that ensure correctness of the program under such protocols. The method relies on a counter-example guided fence insertion procedure. One feature of our method is that it can handle a variety of fences with different costs. This diversity makes optimization more difficult since one has to optimize the total cost of the inserted fences, rather than just their number. To demonstrate the strength of our approach, we have implemented a prototype and run it on a wide range of examples and benchmarks. We have also, using simulation, evaluated the performance of the resulting fenced programs.
tools and algorithms for construction and analysis of systems | 2012
Parosh Aziz Abdulla; Mohamed Faouzi Atig; Yu-Fang Chen; Carl Leonardsson; Ahmed Rezine