Damian Dechev | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Damian Dechev is active.

Explore More

Publication

Featured researches published by Damian Dechev.

international conference on principles of distributed systems | 2006

Lock-free dynamically resizable arrays

Damian Dechev; Peter Pirkelbauer; Bjarne Stroustrup

We present a first lock-free design and implementation of a dynamically resizable array (vector). The most extensively used container in the C++ Standard Template Library (STL) is vector, offering a combination of dynamic memory management and constant-time random access. Our approach is based on a single 32-bit word atomic compare-and-swap (CAS) instruction. It provides a linearizable and highly parallelizable STL-like interface, lock-free memory allocation and management, and fast execution. Our current implementation is designed to be most efficient on multi-core architectures. Experiments on a dual-core Intel processor with shared L2 cache indicate that our lock-free vector outperforms its lock-based STL counterpart and the latest concurrent vector implementation provided by Intel by a large factor. The performance evaluation on a quad dual-core AMD system with non-shared L2 cache demonstrated timing results comparable to the best available lock-based techniques. The presented design implements the most common STL vectors interfaces, namely random access read and write, tail insertion and deletion, pre-allocation of memory, and query of the containers size. Using the current implementation, a user has to avoid one particular ABA problem.

international symposium on object/component/service-oriented real-time distributed computing | 2010

Understanding and Effectively Preventing the ABA Problem in Descriptor-Based Lock-Free Designs

Damian Dechev; Peter Pirkelbauer; Bjarne Stroustrup

An increasing number of modern real-time systems and the nowadays ubiquitous multicore architectures demand the application of programming techniques for reliable and efficient concurrent synchronization. Some recently developed Compare-And-Swap (CAS) based nonblocking techniques hold the promise of delivering practical and safer concurrency. The ABA problem is a fundamental problem to many CAS-based designs. Its significance has increased with the suggested use of CAS as a core atomic primitive for the implementation of portable lock-free algorithms. The ABA problems occurrence is due to the intricate and complex interactions of the applications concurrent operations and, if not remedied, ABA can significantly corrupt the semantics of a nonblocking algorithm. The current state of the art leaves the elimination of the ABA hazards to the ingenuity of the software designer. In this work we provide the first systematic and detailed analysis of the ABA problem in lock-free Descriptor-based designs. We study the semantics of Descriptor-based lock-free data structures and propose a classification of their operations that helps us better understand the ABA problem and subsequently derive an effective ABA prevention scheme. Our ABA prevention approach outperforms by a large factor the use of the alternative CAS-based ABA prevention schemes. It offers speeds comparable to the use of the architecture-specific CAS2 instruction used for version counting. We demonstrate our ABA prevention scheme by integrating it into an advanced nonblocking data structure, a lock-free dynamically resizable array.

conference on current trends in theory and practice of informatics | 2009

Source Code Rejuvenation Is Not Refactoring

Peter Pirkelbauer; Damian Dechev; Bjarne Stroustrup

Programmers rely on programming idioms, design patterns, and workaround techniques to make up for missing programming language support. Evolving languages often address frequently encountered problems by adding language and library support to subsequent releases. By using new features, programmers can express their intent more directly. As new concerns, such as parallelism or security, arise, early idioms and language facilities can become serious liabilities. Modern code sometimes benefits from optimization techniques not feasible for code that uses less expressive constructs. Manual source code migration is expensive, time-consuming, and prone to errors. In this paper, we present the notion of source code rejuvenation, the automated migration of legacy code and very briefly mention the tools we use to achieve that. While refactoring improves structurally inadequate source code, source code rejuvenation leverages enhanced program language and library facilities by finding and replacing coding patterns that can be expressed through higher-level software abstractions. Raising the level of abstraction benefits software maintainability, security, and performance.

international conference on embedded computer systems architectures modeling and simulation | 2013

Concurrent multi-level arrays: Wait-free extensible hash maps

Steven D. Feldman; Pierre LaBorde; Damian Dechev

In this work we present the first design and implementation of a wait-free hash map. Our multiprocessor data structure allows a large number of threads to concurrently put, get, and remove information. Wait-freedom means that all threads make progress in a finite amount of time - an attribute that can be critical in real-time environments. This is opposed to the traditional blocking implementations of shared data structures which suffer from the negative impact of deadlock and related correctness and performance issues. Our design is portable because we only use atomic operations that are provided by the hardware; therefore, our hash map can be utilized by a variety of data-intensive applications including those within the domains of embedded systems and supercomputers. The challenges of providing this guarantee make the design and implementation of wait-free objects difficult. As such, there are few wait-free data structures described in the literature; in particular, there are no wait-free hash maps. It often becomes necessary to sacrifice performance in order to achieve wait-freedom. However, our experimental evaluation shows that our hash map design is, on average, 5 times faster than a traditional blocking design. Our solution outperforms the best available alternative non-blocking designs in a large majority of cases, typically by a factor of 8 or higher.

conference on object-oriented programming systems, languages, and applications | 2009

Scalable nonblocking concurrent objects for mission critical code

Damian Dechev; Bjarne Stroustrup

The high degree of complexity and autonomy of future robotic space missions, such as Mars Science Laboratory (MSL), poses serious challenges in assuring their reliability and efficiency. Providing fast and safe concurrent synchronization is of critical importance to such autonomous embedded software systems. The application of nonblocking synchronization is known to help eliminate the hazards of deadlock, livelock, and priority inversion. The nonblocking programming techniques are notoriously difficult to implement and offer a variety of semantic guarantees and usability and performance trade-offs. The present software development and certification methodologies applied at NASA do not reach the level of detail of providing guidelines for the design of concurrent software. The complex task of engineering reliable and efficient concurrent synchronization is left to the programmers ingenuity. A number of Software Transactional Memory (STM) approaches gained wide popularity because of their easy to apply interfaces, but currently fail to offer scalable nonblocking transactions. In this work we provide an in-depth analysis of the nonblocking synchronization semantics and their applicability in mission critical code. We describe a generic implementation of a methodology for scalable implementation of concurrent objects. Our performance evaluation demonstrates that our approach is practical and outperforms the application of nonblocking transactions by a large factor. In addition, we apply our Descriptor-based approach to provide a solution to the fundamental ABA problem. Our ABA prevention scheme, called the lambda-delta approach, outperforms by a large factor the use of garbage collection for the safe management of each shared location. It offers speeds comparable to the application of the architecture-specific CAS2 instruction used for version counting. The lambda-delta approach is an ABA prevention technique based on classification of concurrent operations and 3-step execution of a Descriptor object. A practical alternative to the application of CAS2 is particularly important for the engineering of embedded systems.

autonomic computing and communication systems | 2008

Verification and semantic parallelization of goal-driven autonomous software

Damian Dechev; Nicolas Rouquette; Peter Pirkelbauer; Bjarne Stroustrup

Future space missions such as the Mars Science Laboratory demand the engineering of some of the most complex man-rated autonomous software systems. According to some recent estimates, the certification cost for mission-critical software exceeds its development cost. The current process-oriented methodologies do not reach the level of detail of providing guidelines for the development and validation of concurrent software. Time and concurrency are the most critical notions in an autonomous space system. In this work we present the design and implementation of a first concurrency and time centered framework for verification and semantic parallelization of real-time C++ within the JPL Mission Data System Framework (MDS). The end goal of the industrial project that motivated our work is to provide certification artifacts and accelerated testing of the complex software interactions in autonomous flight systems. As a case study we demonstrate the verification and semantic parallelization of the MDS Goal Networks.

IEEE Transactions on Parallel and Distributed Systems | 2016

A Lock-Free Priority Queue Design Based on Multi-Dimensional Linked Lists

Deli Zhang; Damian Dechev

The throughput of concurrent priority queues is pivotal to multiprocessor applications such as discrete event simulation, best-first search and task scheduling. Existing lock-free priority queues are mostly based on skiplists, which probabilistically create shortcuts in an ordered list for fast insertion of elements. The use of skiplists eliminates the need of global rebalancing in balanced search trees and ensures logarithmic sequential search time on average, but the worst-case performance is linear with respect to the input size. In this paper, we propose a quiescently consistent lock-free priority queue based on a multi-dimensional list that guarantees worst-case search time of O(logN) for key universe of size N. The novel multi-dimensional list (MDList) is composed of nodes that contain multiple links to child nodes arranged by their dimensionality. The insertion operation works by first injectively mapping the scalar key to a high-dimensional vector, then uniquely locating the target position by using the vector as coordinates. Nodes in MDList are ordered by their coordinate prefixes and the ordering property of the data structure is readily maintained during insertion without rebalancing nor randomization. In our experimental evaluation using a micro-benchmark, our priority queue achieves an average of 50 percent speedup over the state of the art approaches under high concurrency.

software language engineering | 2010

Support for the evolution of C++ generic functions

Peter Pirkelbauer; Damian Dechev; Bjarne Stroustrup

The choice of requirements for an argument of a generic type or algorithm is a central design issue in generic programming. In the context of C++, a specification of requirements for a template argument or a set of template arguments is called a concept. In this paper, we present a novel tool, TACE (template analysis and concept extraction), designed to help programmers understand the requirements that their code de facto imposes on arguments and help simplify and generalize those through comparisons with libraries of well-defined and precisely-specified concepts. TACE automatically extracts requirements from the body of function templates. These requirements are expressed using the notation and semantics developed by the ISO C++ standards committee. TACE converts implied requirements into concept definitions and compares them against concepts from a repository. Components of a well-defined library exhibit commonalities that allow us to detect problems by comparing requirements from many components: Design and implementation problems manifest themselves as minor variations in requirements. TACE points to source code that cannot be constrained by concepts and to code where small modifications would allow the use of less constraining concepts. For people who use a version of C++ with concept support, TACE can serve as a core engine for automated source code rejuvenation.

IEEE Transactions on Parallel and Distributed Systems | 2016

An Efficient Wait-Free Vector

Steven D. Feldman; Carlos Valera-Leon; Damian Dechev

The vector is a fundamental data structure, which provides constant-time access to a dynamically-resizable range of elements. Currently, there exist no wait-free vectors. The only non-blocking version supports only a subset of the sequential vector API and exhibits significant synchronization overhead caused by supporting opposing operations. Since many applications operate in phases of execution, wherein each phase only a subset of operations are used, this overhead is unnecessary for the majority of the application. To address the limitations of the non-blocking version, we present a new design that is wait-free, supports more of the operations provided by the sequential vector, and provides alternative implementations of key operations. These alternatives allow the developer to balance the performance and functionality of the vector as requirements change throughout execution. Compared to the known non-blocking version and the concurrent vector found in Intels TBB library, our design outperforms or provides comparable performance in the majority of tested scenarios. Over all tested scenarios, the presented design performs an average of 4.97 times more operations per second than the non-blocking vector and 1.54 more than the TBB vector. In a scenario designed to simulate the filling of a vector, performance improvement increases to 13.38 and 1.16 times. This work presents the first ABA-free non-blocking vector. Unlike the other non-blocking approach, all operations are wait-free and bounds-checked and elements are stored contiguously in memory.

southeastcon | 2015

Designing energy-efficient approximate adders using parallel genetic algorithms

Adnan Aquib Naseer; Rizwan A. Ashraf; Damian Dechev; Ronald F. DeMara

Approximate computing involves selectively reducing the number of transistors in a circuit to improve energy savings. Energy savings may be achieved at the cost of reduced accuracy for signal processing applications whereby constituent adder and multiplier circuits need not generate a precise output. Since the performance versus energy savings landscape is complex, we investigate the acceleration of the design of approximate adders using parallelized Genetic Algorithms (GAs). The fitness evaluation of each approximate adder is explored by the GA in a non-sequential fashion to automatically generate novel approximate designs within specified performance thresholds. This paper advances methods of parallelizing GAs and implements a new parallel GA approach for approximate multi-bit adder design. A speedup of approximately 1.6-fold is achieved using a quad-core Intel processor and results indicate that the proposed GA is able to find adders which consume 63:8% less energy than accurate adders.

Explore More