Osman S. Unsal | Researchain

Publication

Featured research published by Osman S. Unsal.

International Symposium on Microarchitecture | 2009

EazyHTM: eager-lazy hardware transactional memory

Sasa Tomic; Cristian Perfumo; Chinmay Eishan Kulkarni; Adrià Armejach; Adrián Cristal; Osman S. Unsal; Tim Harris; Mateo Valero

Transactional memory aims to provide a programming model that makes parallel programming easier. Hardware implementations of transactional memory (HTM) suffer from fewer overheads than implementations in software, and refinements in conflict management strategies for HTM allow for even larger improvements. In particular, lazy conflict management has been shown to deliver better performance, but it has hitherto required complex protocols and implementations. In this paper we present a new scalable HTM architecture that performs comparably to the state of the art and can be implemented with minor modifications to the MESI protocol rather than by re-engineering it from the ground up. Our approach detects conflicts eagerly while a transaction is running, but defers their resolution lazily until commit time. We evaluate this EAger-laZY system, EazyHTM, by comparing it with a scalable-TCC-like approach and with a system employing ideal lazy conflict management with zero-cycle transaction validation and fully parallel commits. We show that EazyHTM is on average 7% faster than scalable-TCC. In addition, EazyHTM has fast commits and aborts, can commit in parallel even if there is only one directory present, and does not suffer from cascading waits.
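The eager-detection, lazy-resolution split described above can be illustrated with a small software model. This is a minimal sketch under simplifying assumptions, not the EazyHTM hardware or its MESI modifications: transactions are Python objects holding read and write sets of hypothetical cache-line addresses, a conflict is recorded on both sides as soon as an access overlaps another running transaction's sets, and no transaction is aborted until one of the conflicting parties commits.

```python
class Transaction:
    """A running transaction with the cache-line addresses it has touched."""

    def __init__(self, tid):
        self.tid = tid
        self.read_set = set()
        self.write_set = set()
        self.conflicts = set()  # ids of transactions known to conflict with this one
        self.aborted = False


class EagerLazyModel:
    """Toy model: detect conflicts at access time, resolve them at commit time."""

    def __init__(self):
        self.running = {}

    def begin(self, tid):
        tx = Transaction(tid)
        self.running[tid] = tx
        return tx

    def _detect(self, tx, addr, is_write):
        # Eager detection: check this access against every other running transaction now.
        for other in self.running.values():
            if other is tx:
                continue
            if addr in other.write_set or (is_write and addr in other.read_set):
                tx.conflicts.add(other.tid)
                other.conflicts.add(tx.tid)

    def read(self, tx, addr):
        self._detect(tx, addr, is_write=False)
        tx.read_set.add(addr)

    def write(self, tx, addr):
        self._detect(tx, addr, is_write=True)
        tx.write_set.add(addr)

    def commit(self, tx):
        # Lazy resolution: nothing was aborted during execution; the committer
        # now aborts only the transactions it already recorded as conflicting.
        if tx.aborted:
            raise RuntimeError(f"transaction {tx.tid} was aborted")
        victims = []
        for tid in sorted(tx.conflicts):
            other = self.running.pop(tid, None)
            if other is not None:
                other.aborted = True
                victims.append(tid)
        del self.running[tx.tid]
        return victims
```

For example, if transaction 1 reads an address that transaction 2 later writes, both record the conflict immediately, but transaction 1 is aborted only when transaction 2 commits; because each committer already knows its victims, commit needs no global validation step.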

International Conference on Parallel Architectures and Compilation Techniques | 2010

Discovering and understanding performance bottlenecks in transactional applications

Ferad Zyulkyarov; Srdjan Stipic; Tim Harris; Osman S. Unsal; Adrian Cristal; Ibrahim Hur; Mateo Valero

Many researchers have developed applications using transactional memory (TM) for the purpose of benchmarking different implementations and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and tuning programs that use transactions.

International Parallel and Distributed Processing Symposium | 2009

Taking the heat off transactions: Dynamic selection of pessimistic concurrency control

Nehir Sonmez; Tim Harris; Adrián Cristal; Osman S. Unsal; Mateo Valero

In this paper we investigate feedback-directed dynamic selection between different implementations of atomic blocks. We initially execute atomic blocks using STM with optimistic concurrency control. At runtime, we identify “hot” variables that cause large numbers of transactions to abort. For these variables we selectively switch to pessimistic concurrency control, in the hope of deferring transactions until they are able to run to completion. This trades a reduction in single-threaded speed (since pessimistic concurrency control is not as streamlined as our optimistic implementation) against a reduced amount of wasted work in aborted transactions. We describe our implementation in the Haskell programming language, and examine its performance with a range of micro-benchmarks and larger programs. We show that our technique is effective at reducing the amount of wasted work, but that for current workloads there is often not enough wasted work for an overall improvement to be possible. As we demonstrate, our technique is not appropriate for some workloads: the extra work introduced by lock-induced deadlock is greater than the wasted work saved from aborted transactions. For other workloads, we show that using mutual exclusion locks for “hot” variables could be preferable to multi-reader locks, because mutual exclusion avoids deadlocks caused by concurrent attempts to upgrade to write access.
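The feedback loop described above (count aborts per variable, then switch "hot" variables to pessimistic control) can be sketched as follows. This is a minimal Python sketch, not the paper's Haskell implementation: it handles only single-variable updates, `HOT_THRESHOLD` is a hypothetical tuning value, and `TVar`, `HotVariableTracker` and `atomic_update` are illustrative names. Hot variables use a plain mutual-exclusion lock, in line with the abstract's observation that this can avoid upgrade deadlocks seen with multi-reader locks.

```python
import threading
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: aborts before a variable is treated as "hot"


class TVar:
    """A single shared variable with a version number for optimistic validation."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.version = 0
        self.lock = threading.Lock()


class HotVariableTracker:
    """Feedback-directed choice between optimistic and pessimistic control."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.aborts = Counter()
        self.hot = set()

    def record_abort(self, var):
        self.aborts[var.name] += 1
        if self.aborts[var.name] >= self.threshold:
            self.hot.add(var.name)

    def is_hot(self, var):
        return var.name in self.hot


def atomic_update(tracker, var, fn, before_commit=None):
    """Apply fn to var.value atomically; before_commit is a test hook."""
    while True:
        if tracker.is_hot(var):
            # Pessimistic: hold a mutual-exclusion lock for the whole update,
            # so it cannot abort (at the cost of serializing updates).
            with var.lock:
                var.value = fn(var.value)
                var.version += 1
            return
        # Optimistic: compute without the lock, then validate the version at commit.
        seen = var.version
        new_value = fn(var.value)
        if before_commit is not None:
            before_commit(var)  # e.g. simulate a conflicting writer
        with var.lock:
            if var.version == seen:
                var.value = new_value
                var.version += 1
                return
        tracker.record_abort(var)  # wasted work: retry, possibly in pessimistic mode
```

Here a variable that keeps causing aborts stops retrying optimistically after `HOT_THRESHOLD` failures and is then updated under its lock, trading single-threaded speed for less wasted work, as the abstract describes.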

Rapid Simulation and Performance Evaluation Methods and Tools | 2014

System-level power estimation tool for embedded processor based platforms

Santhosh Kumar Rethinagiri; Oscar Palomar; Rabie Ben Atitallah; Smail Niar; Osman S. Unsal; Adrián Cristal Kestelman

Due to ever-increasing constraints on power consumption in embedded systems, this paper addresses the need for an efficient system-level tool based on a power modeling and estimation methodology. On the one hand, today's embedded industries focus increasingly on manufacturing RISC processor-based platforms because they are cost- and power-effective. On the other hand, modern embedded applications are becoming more and more sophisticated and resource-demanding: multimedia (H.264 encoder and decoder), software-defined radio, GPS, mobile applications, etc. The main objective of this paper is to address the lack of fast power modeling and accurate power estimation tools at the system level for complex embedded systems. In this paper, we propose a standalone simulation tool for power estimation at the system level. As a first step, we develop power models at the functional level. This is done by characterizing the power behavior of RISC processor-based platforms across a wide spectrum of application benchmarks to understand their power profile. Then, we propose power models that cost-effectively estimate power at run time for complex embedded applications. The proposed power models rely on a few parameters that are based on the functional blocks of the processor architecture. As a second step, we propose a power estimation simulator based on a cycle-accurate full-system simulation framework. The combination of these two steps provides a standalone power estimation tool at the system level. The effectiveness of our proposed methodology is validated on ARM9, ARM Cortex-A8 and ARM Cortex-A9 processors designed around the OMAP5912, OMAP3530 and OMAP4430 boards, respectively. The efficiency and accuracy of our proposed tool are evaluated using a variety of programs, from basic programs to complex benchmarks. Estimated power values are compared to real board measurements for the platforms based on the different processor architectures.
Our power estimation results show errors of less than 3% for the ARM940T processor, 2.9% for the ARM Cortex-A8 processor and 4.2% for the ARM Cortex-A9 processor-based platforms, when compared to other state-of-the-art power estimation tools.
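The two-phase idea above (characterize power against a few functional-level parameters, then estimate at run time and compare with board measurements) can be sketched with a single-parameter linear model. This is a hedged illustration only: the abstract does not give the model's form, so the linear form `P = static_w + coeff * activity`, the activity parameter (standing in for something like a clock frequency or a cache-miss rate), and all numbers are hypothetical, not the paper's models or data.

```python
def fit_power_model(activity, measured_power_w):
    """Characterization: ordinary least squares fit of P = static_w + coeff * activity."""
    n = len(activity)
    mean_x = sum(activity) / n
    mean_y = sum(measured_power_w) / n
    sxx = sum((x - mean_x) ** 2 for x in activity)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(activity, measured_power_w))
    coeff = sxy / sxx
    static_w = mean_y - coeff * mean_x
    return static_w, coeff


def estimate_power(model, activity):
    """Run-time estimation: evaluate the fitted model at a new operating point."""
    static_w, coeff = model
    return static_w + coeff * activity


def percent_error(estimated, measured):
    """Relative error of an estimate against a real board measurement, in percent."""
    return abs(estimated - measured) / measured * 100.0
```

For instance, fitting samples at activity values 100, 200 and 300 with measured powers of 0.15, 0.25 and 0.35 W yields a static term of 0.05 W and a slope of 0.001 W per unit; the estimate at 250 is then 0.30 W, about 3.2% away from a hypothetical measurement of 0.31 W.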

Computing Frontiers | 2015

Programmer-directed partial redundancy for resilient HPC

Omer Subasi; Javier Arias; Osman S. Unsal; Jesús Labarta; Adrián Cristal

In this work we propose partial task replication and checkpointing for task-parallel HPC applications to mitigate silent data corruption (SDC) errors. As complete replication of all application tasks can be prohibitively expensive in resources, we introduce a programmer-directed selective replication mechanism that provides fault tolerance at lower cost. Results show that our scheme detects and corrects around 65% of SDC errors with only 4% overhead on average.
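The detect-then-correct scheme described above can be sketched for a single task. This is a minimal sketch under stated assumptions, not the paper's runtime: the `replicate` flag is a hypothetical stand-in for the programmer's annotation, detection is done by comparing two replicas, and correction re-executes from saved inputs and takes a majority of three runs.

```python
import copy


def run_task(task, args, replicate):
    """Run task(*args); if replicate is True, detect and correct SDCs.

    replicate stands in for a programmer-supplied annotation (hypothetical).
    """
    if not replicate:
        return task(*args)  # unprotected: cheaper, but corruption goes unnoticed
    checkpoint = copy.deepcopy(args)  # saved inputs allow re-execution
    first = task(*copy.deepcopy(checkpoint))
    second = task(*copy.deepcopy(checkpoint))
    if first == second:
        return first  # replicas agree: accept the result
    # Replicas disagree, so an SDC is detected. Re-execute from the checkpoint
    # and keep whichever result gets a majority of the three runs.
    third = task(*copy.deepcopy(checkpoint))
    if third == first or third == second:
        return third
    raise RuntimeError("replicas disagree; error detected but not corrected")
```

Each run receives a fresh copy of the checkpoint so that a task that mutates its inputs cannot contaminate later replicas. Only tasks the programmer marks pay the replication cost, which is the cost/coverage trade-off the abstract describes.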

European Conference on Parallel Processing | 2015

Runtime-aware architectures

Marc Casas; Miquel Moreto; Lluc Alvarez; Emilio Castillo; Dimitrios Chasapis; Timothy Hayes; Luc Jaulmes; Oscar Palomar; Osman S. Unsal; Adrián Cristal; Eduard Ayguadé; Jesús Labarta; Mateo Valero

In the last few years, the traditional ways of keeping hardware performance increasing at the rate predicted by Moore’s Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well-defined Instruction Set Architecture (ISA). This simple interface allowed applications to be developed without worrying too much about the underlying hardware, while hardware designers were able to aggressively exploit instruction-level parallelism (ILP) in superscalar processors. Current multi-cores are designed as simple symmetric multiprocessors (SMP) on a chip. However, we believe that this is not enough to overcome all the problems that multi-cores face. The runtime system of the parallel programming model has to drive the design of future multi-cores to overcome their restrictions in terms of power, memory, programmability and resilience. In this paper, we introduce an approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime’s perspective.

Architectural Support for Programming Languages and Operating Systems | 2010

Dynamic filtering: multi-purpose architecture support for language runtime systems

Tim Harris; Sasa Tomic; Adrián Cristal; Osman S. Unsal

This paper introduces a new abstraction to accelerate the read barriers and write barriers used by language runtime systems. We exploit the fact that, dynamically, many barrier executions perform checks but no real work: for example, in generational garbage collection (GC), frequent checks are needed to detect the creation of inter-generational references, even though such references occur rarely in many workloads. We introduce a form of dynamic filtering that identifies redundant checks by (i) recording checks that have recently been executed, and (ii) detecting when a barrier is repeating one of these checks. We show how this technique can be applied to a variety of algorithms for GC, transactional memory, and language-based security. By supporting dynamic filtering in the instruction set, we show that the fast paths of these barriers can be streamlined, reducing the impact on the quality of the surrounding code. We show how we accelerate the barriers used for generational GC and transactional memory in the Bartok research compiler. With a 2048-entry filter, dynamic filtering eliminates almost all the overhead of the GC write barriers. Dynamic filtering eliminates around half the overhead of STM over a non-synchronized baseline, even when used with an STM that is already designed for low overhead and that employs static analyses to avoid redundant operations.
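The filtering idea above (remember recently executed checks, skip a barrier that repeats one) can be sketched in software for a generational-GC write barrier. This is a hedged sketch, not the paper's instruction-set mechanism: the abstract does not specify the entry format, so this model keys each entry on the full (barrier kind, object, field, value) tuple, which filters only exactly repeated checks, and uses a direct-mapped table whose 2048-entry size is taken from the abstract. The class names are hypothetical.

```python
FILTER_ENTRIES = 2048  # filter size quoted in the abstract


class DynamicFilter:
    """Direct-mapped table of recently executed checks."""

    def __init__(self, entries=FILTER_ENTRIES):
        self.table = [None] * entries

    def seen(self, key):
        """Return True if this exact check is in the filter; otherwise record it."""
        index = hash(key) % len(self.table)
        if self.table[index] == key:
            return True
        self.table[index] = key  # evicts whatever check occupied this entry
        return False

    def clear(self):
        self.table = [None] * len(self.table)


class GenerationalHeap:
    """Tracks old-to-young references with a filtered write barrier."""

    def __init__(self, young_objects):
        self.young = set(young_objects)
        self.remembered = set()  # (object, field) slots in old objects pointing to young ones
        self.full_checks = 0
        self.filter = DynamicFilter()

    def write_barrier(self, obj, field, value):
        if self.filter.seen(("gc-write", obj, field, value)):
            return  # identical check already done and its effect is still recorded
        self.full_checks += 1
        if obj not in self.young and value in self.young:
            self.remembered.add((obj, field))

    def minor_collection(self):
        # Promotion empties the young generation, so recorded checks become stale.
        self.young.clear()
        self.remembered.clear()
        self.filter.clear()
```

Storing the same young pointer into the same old slot a thousand times then performs the full check only once. Flushing the filter at each collection keeps filtering sound, because a recorded check is valid only while its effect on the remembered set still holds.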

International Conference on Parallel Architectures and Compilation Techniques | 2011

SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory

Gulay Yalcin; Osman S. Unsal; Adrian Cristal; Ibrahim Hur; Mateo Valero

Fault tolerance has become an essential concern for processor designers due to increasing transient and permanent fault rates. In this study we propose SymptomTM, a symptom-based error detection technique that recovers from errors by leveraging the abort mechanism of Transactional Memory (TM). To the best of our knowledge, this is the first architectural fault-tolerance proposal using Hardware Transactional Memory (HTM). SymptomTM can recover from 86% and 65% of catastrophic failures caused by transient and permanent errors, respectively, with no performance overhead in error-free executions.
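The core idea above, using a transaction's abort path as the recovery mechanism once a symptom appears, can be modeled in a few lines. This is a minimal software sketch, not the hardware proposal: `Symptom` is a hypothetical stand-in for hardware-visible symptoms such as exceptions, a private deep copy plays the role of the TM write buffer, and recovery is plain re-execution, which models transient faults only; the paper's handling of permanent faults is not modeled.

```python
import copy


class Symptom(Exception):
    """Stand-in for a hardware symptom such as an illegal access or exception."""


RECOVERABLE = (Symptom, ArithmeticError, IndexError, KeyError)


def run_with_symptom_recovery(state, region, max_retries=3):
    """Execute region(state) transactionally; abort and re-execute on a symptom.

    Returns the number of aborted attempts before the successful commit.
    """
    for attempt in range(max_retries + 1):
        tentative = copy.deepcopy(state)  # speculative updates stay private, like a TM write set
        try:
            region(tentative)
        except RECOVERABLE:
            continue  # abort: discard tentative updates and retry
        state.clear()
        state.update(tentative)  # commit: publish all updates at once
        return attempt
    raise RuntimeError("symptom persisted across retries; not recoverable by re-execution")
```

In an error-free run the region commits on the first attempt, which mirrors the abstract's point that the scheme adds no overhead when there are no errors (in this software model the copy itself is, of course, not free).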

Great Lakes Symposium on VLSI | 2011

Circuit design of a dual-versioning L1 data cache for optimistic concurrency

Azam Seyedi; Adrià Armejach; Adrián Cristal; Osman S. Unsal; Ibrahim Hur; Mateo Valero

This paper proposes a novel L1 data cache design with dual-versioning SRAM cells (dvSRAM) for chip multi-processors (CMPs) that implement optimistic concurrency proposals. In this new cache architecture, each dvSRAM cell consists of two cells, a main cell and a secondary cell, which keep two versions of the same data. These values can be accessed, modified, and moved back and forth between the main and secondary cells within the access time of the cache. We design and simulate a 32-KB dual-versioning L1 data cache in 45-nm CMOS technology at a 2 GHz processor frequency and 1 V supply voltage, and describe it in detail. We also introduce three well-known use cases of optimistic concurrency execution that can benefit from our proposed design. Moreover, we evaluate one of these use cases to show the impact of the dual-versioning cell on both performance and energy consumption. Our experiments show that large speedups can be achieved with acceptable overall energy dissipation.
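The main/secondary copy operations described above can be summarized as a behavioral model of one cache line. This is a hedged sketch for intuition only: it is not a circuit model, it operates per line rather than per cell, and the 8-word line size and method names are hypothetical. It shows how keeping a second version lets optimistic execution checkpoint, roll back, or commit without writing data elsewhere.

```python
class DualVersionLine:
    """Software model of a cache line whose cells each hold two versions."""

    def __init__(self, words=8):  # hypothetical: 8 words per line
        self.main = [0] * words  # version seen by normal loads and stores
        self.secondary = [0] * words  # saved version
        self.has_checkpoint = False

    def load(self, i):
        return self.main[i]

    def store(self, i, value):
        self.main[i] = value

    def checkpoint(self):
        # Main -> secondary: in dvSRAM this copy happens inside each cell,
        # within the cache access time, rather than word by word as here.
        self.secondary = list(self.main)
        self.has_checkpoint = True

    def rollback(self):
        # Secondary -> main: restore the saved version after failed speculation.
        if not self.has_checkpoint:
            raise RuntimeError("no checkpoint to roll back to")
        self.main = list(self.secondary)
        self.has_checkpoint = False

    def commit(self):
        # Speculation succeeded: keep the main version and drop the saved one.
        self.has_checkpoint = False
```

A transactional-memory use case would call `checkpoint` before a transaction's first speculative store to a line, `rollback` on abort, and `commit` on success.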

System on Chip Conference | 2014

DESSERT: DESign Space ExploRation Tool based on power and energy at System-Level

Santhosh Kumar Rethinagiri; Oscar Palomar; Adrián Cristal; Osman S. Unsal; Michael M. Swift

This paper proposes DESSERT (DESign Space ExploRation Tool at System-Level), a novel simulation-based tool for heterogeneous multi-core processor-based platforms. The tool supports power/energy estimation, comprehensive architectural exploration and optimization of given embedded applications for multi-core processor architectures. The development of DESSERT consists of three steps. First, we developed generic functional-level power models for the different parts of the multi-core system to estimate power/energy, and integrated them into the system-level simulation environment. Second, we built a SystemC-based virtual platform prototype of the processor architecture to accurately extract the functional activities needed by the power models. Third, we designed a runtime task-dependency management and optimization technique (workload or dynamic slack reclamation), based on programming models that support both the OpenMP and Pthreads APIs for multi-core execution, to account for both data-level and thread-level parallelism. The combination of these three steps leads to a novel Design Space Exploration (DSE) methodology. Power and energy estimates are validated against real board measurements. DESSERT's power/energy estimation results show errors of less than 5% and offer reliable power/energy-based DSE for the given applications.
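The final exploration step can be illustrated by combining a power estimate and a runtime estimate into energy and then picking a configuration. This is a minimal sketch under stated assumptions, not DESSERT itself: the candidate configurations, their power and runtime figures, and the selection rule (lowest energy subject to a deadline) are hypothetical; in the tool, these inputs would come from the power models and the SystemC virtual platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    cores: int
    freq_mhz: int
    avg_power_w: float  # would come from the power models (hypothetical here)
    runtime_s: float  # would come from the virtual platform (hypothetical here)

    @property
    def energy_j(self):
        # Energy is average power multiplied by execution time.
        return self.avg_power_w * self.runtime_s


def explore(configs, deadline_s):
    """Return the lowest-energy configuration that meets the deadline, or None."""
    feasible = [c for c in configs if c.runtime_s <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.energy_j)
```

With hypothetical candidates of 1 core at 0.5 W for 4.0 s (2.0 J), 2 cores at 0.9 W for 2.1 s (1.89 J) and 4 cores at 2.5 W for 0.9 s (2.25 J), a 3 s deadline selects the 2-core option, while a 1 s deadline forces the faster but more energy-hungry 4-core option. This is why power and energy must be estimated together rather than optimizing power alone.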
