Gulay Yalcin

Barcelona Supercomputing Center

Publication

Featured research published by Gulay Yalcin.

international conference on computer design | 2012

Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime

Yu Cai; Gulay Yalcin; Onur Mutlu; Erich F. Haratsch; Adrián Cristal; Osman S. Unsal; Ken Mai

With the continued scaling of NAND flash and multi-level cell technology, flash-based storage has gained widespread use in systems ranging from mobile platforms to enterprise servers. However, the robustness of NAND flash cells is an increasing concern, especially at nanometer-regime process geometries. The NAND flash memory bit error rate increases exponentially with the number of program/erase (P/E) cycles. Stronger error correcting codes (ECC) can be used to tolerate higher error rates, but they have diminishing returns with increasing P/E cycles and can have prohibitively high power, area, and latency overheads. The goal of this paper is to develop new techniques that can tolerate high bit error rates without requiring prohibitively strong ECC. Our techniques, called Flash Correct-and-Refresh (FCR), exploit the observation that the dominant error source in NAND flash memory is retention errors, caused by flash cells losing charge over time. The key idea is to periodically read, correct, and reprogram (in place) or remap the stored data before it accumulates more retention errors than simple ECC can correct. Detailed simulations of a solid-state drive (SSD) storage system, driven by measured experimental data from error characterization on real flash memory chips, show that our techniques provide a 46× average lifetime improvement on a variety of workloads at no additional hardware cost. We also find that our techniques achieve lifetime improvements that cannot feasibly be achieved with stronger ECC.
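The refresh idea can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that retention errors accumulate linearly with time since a page was last programmed and that the ECC corrects up to a fixed number of errors per page; real retention behavior also worsens with P/E wear, which this sketch ignores. The function names and all numbers are hypothetical and do not come from the paper.

```python
import math
from typing import Optional


def max_refresh_interval_days(retention_errors_per_day: float, ecc_correctable: int) -> int:
    """Longest whole-day refresh period that keeps accumulated retention errors
    within the ECC correction capability (hypothetical linear error model)."""
    if retention_errors_per_day <= 0:
        raise ValueError("retention error rate must be positive")
    return math.floor(ecc_correctable / retention_errors_per_day)


def simulate(days: int, retention_errors_per_day: float, ecc_correctable: int,
             refresh_every: Optional[int]) -> bool:
    """Return True if the page stays correctable for `days` days.

    refresh_every=None models a drive without periodic refresh."""
    errors = 0.0
    for day in range(1, days + 1):
        errors += retention_errors_per_day  # charge leaks a little more each day
        if errors > ecc_correctable:
            return False  # retention errors outran the ECC: data loss
        if refresh_every is not None and day % refresh_every == 0:
            # Read, correct with ECC, then reprogram in place or remap:
            # the stored data is error-free again and the retention clock restarts.
            errors = 0.0
    return True


print(max_refresh_interval_days(2.0, 40))          # 20
print(simulate(365, 2.0, 40, refresh_every=20))    # True
print(simulate(365, 2.0, 40, refresh_every=None))  # False
```

With a hypothetical 2 retention errors per day and an ECC that corrects 40, refreshing every 20 days keeps every read correctable over a year, while never refreshing fails on day 21.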

measurement and modeling of computer systems | 2014

Neighbor-cell assisted error correction for MLC NAND flash memories

Yu Cai; Gulay Yalcin; Onur Mutlu; Erich F. Haratsch; Osman S. Unsal; Adrián Cristal; Ken Mai

Continued scaling of NAND flash memory to smaller process technology nodes decreases its reliability, necessitating more sophisticated mechanisms to correctly read stored data values. To distinguish between different potential stored values, conventional techniques to read data from flash memory employ a single set of reference voltage values, which are determined based on the overall threshold voltage distribution of flash cells. Unfortunately, the phenomenon of program interference, in which a cell's threshold voltage unintentionally changes when a neighboring cell is programmed, makes this conventional approach increasingly inaccurate in determining the values of cells. This paper makes the new empirical observation that identifying the value stored in the immediate-neighbor cell makes it easier to determine the data value stored in the cell that is being read. We provide a detailed statistical and experimental characterization of the threshold voltage distribution of flash memory cells conditional upon the immediate-neighbor cell values, and show that such conditional distributions can be used to determine a set of read reference voltages that lead to error rates much lower than when a single set of reference voltage values based on the overall distribution is used. Based on our analyses, we propose a new method for correcting errors in a flash memory page, neighbor-cell assisted correction (NAC). The key idea is to re-read a flash memory page that fails error correcting code (ECC) decoding with the set of read reference voltage values corresponding to the conditional threshold voltage distribution for an assumed neighbor cell value, and to use the re-read values to correct the cells whose neighbors have that value. Our simulations show that NAC improves flash memory lifetime by 33% while having no (at nominal lifetime) or very modest (less than 5% at extended lifetime) performance overhead.
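The NAC re-read step can be sketched in Python under strong simplifying assumptions: four MLC states at nominal voltages 0 to 3 (arbitrary units), program interference modeled as a linear upward shift proportional to the neighbor's state, and neighbor values already known. The constants `DEFAULT_REFS` and `COUPLING` and the function names are hypothetical; real interference distributions are measured, not linear.

```python
from bisect import bisect_right
from typing import List

# Hypothetical values in arbitrary voltage units: four MLC states nominally at
# 0, 1, 2, 3 and default read references halfway between them.
DEFAULT_REFS = [0.5, 1.5, 2.5]
COUPLING = 0.2  # assumed threshold-voltage shift per state level of the neighbor


def read_cell(vth: float, refs: List[float]) -> int:
    """Return the state (0..3) whose voltage window contains vth."""
    return bisect_right(refs, vth)


def conventional_read(vths: List[float]) -> List[int]:
    """One set of references for every cell, ignoring neighbors."""
    return [read_cell(v, DEFAULT_REFS) for v in vths]


def nac_read(vths: List[float], neighbor_states: List[int]) -> List[int]:
    """Neighbor-assisted re-read: for each assumed neighbor value, re-read with
    references shifted by that neighbor's expected interference, and keep the
    result only for cells whose neighbor actually holds that value."""
    result = conventional_read(vths)
    for assumed in range(4):
        shifted = [r + COUPLING * assumed for r in DEFAULT_REFS]
        reread = [read_cell(v, shifted) for v in vths]
        for i, n in enumerate(neighbor_states):
            if n == assumed:
                result[i] = reread[i]
    return result


stored = [1, 2, 3, 0]
neighbors = [3, 3, 0, 3]
vths = [s + COUPLING * n for s, n in zip(stored, neighbors)]  # interference shifts vth up
print(conventional_read(vths))    # [2, 3, 3, 1]
print(nac_read(vths, neighbors))  # [1, 2, 3, 0]
```

In this sketch every neighbor value is covered, so every cell ends up re-read; the paper applies the re-read only to pages that fail ECC.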

international conference on computer design | 2011

FIMSIM: A fault injection infrastructure for microarchitectural simulators

Gulay Yalcin; Osman S. Unsal; Adrián Cristal; Mateo Valero

Fault injection is a widely used approach for experiment-based dependability evaluation. Injecting faults into microarchitectural simulators is particularly appealing for researchers, since it can be applied at the early design stage of the processor. As such, it enables a preliminary analysis of the correlation between the criticality of processor-structure-level faults and their impact on applications. In this study, we present FIMSIM, a compact fault injection infrastructure for microarchitectural simulators that is capable of injecting transient, permanent, intermittent and multi-bit faults. FIMSIM makes it possible to comprehensively evaluate the vulnerability of different microarchitectural structures against different fault models.
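The four fault models named above can be illustrated with a small Python sketch that corrupts a stored value as a function of the simulated cycle. This is not FIMSIM's code or interface: the `Fault` fields, the stuck-at behavior of permanent and intermittent faults, and the periodic activation window for intermittent faults are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fault:
    kind: str              # "transient", "multibit", "permanent" or "intermittent"
    bits: Tuple[int, ...]  # affected bit positions
    start: int             # first cycle at which the fault appears
    stuck_at: int = 1      # forced value for permanent/intermittent faults
    period: int = 10       # intermittent: active for `burst` cycles out of every `period`
    burst: int = 3


def _force(value: int, mask: int, stuck_at: int) -> int:
    """Force the masked bits to 1 or 0."""
    return value | mask if stuck_at else value & ~mask


def apply_fault(value: int, cycle: int, f: Fault) -> int:
    """Return the value observed at `cycle` under fault `f`."""
    if cycle < f.start:
        return value
    mask = 0
    for b in f.bits:
        mask |= 1 << b
    if f.kind in ("transient", "multibit"):
        # Bit flip(s) applied only at cycle `start`; "multibit" lists several bits.
        return value ^ mask if cycle == f.start else value
    if f.kind == "permanent":
        return _force(value, mask, f.stuck_at)  # stuck-at from `start` onwards
    if f.kind == "intermittent":
        active = (cycle - f.start) % f.period < f.burst
        return _force(value, mask, f.stuck_at) if active else value
    raise ValueError(f"unknown fault kind: {f.kind}")


print(apply_fault(0b1010, 5, Fault("transient", (0,), start=5)))               # 11
print(apply_fault(0b1010, 9, Fault("permanent", (1,), start=0, stuck_at=0)))   # 8
print(apply_fault(0b0000, 3, Fault("multibit", (0, 2), start=3)))              # 5
```

Sweeping `apply_fault` over every structure and cycle of a simulation run, and comparing final program output with a fault-free run, is one common way such injectors classify faults as masked or failure-causing.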

parallel, distributed and network-based processing | 2014

Combining Error Detection and Transactional Memory for Energy-Efficient Computing below Safe Operation Margins

Gulay Yalcin; Adrian Cristal; Osman S. Unsal; Anita Sobe; Derin Harmanci; Pascal Felber; Alexey Voronin; Jons-Tobias Wamhoff; Christof Fetzer

The power envelope has become a major issue in the design of computer systems. One way of reducing energy consumption is to downscale the voltage of microprocessors. However, this does not come without costs: decreasing the voltage drastically increases the likelihood of failures, and without reliability mechanisms the systems would no longer operate correctly. Reliability requires (1) error detection and (2) error recovery mechanisms. In this paper, we provide a first study of combining different error detection mechanisms with transactional memory, with the objective of improving energy efficiency. According to our evaluation, using reliability schemes combined with transactional memory for error recovery reduces energy by 54% while providing a reliability level of 100%.
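The detect-then-roll-back pattern can be sketched in Python. As an assumption, error detection here is duplicated execution with a comparison of write sets (only one of several possible detection mechanisms), and a private write buffer stands in for transactional memory: an error leaves committed memory untouched, so recovery is simply re-execution. The injected fault sequence and all names are hypothetical.

```python
from typing import Callable, Dict

Memory = Dict[str, int]
Body = Callable[[Memory, Memory], None]  # body(committed_memory, write_buffer)


def run_transaction(memory: Memory, body: Body, max_retries: int = 10) -> int:
    """Run `body` twice with buffered writes; commit only if both write sets match.

    Returns the number of aborts before the commit."""
    for attempt in range(max_retries + 1):
        writes_a: Memory = {}
        writes_b: Memory = {}
        body(memory, writes_a)  # speculative: writes go to a private buffer
        body(memory, writes_b)  # redundant copy used for error detection
        if writes_a == writes_b:
            memory.update(writes_a)  # no error detected: commit
            return attempt
        # Mismatch means an error was detected: discard both buffers
        # (memory is untouched) and re-execute.
    raise RuntimeError("error persisted across all retries")


faults = iter([True, False, False, False])  # hypothetical: only the first run is hit


def increment_x(mem: Memory, writes: Memory) -> None:
    delta = 1
    if next(faults, False):
        delta ^= 0b100  # simulated bit flip caused by undervolting
    writes["x"] = mem["x"] + delta


mem = {"x": 41}
print(run_transaction(mem, increment_x), mem)  # 1 {'x': 42}
```

The design choice that makes this attractive below safe voltage margins is that the transactional write buffer already provides the checkpoint, so recovery needs no separate state-saving mechanism.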

Microprocessors and Microsystems | 2015

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy for data centers

Santhosh Kumar Rethinagiri; Oscar Palomar; Anita Sobe; Gulay Yalcin; Thomas Knauth; J. Rubén Titos Gil; Pablo Prieto; Adrian Cristal; Osman Unsal; Pascal Felber; Christof Fetzer; Dragomir Milojevic

The dramatic environmental and economic impact of the ever-increasing power and energy consumption of modern computing devices in data centers is now a critical challenge. On the one hand, designers use technology scaling as one of the methods to face the phenomenon called dark silicon (only segments of a chip function concurrently due to power restrictions). On the other hand, designers use extreme-scale systems such as teradevices to meet the performance needs of their applications, which in turn increases the power consumption of the platform. To overcome these challenges, we need novel computing paradigms that address energy efficiency. One promising solution is to incorporate parallel distributed methodologies at different abstraction levels. The FP7 project ParaDIME focuses on this objective, providing different distributed methodologies (software–hardware techniques) at different abstraction levels to attack the power-wall problem. In particular, the ParaDIME framework will utilize: circuit and architecture operation below safe voltage limits for drastic energy savings, specialized energy-aware computing accelerators, heterogeneous computing, energy-aware runtime, approximate computing and power-aware message passing. The major outcome of the project will be a novel processor architecture for a heterogeneous distributed system that utilizes future device characteristics, runtime and programming model for drastic energy savings of data centers. Wherever possible, ParaDIME will adopt multidisciplinary techniques, such as hardware support for message passing, runtime energy optimization utilizing new hardware energy performance counters, use of accelerators for error recovery from sub-safe voltage operation, and approximate computing through annotated code.
Furthermore, we will establish and investigate the theoretical limits of energy savings at the device, circuit, architecture, runtime and programming model levels of the computing stack, as well as quantify the actual energy savings achieved by the ParaDIME approach for the complete computing stack with the real environment.

international conference on cluster computing | 2016

A Runtime Heuristic to Selectively Replicate Tasks for Application-Specific Reliability Targets

Omer Subasi; Gulay Yalcin; Ferad Zyulkyarov; Osman S. Unsal; Jesús Labarta

In this paper, we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, the results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
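The idea of replicating only as many tasks as a reliability target requires can be illustrated with a simplified online greedy rule in Python. This is not the App_FIT heuristic itself: the per-task FIT estimates, the assumption that a replicated task contributes no FIT, and the first-come budget rule are all hypothetical simplifications.

```python
from typing import List


def select_replicas(task_fits: List[float], fit_target: float) -> List[bool]:
    """Online greedy choice, task by task in arrival order.

    A task runs unreplicated only if the accumulated FIT of unreplicated tasks
    stays within the application's FIT target; replicated tasks are assumed to
    contribute no FIT because their errors are detected and corrected."""
    unprotected_fit = 0.0
    decisions = []
    for fit in task_fits:
        if unprotected_fit + fit <= fit_target:
            unprotected_fit += fit
            decisions.append(False)  # run once
        else:
            decisions.append(True)   # replicate
    return decisions


fits = [2.0, 3.0, 4.0, 1.0, 5.0]  # hypothetical per-task FIT estimates
plan = select_replicas(fits, fit_target=6.0)
print(plan)                                       # [False, False, True, False, True]
print(f"{sum(plan) / len(plan):.0%} of tasks replicated")  # 40% of tasks replicated
```

Even this crude rule shows why full replication is unnecessary: only the tasks that would push the unprotected FIT over the target need a replica.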

international on-line testing symposium | 2014

Exploiting a fast and simple ECC for scaling supply voltage in level-1 caches

Gulay Yalcin; Emrah Islek; Oyku Tozlu; Pedro Reviriego; Adrian Cristal; Osman S. Unsal; Oguz Ergin

Scaling the supply voltage to near-threshold is a very effective approach to reducing the energy consumption of computer systems. However, operating below the safe margin of the supply voltage introduces a high number of persistent failures, especially in memory structures. Thus, it is essential to provide reliability schemes that tolerate these persistent failures in memory structures. In this study, we adopt a Single Error Correction Multiple Adjacent Error Correction (SEC-MAEC) code in order to minimize the energy consumption of L1 caches. Our evaluations show that the SEC-MAEC code is a fast and energy-efficient Error Correcting Code (ECC). For the decoder, it presents 10× less area overhead and 2× less latency than the Orthogonal Latin Square Code, the state-of-the-art ECC used in L1 caches under supply voltage scaling.
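Correcting multiple adjacent errors can be illustrated with a simpler, different technique: two Hamming(7,4) single-error-correcting codewords whose bits are interleaved, so that any two adjacent flipped bits land in different codewords. This Python sketch is a stand-in chosen for illustration and does not reproduce the SEC-MAEC construction, which achieves adjacent-error correction within one code and with different overheads.

```python
from typing import List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits into positions 1..7 (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the position of the flipped bit
    return [c[2], c[4], c[5], c[6]]


def encode_interleaved(a: List[int], b: List[int]) -> List[int]:
    """Store two codewords bit-interleaved: a0 b0 a1 b1 ..."""
    ca, cb = hamming74_encode(a), hamming74_encode(b)
    return [bit for pair in zip(ca, cb) for bit in pair]


def decode_interleaved(stored: List[int]) -> Tuple[List[int], List[int]]:
    return hamming74_decode(stored[0::2]), hamming74_decode(stored[1::2])


a, b = [1, 0, 1, 1], [0, 1, 1, 0]
stored = encode_interleaved(a, b)
stored[5] ^= 1
stored[6] ^= 1  # two adjacent bits flipped
print(decode_interleaved(stored) == (a, b))  # True
```

Interleaving doubles the number of check bits relative to a single wider code, which is one reason dedicated adjacent-error codes are attractive for area- and latency-sensitive L1 caches.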

embedded systems for real time multimedia | 2014

System-level power & energy estimation methodology and optimization techniques for CPU-GPU based mobile platforms

Santhosh Kumar Rethinagiri; Oscar Palomar; Javier Arias Moreno; Gulay Yalcin; Osman S. Unsal; Adrián Cristal

Due to the growing computational requirements of mobile applications, using a heterogeneous Multiprocessor System-on-Chip has become an incontrovertible solution to meet the service requirements. Today, Electronic System-Level design is considered a vital premise for exploring design trade-offs for such devices in the early stages of the design flow. This paper proposes a novel system-level power/energy estimation methodology and optimization techniques for heterogeneous CPU-GPU based platforms. The methodology has two parts. First, we developed power models using functional parameters to set up generic power models for different parts of the platform. Second, we designed a simulation-based system-level prototype using SystemC (JIT) and cycle-accurate simulators to accurately evaluate the activities used in the related power models. Combining the two parts yields a novel system-level power estimation methodology that gives a good trade-off between accuracy and speed. Moreover, leveraging our methodology, we introduce novel power optimization techniques such as inter-task DVFS and workload balancing at the system level for CPU-GPU platforms. The efficiency of our proposed methodology and optimization techniques is validated on a CARMA kit, which consists of an ARM quad-core processor and an NVIDIA GPU (96 cores). Estimated power and energy values are compared to real board measurements. Our power/energy estimates show less than 2.5% error for the single-core processor, 4% for the dual-core processor, 4% for the quad-core processor, 4% for the GPU and 6% for multiprocessor-based systems. Using the proposed optimization techniques, we achieved power and energy savings of up to 45% and 70%, respectively, for various industrial benchmarks.
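A functional-parameter power model and the error metric used to compare estimates against board measurements can be sketched in Python. The linear form, the parameter names (`cpu_freq_ghz`, `active_cores`, `gpu_util`) and every coefficient are hypothetical; in the described methodology, the model parameters and activities come from platform characterization and simulation.

```python
from typing import Dict, List, Tuple

# Hypothetical model: static power plus a linear term per functional parameter.
# In a real flow, the coefficients would be fitted to board measurements.
P_STATIC_W = 1.2
COEFFS_W = {"cpu_freq_ghz": 0.9, "active_cores": 0.35, "gpu_util": 2.0}


def estimate_power(params: Dict[str, float]) -> float:
    """Estimated platform power in watts for one operating point."""
    return P_STATIC_W + sum(COEFFS_W[name] * value for name, value in params.items())


def estimate_energy(phases: List[Tuple[Dict[str, float], float]]) -> float:
    """Energy in joules over phases of (parameters, duration in seconds)."""
    return sum(estimate_power(params) * seconds for params, seconds in phases)


def relative_error_pct(estimated: float, measured: float) -> float:
    """Estimation error relative to a board measurement, in percent."""
    return abs(estimated - measured) / measured * 100.0


p = estimate_power({"cpu_freq_ghz": 1.0, "active_cores": 4, "gpu_util": 0.5})
print(f"{p:.2f} W")  # 4.50 W
print(f"{relative_error_pct(p, measured=4.4):.1f}% error")  # hypothetical measurement
```

Because energy is estimated phase by phase, the same model can compare candidate DVFS settings per task before running them on the board.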

digital systems design | 2014

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy

Santhosh Kumar Rethinagiri; Oscar Palomar; Anita Sobe; Thomas Knauth; Wojciech M. Barczynski; Gulay Yalcin; Yarco Hayduk; Adrian Cristal; Osman Unsal; Pascal Felber; Christof Fetzer; Julien Ryckaert; Gina Alioto

The dramatic environmental and economic impact of the ever-increasing power and energy consumption of modern computing devices in data centers is now a critical challenge. On the one hand, designers use technology scaling as one of the methods to face the phenomenon called dark silicon (only segments of a chip function concurrently due to power restrictions). On the other hand, designers use extreme-scale systems such as teradevices to meet the performance needs of their applications, which in turn increases the power consumption of the platform. To overcome these challenges, we need novel computing paradigms that address energy efficiency. One promising solution is to incorporate parallel distributed methodologies at different abstraction levels. The FP7 project ParaDIME focuses on this objective, providing different distributed methodologies (software-hardware techniques) at different abstraction levels to attack the power-wall problem. In particular, the ParaDIME framework will utilize: circuit and architecture operation below safe voltage limits for drastic energy savings, specialized energy-aware computing accelerators, heterogeneous computing, energy-aware runtime, approximate computing and power-aware message passing. The major outcome of the project will be a processor architecture for a heterogeneous distributed system that utilizes future device characteristics for drastic energy savings. Wherever possible, ParaDIME will adopt multidisciplinary techniques, such as hardware support for message passing, runtime energy optimization utilizing new hardware energy performance counters, use of accelerators for error recovery from sub-safe voltage operation, and approximate computing through annotated code.
Furthermore, we will establish and investigate the theoretical limits of energy savings at the device, circuit, architecture, runtime and programming model levels of the computing stack, as well as quantify the actual energy savings achieved by the ParaDIME approach for the complete computing stack with the real environment.

Microprocessors and Microsystems | 2014

Bit Impact Factor: Towards making fair vulnerability comparison

Serdar Zafer Can; Gulay Yalcin; Oguz Ergin; Osman S. Unsal; Adrian Cristal

Reliability is becoming a major design concern in contemporary microprocessors, since the soft error rate is increasing due to technology scaling. Therefore, design-time system vulnerability estimation is of paramount importance. The Architectural Vulnerability Factor (AVF) is an early vulnerability estimation methodology. However, AVF considers the value of a bit in a clock cycle to be either required for Architecturally Correct Execution (i.e., an ACE-bit) or not (i.e., an unACE-bit); therefore, AVF cannot distinguish the vulnerability impact level of an ACE-bit. In this study, we present a new dimension that takes into account the vulnerability impact level of a bit. We introduce the Bit Impact Factor metric, which we believe will help extend AVF evaluation to provide a more accurate vulnerability analysis.
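The limitation described above can be made concrete in Python. AVF is the fraction of bit-cycles that are ACE; the second function is a hypothetical impact-weighted variant, written only to illustrate what "taking the impact level of a bit into account" could mean. It is not the paper's Bit Impact Factor definition, and the per-bit impact values are invented.

```python
from typing import Sequence


def avf(ace: Sequence[Sequence[bool]]) -> float:
    """ace[cycle][bit] is True if that bit is ACE in that cycle."""
    total = sum(len(row) for row in ace)
    return sum(sum(row) for row in ace) / total


def impact_weighted_vulnerability(ace: Sequence[Sequence[bool]],
                                  impact: Sequence[Sequence[float]]) -> float:
    """Hypothetical extension: weight each ACE bit-cycle by an impact level in
    [0, 1] instead of counting every ACE bit-cycle as fully vulnerable."""
    total = sum(len(row) for row in ace)
    weighted = sum(w
                   for a_row, w_row in zip(ace, impact)
                   for a, w in zip(a_row, w_row) if a)
    return weighted / total


ace = [[True, False, True, True],
       [False, False, True, True]]
impact = [[1.0, 0.0, 0.5, 0.25],
          [0.0, 0.0, 1.0, 0.25]]
print(avf(ace))                                    # 0.625
print(impact_weighted_vulnerability(ace, impact))  # 0.375
```

Two structures with the same AVF can thus differ in the weighted measure when their ACE bits differ in how badly a flip would affect execution.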
