
Publication


Featured research published by Ulya R. Karpuzcu.


High-Performance Computer Architecture | 2009

Accurate microarchitecture-level fault modeling for studying hardware faults

Man Lap Li; Ulya R. Karpuzcu; Siva Kumar Sastry Hari; Sarita V. Adve

Decreasing hardware reliability is expected to impede the exploitation of the increasing integration projected by Moore's Law. There is much ongoing research on efficient fault tolerance mechanisms across all levels of the system stack, from the device level to the system level. High-level fault tolerance solutions, such as at the microarchitecture and system levels, are commonly evaluated using statistical fault injection with microarchitecture-level fault models. Since hardware faults actually manifest at a much lower level, it is unclear whether such high-level fault models are acceptably accurate. On the other hand, lower-level models, such as at the gate level, may be more accurate, but their increased simulation times make it hard to track the system-level propagation of faults. Thus, an evaluation of high-level reliability solutions entails the classical tradeoff between speed and accuracy. This paper seeks to quantify and alleviate this tradeoff. We make the following contributions: (1) We introduce SWAT-Sim, a novel fault injection infrastructure that uses hierarchical simulation to study the system-level manifestations of permanent (and transient) gate-level faults. In our experiments, SWAT-Sim incurs a small average performance overhead of under 3x for the components we simulate, compared to pure microarchitectural simulation. (2) We study the system-level manifestations of faults injected under different microarchitecture-level and gate-level fault models and identify the reasons why microarchitecture-level faults fail to model gate-level faults in general. (3) Based on our analysis, we derive two probabilistic microarchitecture-level fault models to mimic gate-level stuck-at and delay faults. Our results show that these models are, in general, inaccurate as they do not capture the complex manifestations of gate-level faults. The inaccuracies in existing models and the lack of more accurate microarchitecture-level models motivate using infrastructures similar to SWAT-Sim to faithfully model the microarchitecture-level effects of gate-level faults.
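
As a concrete illustration of the kind of microarchitecture-level fault model being evaluated, here is a minimal sketch of injecting a permanent stuck-at fault into one register bit. Everything here (the toy register file, the read hook) is hypothetical and unrelated to SWAT-Sim's actual implementation:

```python
import random

# Toy statistical fault injection at the microarchitecture level
# (hypothetical sketch; this is not the SWAT-Sim infrastructure).
NUM_REGS, REG_BITS = 32, 64

def inject_stuck_at(regfile, reg, bit, stuck_value):
    """Wrap register reads with a permanent stuck-at fault on one bit."""
    def read(r):
        word = regfile[r]
        if r == reg:
            if stuck_value:                 # stuck-at-1
                word |= (1 << bit)
            else:                           # stuck-at-0
                word &= ~(1 << bit)
        return word
    return read

regfile = [random.getrandbits(REG_BITS) for _ in range(NUM_REGS)]
faulty_read = inject_stuck_at(regfile,
                              reg=random.randrange(NUM_REGS),
                              bit=random.randrange(REG_BITS),
                              stuck_value=1)

# A campaign would run the workload through faulty_read and classify each
# outcome: masked, silent data corruption, detected error, or crash.
print(f"r0 reads as {faulty_read(0):#018x}")
```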


High-Performance Computer Architecture | 2009

BlueShift: Designing processors for timing speculation from the ground up

Brian Greskamp; Lu Wan; Ulya R. Karpuzcu; Jeffrey J. Cook; Josep Torrellas; Deming Chen; Craig B. Zilles

Several recent processor designs have proposed to enhance performance by increasing the clock frequency to the point where timing faults occur, and by adding error-correcting support to guarantee correctness. However, such Timing Speculation (TS) proposals are limited in that they assume traditional design methodologies that are suboptimal under TS. In this paper, we present a new approach where the processor itself is designed from the ground up for TS. The idea is to identify and optimize the most frequently-exercised critical paths in the design, at the expense of the majority of the static critical paths, which are allowed to suffer timing errors. Our approach and design optimization algorithm are called BlueShift. We also introduce two techniques that, when applied under BlueShift, improve processor performance: On-demand Selective Biasing (OSB) and Path Constraint Tuning (PCT). Our evaluation with modules from the OpenSPARC T1 processor shows that, compared to conventional TS, BlueShift with OSB speeds up applications by an average of 8% while increasing the processor power by an average of 12%. Moreover, compared to a high-performance TS design, BlueShift with PCT speeds up applications by an average of 6% with an average processor power overhead of 23%, providing a way to speed up logic modules that is orthogonal to voltage scaling.
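
The TS tradeoff BlueShift optimizes can be pictured with a toy throughput model: raising frequency past the safe point gains cycles but pays a recovery penalty per timing error, so effective throughput peaks at a finite overclock. The error-rate curve and penalty below are invented numbers, not data from the paper:

```python
import math

# Toy timing-speculation throughput model (all numbers invented).
F_SAFE = 3.0e9      # highest error-free frequency (Hz)
PENALTY = 50        # recovery cycles charged per timing error

def error_rate(f):
    """Per-cycle timing-error probability, rising sharply past F_SAFE."""
    return 0.0 if f <= F_SAFE else 1e-6 * math.exp((f / F_SAFE - 1.0) * 40)

def effective_throughput(f):
    # Useful cycles per second once recovery overhead is paid.
    return f / (1.0 + error_rate(f) * PENALTY)

best = max((mhz * 1e6 for mhz in range(3000, 3601)), key=effective_throughput)
print(f"best operating point: {best / 1e9:.2f} GHz, "
      f"error rate {error_rate(best):.2e}/cycle")
```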


International Symposium on Microarchitecture | 2009

The BubbleWrap many-core: popping cores for sequential acceleration

Ulya R. Karpuzcu; Brian Greskamp; Josep Torrellas

Many-core scaling now faces a power wall. The gap between the number of cores that fit on a die and the number that can operate simultaneously under the power budget is rapidly increasing with technology scaling. In future designs, many of the cores may have to be dormant at any given time to meet the power budget. To push back the many-core power wall, this paper proposes Dynamic Voltage Scaling for Aging Management (DVSAM), a new scheme for managing processor aging to attain higher performance or lower power consumption. In addition, this paper introduces the BubbleWrap many-core, a novel architecture that makes extensive use of DVSAM. BubbleWrap identifies the most power-efficient set of cores in a variation-affected chip (the largest set that can be simultaneously powered on) and designates them as Throughput cores dedicated to parallel-section execution. The rest of the cores are designated as Expendable and are dedicated to accelerating sequential sections. BubbleWrap attains maximum sequential acceleration by sacrificing Expendable cores one at a time, running each at elevated supply voltage for a significantly shorter service life until it completely wears out and is discarded; figuratively, as if popping bubbles in the wrap that protects the Throughput cores. In simulated 32-core chips, BubbleWrap provides substantial improvements over a plain chip. For example, on average, one design runs fully-sequential applications at a 16% higher frequency, and fully-parallel ones with 30% higher throughput.
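
The lifetime-for-voltage trade behind DVSAM can be shown with a deliberately simplified wear-out model: assume service life falls exponentially with supply overdrive and solve for the Vdd boost that consumes an Expendable core in a chosen fraction of its nominal life. Both the model form and the constants are assumptions for illustration; the paper relies on detailed device-aging equations:

```python
import math

# Simplified exponential wear-out model (an assumption for illustration;
# DVSAM in the paper uses detailed device-aging equations).
V_NOM = 1.0      # nominal supply voltage (V)
L_NOM = 7.0      # nominal service life (years)
K = 20.0         # hypothetical voltage-acceleration factor (1/V)

def vdd_for_lifetime(target_years):
    """Supply voltage that consumes the core in exactly target_years."""
    # Invert lifetime(V) = L_NOM * exp(-K * (V - V_NOM)).
    return V_NOM + math.log(L_NOM / target_years) / K

for frac in (1.0, 0.5, 0.1, 0.02):
    print(f"{frac * 100:5.1f}% of nominal life -> "
          f"run at {vdd_for_lifetime(L_NOM * frac):.3f} V")
```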


Dependable Systems and Networks | 2012

VARIUS-NTV: A microarchitectural model to capture the increased sensitivity of manycores to process variations at near-threshold voltages

Ulya R. Karpuzcu; Krishna Kolluru; Nam Sung Kim; Josep Torrellas

Near-Threshold Computing (NTC), where the supply voltage is only slightly higher than the threshold voltage of transistors, is a promising approach to attain energy-efficient computing. Unfortunately, compared to the conventional Super-Threshold Computing (STC), NTC is more sensitive to process variations, which results in higher power consumption and lower frequencies than would otherwise be possible, and potentially a non-negligible fault rate. To help address variations at NTC at the architecture level, this paper presents the first microarchitectural model of process variations for NTC. The model, called VARIUS-NTV, extends the existing VARIUS variation model. Its key aspects include: (i) adopting a gate-delay model and an SRAM cell type that are tailored to NTC, (ii) modeling SRAM failure modes emerging at NTC, and (iii) accounting for the impact of leakage in SRAM models. We evaluate a simulated 11nm, 288-core tiled manycore at both NTC and STC. The results show higher frequency and power variations within the NTC chip. For example, the maximum difference in on-chip tile frequency is ≈2.3× at STC and ≈3.7× at NTC. We also validate our model against an experimental chip.
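
A rough sketch of what such a model does end to end: draw a per-tile threshold-voltage (Vth) map with systematic (spatially correlated) and random components, then translate it into per-tile frequency through a delay model. The alpha-power delay law, the blur-based correlation, and every constant below are placeholder assumptions; VARIUS-NTV itself builds on an EKV-based gate-delay model and NTC-specific SRAM failure models:

```python
import numpy as np

# Placeholder variation-map -> tile-frequency translation (not VARIUS-NTV).
rng = np.random.default_rng(0)
GRID = 17                        # 17x17 = 289 tiles, ~ the paper's 288 cores
VTH_NOM, SIGMA = 0.30, 0.03      # mean threshold voltage and spread (V)

def blur(a, k=2):
    """Cheap spatial smoothing to mimic systematic (correlated) variation."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = a[max(0, i - k):i + k + 1, max(0, j - k):j + k + 1].mean()
    return out

vth = (VTH_NOM
       + blur(rng.normal(0, SIGMA, (GRID, GRID)))      # systematic part
       + rng.normal(0, SIGMA / 2, (GRID, GRID)))       # random part

def tile_freq(vdd, alpha=1.3):
    # Alpha-power law (assumed): delay ~ vdd / (vdd - vth)**alpha.
    # Sensitivity to vth blows up as vdd approaches vth, i.e., at NTC.
    return (vdd - vth) ** alpha / vdd

for vdd, label in ((1.0, "STC"), (0.45, "NTC")):
    f = tile_freq(vdd)
    print(f"{label} (Vdd={vdd} V): max/min tile frequency = {f.max() / f.min():.2f}x")
```

Even this crude sketch reproduces the qualitative finding: the same Vth map yields a much wider frequency spread at near-threshold Vdd than at nominal Vdd.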


High-Performance Computer Architecture | 2013

EnergySmart: Toward energy-efficient manycores for Near-Threshold Computing

Ulya R. Karpuzcu; Abhishek A. Sinkar; Nam Sung Kim; Josep Torrellas

While Near-Threshold Voltage Computing (NTC) is a promising approach to push back the manycore power wall, it suffers from a high sensitivity to parameter variations. One possible way to cope with variations is to use multiple on-chip voltage (Vdd) domains. However, this paper finds that such an approach is energy inefficient. Consequently, for NTC, we propose a manycore organization that has a single Vdd domain and relies on multiple frequency domains to tackle variation. We call it EnergySmart. For this approach to be competitive, it has to be paired with effective core assignment strategies and also support fine-grain (i.e., short-interval) DVFS. This paper shows that, at NTC, a simple chip with a single Vdd domain can deliver a higher performance per watt than one with multiple Vdd domains.
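
How a core-assignment strategy might look under this organization is sketched below: with one shared Vdd and per-core frequency domains, a scheduler picks the most efficient variation-afflicted cores that fit the power budget. The greedy policy and all numbers are hypothetical, not EnergySmart's actual algorithm:

```python
import random

# Hypothetical greedy core assignment for a single-Vdd, per-core-frequency
# manycore (illustrative only; EnergySmart's actual policies differ).
random.seed(1)
cores = [{"id": i,
          "freq": random.uniform(0.6, 1.0),    # GHz after variation
          "power": random.uniform(0.2, 0.5)}   # W at the shared Vdd
         for i in range(64)]

def pick_cores(n_threads, power_budget):
    """Greedily pick the most performance-per-watt-efficient cores that fit."""
    ranked = sorted(cores, key=lambda c: c["freq"] / c["power"], reverse=True)
    chosen, used = [], 0.0
    for c in ranked:
        if len(chosen) == n_threads:
            break
        if used + c["power"] <= power_budget:
            chosen.append(c)
            used += c["power"]
    return chosen, used

sel, watts = pick_cores(n_threads=16, power_budget=5.0)
print(f"{len(sel)} cores, {watts:.2f} W, "
      f"aggregate {sum(c['freq'] for c in sel):.1f} GHz")
```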


IEEE Micro | 2013

Coping with Parametric Variation at Near-Threshold Voltages

Ulya R. Karpuzcu; Nam Sung Kim; Josep Torrellas

Near-threshold voltage computing (NTC) promises significant improvements in energy efficiency. Unfortunately, compared to conventional super-threshold voltage computing (STC), NTC is more sensitive to parametric variation. This results not only in slower and leakier cores, but also in substantial speed and power differences between the cores of a many-core chip. NTC's potential cannot be unlocked without addressing the higher impact of variation. To confront variation at the architecture level, the authors introduce a parametric variation model for NTC. They then use the model to show the shortcomings of adapting state-of-the-art STC variation-mitigation techniques to NTC. Finally, they discuss how to tailor variation mitigation to NTC.


IEEE Transactions on Very Large Scale Integration Systems | 2014

Low-Cost Per-Core Voltage Domain Support for Power-Constrained High-Performance Processors

Abhishek A. Sinkar; Hamid Reza Ghasemi; Michael J. Schulte; Ulya R. Karpuzcu; Nam Sung Kim

Per-core voltage domains can improve performance under a power constraint. Most commercial processors, however, have only a single voltage domain for all processor cores, because splitting the single voltage domain into per-core voltage domains and powering them with multiple off-chip voltage regulators (VRs) incurs a high cost in platform and package design. Although on-chip switching VRs can be an alternative solution, integrating high-quality inductors for VRs with cores has been a technical challenge. In this paper, we propose a cost-effective power delivery technique to support per-core voltage domains. Our technique is based on two observations: (1) core-to-core (C2C) voltage variations are relatively small for most execution intervals when the voltages/frequencies are optimized to maximize performance under a power constraint, and (2) per-core power-gating devices augmented with feedback control circuitry can serve as low-cost VRs that provide high efficiency in such situations. Our experimental results show that processors using our technique can achieve power efficiency as high as those using per-core on-chip switching VRs, at a much lower cost.
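
Observation (2) amounts to reusing the power-gate transistor as a crude linear regulator: feedback modulates its effective conductance so the core-side voltage settles slightly below the shared rail. A discrete-time sketch with made-up electrical constants, not the paper's circuit:

```python
# Discrete-time sketch of a per-core power-gate device reused as a
# low-cost linear regulator (all constants invented for illustration).
V_RAIL, V_TARGET = 1.00, 0.95   # shared rail and per-core target (V)
I_LOAD = 0.05                   # core load current (A)
C, DT = 1e-9, 1e-10             # core decap (F), control step (s)
KP = 1.0                        # feedback gain on gate conductance (S/V)

v_core, g_gate = 0.90, 0.8      # initial core voltage (V), conductance (S)
for _ in range(400):
    # Feedback circuitry widens/narrows the effective gate conductance.
    g_gate = max(1e-3, g_gate + KP * (V_TARGET - v_core))
    i_in = g_gate * (V_RAIL - v_core)       # current through the gate
    v_core += (i_in - I_LOAD) * DT / C      # charge balance on the decap

print(f"settled at {v_core:.3f} V (target {V_TARGET} V)")
```

The efficiency intuition follows from the same arithmetic: when the rail-to-core drop is small, the power burned across the gate, (V_RAIL - v_core) * I_LOAD, is a small fraction of delivered power.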


Genetic and Evolutionary Computation Conference | 2005

Automatic Verilog code generation through grammatical evolution

Ulya R. Karpuzcu

This work investigates the automatic generation of Verilog code representing digital circuits through Grammatical Evolution (GE). Preliminary tests on a simple full-adder generation problem have been performed.
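
The genotype-to-phenotype mapping that GE relies on is mechanical: a linear genome of integer codons repeatedly selects productions from a BNF grammar until only terminals remain. The miniature grammar below is invented for illustration and is far smaller than anything needed for real Verilog:

```python
from itertools import cycle

# Toy grammatical-evolution genotype -> Verilog mapping (made-up grammar).
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["(", "<expr>", ")"], ["<var>"]],
    "<op>":   [["&"], ["|"], ["^"]],
    "<var>":  [["a"], ["b"], ["cin"]],
}
codons = cycle([7, 3, 12, 0, 5, 9, 2, 11, 4, 8, 1, 6])  # wrapping genome

def derive(symbol="<expr>", depth=8):
    """Expand symbol; each codon modulo #choices picks a production."""
    if symbol not in GRAMMAR:
        return symbol                      # terminal: emit as-is
    choices = GRAMMAR[symbol]
    # Force the terminal production once recursion gets too deep.
    rule = choices[-1] if depth <= 0 else choices[next(codons) % len(choices)]
    return "".join(derive(s, depth - 1) for s in rule)

print(f"assign sum = {derive()};")   # candidate line of Verilog to score
```

An evolutionary loop would then score each derived expression against the full-adder truth table and evolve the codon strings accordingly.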


High-Performance Computer Architecture | 2014

Accordion: Toward soft Near-Threshold Voltage Computing

Ulya R. Karpuzcu; Ismail Akturk; Nam Sung Kim

While more cores fit in a unit of chip area with every technology generation, excessive growth in power density prevents using all of them simultaneously. Thanks to its lower operating voltage, Near-Threshold Voltage Computing (NTC) promises to fit more cores within a given power envelope. Yet NTC's prospects for energy efficiency vanish unless two barriers are mitigated: (i) the performance degradation due to the lower operating frequency, and (ii) the intensified vulnerability to parametric variation. To compensate for the first barrier, we need to raise the degree of parallelism, i.e., the number of cores engaged in computation. NTC-prompted power savings dominate the power cost of increasing the core count, so limited parallelism in the application domain constitutes the critical obstacle to engaging more cores. To avoid the second barrier, the system should tolerate variation-induced errors; unfortunately, engaging more cores in computation further exacerbates the vulnerability to variation. To overcome these barriers, we introduce Accordion, a novel, lightweight framework that exploits weak scaling along with the inherent fault tolerance of emerging R(ecognition), M(ining), S(ynthesis) applications. The key observation is that the problem size dictates not only the number of cores engaged in computation, but also the application output quality. Consequently, Accordion designates the problem size as the main knob to trade off the degree of parallelism (the number of cores engaged in computation) against the degree of vulnerability to variation (the corruption in application output quality due to variation-induced errors). Parametric variation renders ample reliability differences between cores. Since RMS applications can tolerate faults emanating from data-intensive program phases but not from control, variation-afflicted Accordion hardware executes the fault-tolerant, data-intensive phases on error-prone cores and reserves reliable cores for control.
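
Stripped to its scheduling core, the idea is a mapping rule: classify each program phase as control or data-intensive, rank cores by modeled error proneness, and bind control phases to the reliable end of the ranking. A hypothetical sketch (the paper's runtime and reliability models are far more involved):

```python
import random

# Hypothetical sketch of Accordion-style phase-to-core binding
# (illustrative; not the paper's actual runtime).
random.seed(7)
cores = sorted(({"id": i, "err_rate": random.uniform(1e-9, 1e-5)}
                for i in range(16)), key=lambda c: c["err_rate"])

RELIABLE_POOL = cores[:4]      # lowest error rates: reserved for control
ERROR_PRONE_POOL = cores[4:]   # fault-tolerant data phases can run here

def place(phase):
    """Bind a phase to a core pool based on its fault tolerance."""
    pool = RELIABLE_POOL if phase == "control" else ERROR_PRONE_POOL
    return random.choice(pool)["id"]

for phase in ("control", "data", "data", "control"):
    print(f"{phase:8s} phase -> core {place(phase)}")
```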


IEEE Transactions on Very Large Scale Integration Systems | 2016

System-Level Power Analysis of a Multicore Multipower Domain Processor With On-Chip Voltage Regulators

Ayan Paul; Sang Phill Park; Dinesh Somasekhar; Young Moon Kim; Nitin Borkar; Ulya R. Karpuzcu; Chris H. Kim

In this paper, we study two different on-chip power delivery schemes, namely the fully integrated voltage regulator (FIVR) and the low-dropout regulator (LDO), and analyze their effect on total system power under process variation, assuming a realistic dynamic voltage-frequency scaling (DVFS) system. The impact of different task scheduling algorithms on overall system power is also analyzed. We find that in a hypothetical 256-core processor with per-core DVFS, FIVR-based power delivery consumes 20% less power than LDO-based delivery at 50% throughput. However, as the number of cores in the processor decreases, the difference in power consumption between the FIVR-based and LDO-based schemes shrinks. For example, for a 16-core processor with per-core DVFS capability, the FIVR-based design was found to consume about the same power as the LDO-based design.
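
The headline comparison reduces to two loss models: an LDO draws the full load current from the input rail and dissipates the headroom, P_loss = (V_in - V_out) * I, while a switching FIVR loses a roughly fixed conversion-efficiency fraction. A back-of-the-envelope version with assumed numbers, not the paper's simulation data:

```python
# Back-of-the-envelope LDO vs. FIVR loss comparison (assumed numbers,
# not the paper's simulation data).
V_IN = 1.2                    # input rail (V)
ETA_FIVR = 0.85               # assumed FIVR conversion efficiency

# Per-core DVFS states: (V_out, I_core) pairs for a few cores.
states = [(1.1, 2.0), (0.9, 1.2), (0.7, 0.6), (0.6, 0.4)]

def total_ldo(states):
    # LDO: input current equals load current; the headroom is dissipated.
    return sum(V_IN * i for _, i in states)

def total_fivr(states):
    # FIVR: delivers v*i to the core at efficiency ETA_FIVR.
    return sum(v * i / ETA_FIVR for v, i in states)

p_ldo, p_fivr = total_ldo(states), total_fivr(states)
print(f"LDO: {p_ldo:.2f} W, FIVR: {p_fivr:.2f} W "
      f"({(1 - p_fivr / p_ldo) * 100:.0f}% less with FIVR)")
```

The same arithmetic shows why the gap narrows as the per-core voltages cluster near V_IN: the LDO headroom shrinks while the FIVR's fixed conversion loss remains.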

Collaboration


Dive into Ulya R. Karpuzcu's collaborations.

Top Co-Authors

Longfei Wang, University of South Florida
Selçuk Köse, University of South Florida
Abhishek A. Sinkar, University of Wisconsin-Madison
Chris H. Kim, University of Minnesota
Hamid Reza Ghasemi, University of Wisconsin-Madison