Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Resit Sendag is active.

Publication


Featured research published by Resit Sendag.


High-Performance Computer Architecture | 2005

Characterizing and comparing prevailing simulation techniques

Joshua J. Yi; Sreekumar V. Kodakara; Resit Sendag; David J. Lilja; Douglas M. Hawkins

Due to the long simulation time of the reference input set, architects often use alternative simulation techniques. Although these alternatives reduce the simulation time, what has not been evaluated is their accuracy relative to the reference input set, and with respect to each other. To rectify this deficiency, this paper uses three methods to characterize the reduced input set, truncated execution, and sampling simulation techniques while also examining their speed versus accuracy trade-off and configuration dependence. Additionally, to illustrate the effect that a technique could have on the apparent speedup results, we quantify the speedups obtained with two processor enhancements. The results show that: 1) the accuracy of the truncated execution techniques was poor for all three characterization methods and for both enhancements, 2) the characteristics of the reduced input sets are not reference-like, and 3) SimPoint and SMARTS, the two sampling techniques, are extremely accurate and have the best speed versus accuracy trade-offs. Finally, this paper presents a decision tree which can help architects choose the most appropriate technique for their simulations.
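For readers unfamiliar with sampling-based simulation, the following is a minimal, self-contained sketch (not taken from the paper) of the systematic-sampling idea behind SMARTS-style techniques: only every k-th measurement interval is simulated in detail and the overall CPI is estimated from those samples. The per-interval CPI values are synthetic placeholders for detailed-simulation results.

```python
# Minimal sketch of SMARTS-style systematic sampling: estimate overall CPI by
# simulating only every k-th measurement interval in detail. The per-interval
# CPI values here are synthetic stand-ins for detailed simulation results.
import random

random.seed(0)
full_trace_cpi = [1.0 + 0.5 * random.random() for _ in range(100_000)]  # per-interval CPI

def sampled_cpi(interval_cpis, sampling_period):
    """Average the CPI of every sampling_period-th interval (the detailed samples);
    all other intervals would only be fast-forwarded or functionally warmed."""
    samples = interval_cpis[::sampling_period]
    return sum(samples) / len(samples)

reference = sum(full_trace_cpi) / len(full_trace_cpi)        # full-detail result
estimate = sampled_cpi(full_trace_cpi, sampling_period=100)  # sampled estimate
print(f"reference CPI {reference:.4f}, sampled estimate {estimate:.4f}, "
      f"error {abs(estimate - reference) / reference:.2%}")
```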


IEEE International Symposium on Workload Characterization | 2006

Evaluating Benchmark Subsetting Approaches

Joshua J. Yi; Resit Sendag; Lieven Eeckhout; Ajay Joshi; David J. Lilja; Lizy Kurian John

To reduce the simulation time to a tractable amount or due to compilation (or other related) problems, computer architects often simulate only a subset of the benchmarks in a benchmark suite. However, if the architect chooses a subset of benchmarks that is not representative, the subsequent simulation results will, at best, be misleading or, at worst, yield incorrect conclusions. To address this problem, computer architects have recently proposed several statistically-based approaches to subset a benchmark suite. While some of these approaches are well-grounded statistically, what has not yet been thoroughly evaluated is the: 1) absolute accuracy; 2) relative accuracy across a range of processor and memory subsystem enhancements; and 3) representativeness and coverage of each approach for a range of subset sizes. Specifically, this paper evaluates statistically-based subsetting approaches based on principal components analysis (PCA) and the Plackett and Burman (P&B) design, in addition to prevailing approaches such as integer vs. floating-point, core vs. memory-bound, by language, and at random. Our results show that the two statistically-based approaches, PCA and P&B, have the best absolute and relative accuracy for CPI and energy-delay product (EDP), produce subsets that are the most representative, and choose benchmark and input set pairs that are most well-distributed across the benchmark space. To achieve a 5% absolute CPI and EDP error, across a wide range of configurations, PCA and P&B typically need about 17 benchmark and input set pairs, while the other five approaches often choose more than 30 benchmark and input set pairs.
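As an illustration of the PCA-based subsetting approach evaluated above, the sketch below clusters benchmarks in a reduced feature space and keeps the benchmark nearest each cluster centroid. The feature matrix is random placeholder data; in practice it would hold microarchitecture-independent program characteristics gathered by profiling.

```python
# Hypothetical sketch of PCA-plus-clustering benchmark subsetting.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

benchmarks = ["gzip", "gcc", "mcf", "art", "equake", "bzip2"]
features = np.random.default_rng(1).random((len(benchmarks), 20))  # placeholder metrics

reduced = PCA(n_components=3).fit_transform(features)   # keep the dominant components
k = 3                                                    # desired subset size
km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(reduced)

subset = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    # representative benchmark = cluster member nearest the centroid
    dists = np.linalg.norm(reduced[members] - km.cluster_centers_[c], axis=1)
    subset.append(benchmarks[members[np.argmin(dists)]])
print("representative subset:", subset)
```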


International Parallel and Distributed Processing Symposium | 2003

Using incorrect speculation to prefetch data in a concurrent multithreaded processor

Ying Chen; Resit Sendag; David J. Lilja

Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism through a combination of branch prediction and thread-level control speculation. The resulting speculative issuing of load instructions in these architectures can significantly impact the performance of the memory hierarchy as the system exploits higher degrees of parallelism. In this study, we investigate the effects of executing the mispredicted load instructions on the cache performance of a scalable multithreaded architecture. We show that the execution of loads from the wrongly-predicted branch path within a thread, or from a wrongly forked thread, can result in an indirect prefetching effect for later correctly-executed paths. By continuing to execute the mispredicted load instructions even after the instruction- or thread-level control speculation is known to be incorrect, the cache misses for the correctly predicted paths and threads can be reduced, typically by 42-73%. We introduce the small, fully-associative Wrong Execution Cache (WEC) to eliminate the potential pollution that can be caused by the execution of the mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture up to 18.5% on the benchmark programs tested, with an average improvement of 9.7%, due to the reductions in the number of cache misses.
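The sketch below is an illustrative software model of the Wrong Execution Cache idea described above, not the authors' simulator: blocks fetched by loads known to be on a wrong path fill a small fully-associative side buffer instead of the L1, so they cannot pollute it, yet can still satisfy a later correct-path miss. The cache sizes and the memory dictionary are arbitrary assumptions.

```python
from collections import OrderedDict

class SmallFullyAssocCache:
    """Tiny fully-associative cache with LRU replacement (block address -> data)."""
    def __init__(self, entries):
        self.entries = entries
        self.blocks = OrderedDict()

    def insert(self, block, data):
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.entries:   # evict least recently used
            self.blocks.popitem(last=False)

    def lookup(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return self.blocks[block]
        return None

l1 = SmallFullyAssocCache(entries=256)
wec = SmallFullyAssocCache(entries=8)         # Wrong Execution Cache

def wrong_path_load(block, memory):
    wec.insert(block, memory[block])          # wrong-path fills bypass the L1

def correct_path_load(block, memory):
    data = l1.lookup(block)
    if data is None:
        data = wec.lookup(block)              # indirect prefetch: check the WEC first
        if data is None:
            data = memory[block]              # otherwise go to the next level
        l1.insert(block, data)
    return data
```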


European Conference on Parallel Processing | 2002

Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions

Resit Sendag; David J. Lilja; Steven R. Kunkel

As the degree of instruction-level parallelism in superscalar architectures increases, the gap between processor and memory performance continues to grow requiring more aggressive techniques to increase the performance of the memory system. We propose a new technique, which is based on the wrong-path execution of loads far beyond instruction fetch-limiting conditional branches, to exploit more instruction-level parallelism by reducing the impact of memory delays. We examine the effects of the execution of loads down the wrong branch path on the performance of an aggressive issue processor. We find that, by continuing to execute the loads issued in the mispredicted path, even after the branch is resolved, we can actually reduce the cache misses observed on the correctly executed path. This wrong-path execution of loads can result in a speedup of up to 5% due to an indirect prefetching effect that brings data or instruction blocks into the cache for instructions subsequently issued on the correctly predicted path. However, it also can increase the amount of memory traffic and can pollute the cache. We propose the Wrong Path Cache (WPC) to eliminate the cache pollution caused by the execution of loads down mispredicted branch paths. For the configurations tested, fetching the results of wrong path loads into a fully associative 8-entry WPC can result in a 12% to 39% reduction in L1 data cache misses and in a speedup of up to 37%, with an average speedup of 9%, over the baseline processor.
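To make the indirect prefetching effect concrete, here is a rough sketch (assumptions ours, not the paper's methodology) of how one could count correct-path misses that are avoided because a wrong-path load had already brought the block into a simple fully-associative LRU cache.

```python
from collections import OrderedDict

def classify_accesses(trace, cache_size):
    """trace: list of (block_address, is_wrong_path) for a fully-associative LRU cache.
    Returns (correct-path misses, correct-path hits on blocks filled by wrong-path loads)."""
    cache = OrderedDict()                      # block -> filled_by_wrong_path flag
    misses = prefetched_hits = 0
    for block, wrong_path in trace:
        if block in cache:
            if not wrong_path and cache[block]:
                prefetched_hits += 1           # miss avoided thanks to a wrong-path fill
                cache[block] = False
            cache.move_to_end(block)
            continue
        if not wrong_path:
            misses += 1
        cache[block] = wrong_path
        if len(cache) > cache_size:
            cache.popitem(last=False)
    return misses, prefetched_hits

trace = [(0x10, True), (0x20, False), (0x10, False), (0x30, False), (0x10, False)]
print(classify_accesses(trace, cache_size=4))  # (2, 1): one miss avoided by a wrong-path load
```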


European Conference on Parallel Processing | 2002

Increasing Instruction-Level Parallelism with Instruction Precomputation

Joshua J. Yi; Resit Sendag; David J. Lilja

Value reuse improves a processor’s performance by dynamically caching the results of previous instructions and reusing those results to bypass the execution of future instructions that have the same opcode and input operands. However, continually replacing the least recently used entries could eventually fill the value reuse table with instructions that are not frequently executed. Furthermore, the complex hardware that replaces entries and updates the table may necessitate an increase in the clock period. We propose instruction precomputation to address these issues by profiling programs to determine the opcodes and input operands that have the highest frequencies of execution. These instructions then are loaded into the precomputation table before the program executes. During program execution, the precomputation table is used in the same way as the value reuse table is, with the exception that the precomputation table does not dynamically replace any entries. For a 2K-entry precomputation table implemented on a 4-way issue machine, this approach produced an average speedup of 11.0%. By comparison, a 2K-entry value reuse table produced an average speedup of 6.7%. Instruction precomputation outperforms value reuse, especially for smaller tables, with the same number of table entries while using less area and having a lower access time.
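A hedged sketch of the precomputation-table mechanism follows: a profiling pass counts (opcode, operand, operand) tuples, the most frequent ones are preloaded with their results into a static table, and execution consults that table to bypass redundant computation. The tiny trace and two-opcode "ISA" are made up for illustration.

```python
from collections import Counter

def execute(op, a, b):
    return {"add": a + b, "mul": a * b}[op]

def profile(trace):
    # Profiling pass: frequency of each unique (opcode, operand1, operand2) tuple.
    return Counter(trace)

def build_precomputation_table(counts, size):
    # Preload results for the most frequently executed unique operations.
    return {key: execute(*key) for key, _ in counts.most_common(size)}

def run(trace, table):
    hits = 0
    for key in trace:
        if key in table:          # result supplied by the static precomputation table
            hits += 1
        else:                     # otherwise execute normally
            execute(*key)
    return hits

trace = [("add", 1, 2)] * 5 + [("mul", 3, 4)] * 3 + [("add", 7, 8)]
table = build_precomputation_table(profile(trace), size=2)
print(f"{run(trace, table)}/{len(trace)} dynamic instructions bypassed")
```

Unlike a value reuse table, nothing in this table is replaced at run time, which is what keeps the hardware simple in the scheme above.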


IEEE Micro | 2007

Reliability: Fallacy or Reality?

Antonio González; Scott A. Mahlke; Shubu Mukherjee; Resit Sendag; Derek Chiou; Joshua J. Yi

As chip architects and manufacturers plumb ever-smaller process technologies, new species of faults are compromising device reliability. Following an introduction, the authors debate whether reliability is a legitimate concern for the microarchitect. Topics include the costs of adding reliability versus those of ignoring it, how to measure it, techniques for improving it, and whether consumers really want it.


IEEE Computer Architecture Letters | 2003

Address Correlation: Exceeding the Limits of Locality

Resit Sendag; Peng Fei Chuang; David J. Lilja

We investigate a program phenomenon, Address Correlation, which links addresses that reference the same data. This work shows that different addresses containing the same data can often be correlated at run-time to eliminate a load miss or a partial hit. For ten of the SPEC CPU2000 benchmarks, 57 to 99% of all L1 data cache load misses, and 4 to 85% of all partial hits, can be supplied from a correlated address already found in the cache. Our source code-level analysis shows that semantically equivalent information, duplicated references, and frequent values are the major causes of address correlations. We also show that, on average, 68% of the potential correlated addresses that could supply data on a miss of an address containing the same value can be correlated at run time. These correlated addresses correspond to an average of 62% of all misses in the benchmark programs tested.
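The following is an illustrative sketch of the address-correlation idea (our own simplification, not the paper's hardware): when the program copies a value from one address to another, the two addresses are linked, so a later miss on one can be serviced from the other if it is already cached. The addresses and values are arbitrary.

```python
class AddressCorrelationSketch:
    def __init__(self):
        self.cache = {}      # address -> value, stand-in for the L1 data cache
        self.links = {}      # address -> correlated address known to hold the same data

    def store_copy(self, dst, src, memory):
        # A copy (store of a value loaded from another address) establishes a
        # run-time correlation between the two addresses.
        memory[dst] = memory[src]
        self.links[dst] = src
        self.links[src] = dst

    def load(self, addr, memory):
        if addr in self.cache:
            return self.cache[addr], "hit"
        partner = self.links.get(addr)
        if partner in self.cache:
            # Miss eliminated: a correlated address already holds the same data.
            self.cache[addr] = self.cache[partner]
            return self.cache[addr], "correlated"
        self.cache[addr] = memory[addr]
        return self.cache[addr], "miss"

memory = {0x100: 42, 0x200: 0, 0x300: 7}
c = AddressCorrelationSketch()
c.store_copy(0x200, 0x100, memory)   # program copies *0x100 into *0x200
print(c.load(0x100, memory))         # (42, 'miss')        -- cold miss
print(c.load(0x200, memory))         # (42, 'correlated')  -- miss avoided
```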


International Parallel and Distributed Processing Symposium | 2006

Quantifying and reducing the effects of wrong-path memory references in cache-coherent multiprocessor systems

Resit Sendag; Ayse Yilmazer; Joshua J. Yi; Augustus K. Uht

High-performance multiprocessor systems built around out-of-order processors with aggressive branch predictors execute many memory references that turn out to be on a mispredicted branch path. Previous work that focused on uniprocessors showed that these wrong-path memory references may pollute the caches by bringing in data that are not needed on the correct execution path and by evicting useful data or instructions. Additionally, they may also increase the amount of cache and memory traffic. On the positive side, however, they may have a prefetching effect for memory references on the correct path. While computer architects have thoroughly studied the impact of wrong-path effects in uniprocessor systems, there is no previous work on their effects in multiprocessor systems. In this paper, we explore the effects of wrong-path memory references on the memory system behavior of shared-memory multiprocessor (SMP) systems for both broadcast and directory-based cache coherence. Our results show that these wrong-path memory references can increase the amount of cache-to-cache transfers by 32%, invalidations by 8% and 20% for broadcast and directory-based SMPs, respectively, and the number of writebacks by up to 67% for both systems. In addition to the extra coherence traffic, wrong-path memory references also increase the number of cache line state transitions by 21% and 32% for broadcast and directory-based SMPs, respectively. In order to reduce the performance impact of these wrong-path memory references, we introduce two simple mechanisms - filtering wrong-path blocks that are not likely-to-be-used and wrong-path aware cache replacement - that yield speedups of up to 37%.
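As a concrete reading of the second mechanism, the sketch below (assumptions ours) models wrong-path-aware replacement: blocks filled by wrong-path references are marked, and the replacement policy evicts a still-unused wrong-path block before falling back to plain LRU.

```python
from collections import OrderedDict

class WrongPathAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block -> True if filled by a wrong-path ref and not yet used

    def access(self, block, wrong_path, memory):
        if block in self.blocks:
            if not wrong_path:
                self.blocks[block] = False      # a correct-path use clears the mark
            self.blocks.move_to_end(block)
            return memory[block]
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[block] = wrong_path
        return memory[block]

    def _evict(self):
        # Prefer the oldest never-used wrong-path fill; otherwise plain LRU.
        victim = next((blk for blk, speculative in self.blocks.items() if speculative), None)
        if victim is not None:
            del self.blocks[victim]
        else:
            self.blocks.popitem(last=False)

memory = {a: a for a in range(16)}
cache = WrongPathAwareCache(capacity=2)
cache.access(1, wrong_path=True, memory=memory)
cache.access(2, wrong_path=False, memory=memory)
cache.access(3, wrong_path=False, memory=memory)   # evicts block 1, the unused wrong-path fill
print(list(cache.blocks))                          # [2, 3]
```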


International Symposium on Microarchitecture | 2010

Programming Multicores: Do Applications Programmers Need to Write Explicitly Parallel Programs?

Arvind; David I. August; Keshav Pingali; Derek Chiou; Resit Sendag; Joshua J. Yi

In this panel discussion from the 2009 Workshop on Computer Architecture Research Directions, David August and Keshav Pingali debate whether explicitly parallel programming is a necessary evil for applications programmers, assess the current state of parallel programming models, and discuss possible routes toward finding the programming model for the multicore era.


International Symposium on Microarchitecture | 2007

Where Does Security Stand? New Vulnerabilities vs. Trusted Computing

S. Gueron; G. Strongin; J.-P. Seifert; Derek Chiou; Resit Sendag; Joshua J. Yi

How can we ensure that platform hardware, firmware, and software work in concert to withstand rapidly evolving security threats? Architectural innovations bring performance gains but can also create new security vulnerabilities. In this panel discussion from the 2007 Workshop on Computer Architecture Research Directions, the panelists assess the current state of security and discuss possible routes toward trusted computing.

Collaboration


Dive into Resit Sendag's collaborations.

Top Co-Authors

Joshua J. Yi (Freescale Semiconductor)
Derek Chiou (University of Texas at Austin)
Ajay Joshi (University of Texas at Austin)
Augustus K. Uht (University of Rhode Island)
Ayse Yilmazer (University of Rhode Island)
Celal Ozturk (University of Rhode Island)
Joel S. Emer (Massachusetts Institute of Technology)
Mark D. Hill (University of Wisconsin-Madison)