Ryan N. Rakvic
Intel
Publications
Featured research published by Ryan N. Rakvic.
international symposium on computer architecture | 2006
Richard A. Hankins; Gautham N. Chinya; Jamison D. Collins; Perry H. Wang; Ryan N. Rakvic; Hong Wang; John Paul Shen
Microprocessor design is undergoing a major paradigm shift towards multi-core designs, in anticipation that future performance gains will come from exploiting thread-level parallelism in software. To support this trend, we present a novel processor architecture called the Multiple Instruction Stream Processing (MISP) architecture. MISP introduces the sequencer as a new category of architectural resource, and defines a canonical set of instructions to support user-level inter-sequencer signaling and asynchronous control transfer. MISP allows an application program to directly manage user-level threads without OS intervention. By supporting the classic cache-coherent shared-memory programming model, MISP does not require a radical shift in the multithreaded programming paradigm. This paper describes the design and evaluation of the MISP architecture for the IA-32 family of microprocessors. Using a research prototype MISP processor built on an IA-32-based multiprocessor system equipped with special firmware, we demonstrate the feasibility of implementing the MISP architecture. We then examine the utility of MISP by (1) assessing the key architectural tradeoffs of the MISP architecture design and (2) showing how legacy multithreaded applications can be migrated to MISP with relative ease.
international symposium on microarchitecture | 2004
Murali Annavaram; Ryan N. Rakvic; Marzia Polito; Jean-Yves Bouguet; Richard A. Hankins; Bob Davies
Recent studies have shown that most SPEC CPU2K benchmarks exhibit strong phase behavior, and that the Cycles per Instruction (CPI) performance metric can be accurately predicted from a program's control-flow behavior, simply by observing the sequencing of the program counters, or extended instruction pointers (EIPs). One motivation of this paper is to see if server workloads also exhibit such phase behavior. In particular, can EIPs effectively predict CPI in server workloads? We propose using regression trees to measure the theoretical upper bound on the accuracy of predicting the CPI using EIPs, where accuracy is measured by the explained variance of CPI with EIPs. Our results show that for most server workloads and, surprisingly, even for CPU2K benchmarks, the accuracy of predicting CPI from EIPs varies widely. We classify the benchmarks into four quadrants based on their CPI variance and predictability of CPI using EIPs. Our results indicate that no single sampling technique can be broadly applied to a large class of applications. We propose a new methodology that selects the best-suited sampling technique to accurately capture the program behavior.
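As a rough illustration of the accuracy metric named in this abstract, the sketch below fits a small greedy regression tree to (EIP, CPI) samples and reports the fraction of CPI variance the tree explains. It is a minimal sketch under stated assumptions, not the authors' implementation: using a single scalar EIP per sample, the depth limit, and the function names `sse`, `tree_residual_sse`, and `explained_variance` are all hypothetical choices made for this example.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def tree_residual_sse(samples, max_depth):
    """Residual SSE of a greedy regression tree over (eip, cpi) pairs.

    Assumption: each sample carries one scalar EIP; the abstract does not
    say how EIPs are encoded as features.
    """
    cpis = [cpi for _, cpi in samples]
    if max_depth == 0 or len(samples) < 2:
        return sse(cpis)
    ordered = sorted(samples)
    best = None
    for i in range(1, len(ordered)):
        if ordered[i][0] == ordered[i - 1][0]:
            continue  # cannot split between identical EIP values
        left, right = ordered[:i], ordered[i:]
        cost = sse([c for _, c in left]) + sse([c for _, c in right])
        if best is None or cost < best[0]:
            best = (cost, left, right)
    if best is None:
        return sse(cpis)  # all EIPs identical: this node is a leaf
    _, left, right = best
    return (tree_residual_sse(left, max_depth - 1)
            + tree_residual_sse(right, max_depth - 1))


def explained_variance(samples, max_depth=3):
    """Fraction of CPI variance explained by the tree, between 0.0 and 1.0."""
    total = sse([cpi for _, cpi in samples])
    if total == 0.0:
        return 0.0  # CPI never varies, so there is nothing to explain
    return 1.0 - tree_residual_sse(samples, max_depth) / total
```

In this toy setting, a workload whose CPI is fully determined by the EIP scores near 1, while one whose CPI varies independently of the EIP scores near 0, which mirrors the predictability axis the abstract uses to classify benchmarks.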
international symposium on microarchitecture | 2002
Youfeng Wu; Ryan N. Rakvic; Li-Ling Chen; Chyi-chang Miao; George Z. Chrysos; Jesse Fang
Advanced microprocessors have pushed clock rates well beyond the gigahertz boundary. For such high-performance microprocessors, a small and fast data micro-cache (ucache) is important to overall performance, and proper management of it via load bypassing has a significant performance impact. In this paper, we propose and evaluate a hardware-software collaborative technique to manage ucache bypassing for EPIC processors. The hardware supports ucache bypassing with a flag in the load instruction format, and the compiler employs static analysis and profiling to identify loads that should bypass the ucache. The collaborative method achieves a significant improvement in performance for the SpecInt2000 benchmarks. On average, about 40%, 30%, 24%, and 22% of load references are identified to bypass 256 B, 1 KB, 4 KB, and 8 KB ucaches, respectively. This reduces the ucache miss rates by 39%, 32%, 28%, and 26%. The number of pipeline stalls from loads to their uses is reduced by 13%, 9%, 6%, and 5%. Meanwhile, the L1 and L2 cache misses remain largely unchanged. For the 256 B ucache, bypassing improves overall performance on average by 5%.
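The effect of a per-load bypass flag can be illustrated with a toy cache model. The sketch below is only an illustration under assumptions that do not come from the paper: a direct-mapped ucache, a 32-byte line, and a hand-written trace whose bypass flags stand in for the compiler's static analysis and profiling. The function name `simulate_ucache` and its parameters are hypothetical.

```python
def simulate_ucache(trace, num_lines=8, line_bytes=32, honor_bypass=True):
    """Count ucache hits, misses, and bypassed loads for a load trace.

    trace: iterable of (address, bypass_flag) pairs, one per load.
    Assumptions (not from the paper): the ucache is direct-mapped, and a
    bypassing load neither looks up nor fills the ucache.
    """
    tags = [None] * num_lines  # block number currently held by each line
    hits = misses = bypassed = 0
    for address, bypass_flag in trace:
        if honor_bypass and bypass_flag:
            bypassed += 1  # load goes straight to the next cache level
            continue
        block = address // line_bytes
        index = block % num_lines
        if tags[index] == block:
            hits += 1
        else:
            misses += 1
            tags[index] = block  # fill the line, evicting the old block
    return hits, misses, bypassed


# A hot load at address 0 interleaved with streaming loads that map to the
# same ucache line; the streaming loads carry the bypass flag.
example_trace = [(0, False), (256, True), (0, False),
                 (512, True), (0, False), (768, True)]
with_bypass = simulate_ucache(example_trace, honor_bypass=True)
without_bypass = simulate_ucache(example_trace, honor_bypass=False)
```

With the flag honored, the hot load stays resident after its first miss; with the flag ignored, every streaming load evicts it, so each access misses. This is the kind of conflict the abstract's bypassing scheme is meant to avoid, though real traces and cache organizations will behave differently.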
Archive | 2011
Richard A. Hankins; Gautham N. Chinya; Hong Wang; Shivnandan D. Kaushik; Bryant Bigbee; John Paul Shen; Trung A. Diep; Xiang Zou; Baiju V. Patel; Paul M. Petersen; Sanjiv Shah; Ryan N. Rakvic; Prashant Sethi
Archive | 2005
Ryan N. Rakvic; Richard A. Hankins; Ed Grochowski; Hong Wang; Murali Annavaram; David K. Poulsen; Sanjiv Shah; John Paul Shen; Gautham N. Chinya
Archive | 2005
Ryan N. Rakvic; Richard A. Hankins; Hong Wang; Trung A. Diep; Xinmin Tian; Paul M. Petersen; Sanjiv Shah; John Paul Shen; Gautham N. Chinya; Shivnandan D. Kaushik; Bryant Bigbee; Baiju V. Patel; Douglas R. Armstrong
Archive | 2005
Shih-Wei Liao; Ryan N. Rakvic; Richard A. Hankins; Hong Wang; Gansha Wu; Guei-Yuan Lueh; Xinmin Tian; Paul M. Petersen; Sanjiv Shah; Trung A. Diep; John Paul Shen; Gautham N. Chinya
Archive | 2005
Hong Wang; Gautham N. Chinya; Richard A. Hankins; Shivnandan D. Kaushik; Bryant Bigbee; John Paul Shen; Per Hammarlund; Xiang Zou; Jason W. Brandt; Prashant Sethi; Douglas M. Carmean; Baiju V. Patel; Scott Dion Rodgers; Ryan N. Rakvic; John L. Reid; David K. Poulsen; Sanjiv Shah; James P. Held; James C. Abel
Archive | 2007
Antonio González; Qiong Cai; Jose Gonzalez; Pedro Chaparro; Grigorios Magklis; Ryan N. Rakvic
Archive | 2002
Ryan N. Rakvic; Christopher B. Wilkerson; Bryan Black; Edward T. Grochowski; John Paul Shen; Edward A. Brekelbaum