Roni Rosner
Intel
Publications
Featured research published by Roni Rosner.
symposium on code generation and optimization | 2004
Yoav Almog; Roni Rosner; Naftali Schwartz; Ari Schmorak
We study several major characteristics of dynamic optimization within the PARROT power-aware, trace-cache-based microarchitectural framework. We investigate the benefit of providing optimizations which, although tightly coupled with the microarchitecture in substance, are decoupled in time. The tight coupling in substance provides the potential for tailoring optimizations to the microarchitecture in a manner impossible or impractical not only for traditional static compilers but even for a JIT. We show that the contribution of common, generic optimizations to processor performance and energy efficiency may be more than doubled by creating a more intimate correlation between hardware specifics and the optimizer. In particular, dynamic optimizations can profit greatly from hardware supporting fused and SIMDified operations. At the same time, the decoupling in time allows optimizations to be arbitrarily aggressive without significant performance loss. We demonstrate that requiring up to 512 repetitions before a trace is optimized sacrifices almost no performance or efficiency as compared with lower thresholds. These results confirm the feasibility of an energy-efficient hardware implementation of an aggressive optimizer.
international symposium on computer architecture | 2004
Roni Rosner; Yoav Almog; Micha Moffie; Naftali Schwartz; Avi Mendelson
We present the PARROT concept that seeks to achieve higher performance with reduced energy consumption through gradual optimization of frequently executed code traces. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations and pipeline decoupling. We employ a selective approach for applying complex mechanisms only upon the most frequently used traces to maximize the performance gain at any given power constraint, thus attaining finer control of tradeoffs between performance and power awareness. We show that the PARROT based microarchitecture can improve the performance of aggressively designed processors by providing the means to improve the utilization of their more elaborate resources. At the same time, rigorous selection of traces prior to storage and optimization provides the key to attenuating increases in the power budget. For resource-constrained designs, PARROT based architectures deliver better performance (up to an average 16% increase in IPC) at a comparable energy level, whereas the conventional path to a similar performance improvement consumes an average 70% more energy. Meanwhile, for those designs which can tolerate a higher power budget, PARROT gracefully scales up to use additional execution resources in a uniformly efficient manner. In particular, a PARROT-style doubly-wide machine delivers an average 45% IPC improvement while actually improving the cubic-MIPS-per-WATT power awareness metric by over 50%.
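As a rough consistency check on the last figures in the abstract above, the cubic-MIPS-per-watt metric can be written out explicitly. This is only a sketch: it assumes a fixed clock frequency (so MIPS scales with IPC), and the symbols M and P are introduced here for illustration and do not appear in the abstract.

```latex
% Assumption: fixed clock frequency, so MIPS is proportional to IPC.
% M = cubic-MIPS-per-watt metric, P = power; primes denote the doubly-wide PARROT machine.
\[
  M = \frac{\mathrm{MIPS}^{3}}{P},
  \qquad
  \frac{M'}{M} = \frac{(1.45)^{3}}{P'/P} \approx \frac{3.05}{P'/P}.
\]
% A metric improvement of more than 50% means M'/M > 1.5, hence
\[
  \frac{P'}{P} < \frac{3.05}{1.5} \approx 2.03 .
\]
```

Under these assumptions, the reported 45% IPC gain and the metric improvement of over 50% are consistent with power growing by up to roughly a factor of two.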
international conference on parallel architectures and compilation techniques | 2001
Roni Rosner; Avi Mendelson; Ronny Ronen
The trace cache is becoming an important building block of modern wide-issue processors. The paper has three main contributions: it indicates that trace cache optimizations directed at reducing power consumption do not necessarily coincide with optimizations directed at increasing fetch bandwidth; it extends our understanding of how well the trace cache utilizes its resources; and it introduces a new trace-cache organization based on filtering techniques. We observe that: (1) the majority of traces that are inserted into the trace cache are rarely used again before being replaced; (2) the majority of the instructions delivered for execution originate from the few traces that are heavily and repeatedly used; and (3) techniques that aim to improve instruction fetch bandwidth may increase the number of traces built during program execution. Based on these observations, we propose splitting the trace cache into two components: the filter trace cache (FTC) and the main trace cache (MTC). The FTC/MTC organization exhibits an important benefit: it decreases the number of traces built, thus reducing power consumption while improving overall performance. An extension of the filtering concept involves adding a second-level (L2) trace cache that stores less frequent traces that are replaced in the FTC or the MTC. The extra level of caching allows for an order-of-magnitude reduction in the number of trace builds. The second-level trace cache proves particularly useful for applications with large instruction footprints.
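The FTC/MTC split described in the abstract above can be illustrated with a small software model. This is a minimal sketch, not the paper's design: the LRU replacement, the cache sizes, and the `promote_after` hit threshold are all assumptions made for illustration, since the abstract does not state the actual filtering criterion.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache keyed by trace id (hypothetical helper, not from the paper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        # Return the stored value and mark the entry most recently used.
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        return None

    def put(self, key, value):
        # Insert or update; evict the least recently used entry when over capacity.
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def pop(self, key):
        return self.entries.pop(key, None)


class FilteredTraceCache:
    """New traces enter the filter cache (FTC); only reused ones reach the main cache (MTC)."""

    def __init__(self, ftc_size=4, mtc_size=16, promote_after=2):
        self.ftc = LRUCache(ftc_size)       # value: hit count since the trace was built
        self.mtc = LRUCache(mtc_size)       # value: True (presence only)
        self.promote_after = promote_after  # assumed promotion threshold
        self.builds = 0                     # number of trace builds (a proxy for build energy)

    def fetch(self, trace_id):
        """Return 'mtc', 'ftc', or 'build' depending on where the trace was found."""
        if self.mtc.get(trace_id) is not None:
            return "mtc"
        hits = self.ftc.get(trace_id)
        if hits is not None:
            hits += 1
            if hits >= self.promote_after:
                # The trace has proven hot: move it out of the filter into the main cache.
                self.ftc.pop(trace_id)
                self.mtc.put(trace_id, True)
            else:
                self.ftc.put(trace_id, hits)
            return "ftc"
        # Miss in both caches: build the trace and place it in the filter only.
        self.builds += 1
        self.ftc.put(trace_id, 0)
        return "build"
```

In this model, a stream of rarely reused traces churns only the small FTC, so a hot trace that has been promoted to the MTC is not displaced by them and does not need to be rebuilt.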
symposium on code generation and optimization | 2010
Edson Borin; Youfeng Wu; Cheng Wang; Wei Liu; Mauricio Breternitz; Shiliang Hu; Esfir Natanzon; Shai Rotem; Roni Rosner
Dynamic binary translation is a key component of Hardware/Software (HW/SW) co-design, which is an enabling technology for processor microarchitecture innovation. There are two well-known dynamic binary optimization techniques based on atomic execution support. Frame-based optimizations leverage processor pipeline support to enable atomic execution of hot traces. Region level optimizations employ transactional-memory-like atomicity support to aggressively optimize large regions of code. In this paper we propose a two-level atomic optimization scheme which not only overcomes the limitations of the two approaches, but also boosts the benefits of the two approaches effectively. Our experiment shows that the combined approach can achieve a total of 21.5% performance improvement over an aggressive out-of-order baseline machine and improve the performance over the frame-based approach by an additional 5.3%.
PACS'03 Proceedings of the Third International Conference on Power-Aware Computer Systems | 2003
Roni Rosner; Yoav Almog; Micha Moffie; Naftali Schwartz; Avi Mendelson
We present the PARROT concept aimed at both higher performance and power-awareness. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations and pipeline decoupling. We employ a gradual and selective approach for applying complex mechanisms only for the most frequently used traces to maximize the performance gain at any given power constraint, thus attaining finer control of tradeoffs between performance and power awareness. We show that the PARROT microarchitecture delivers performance increases comparable to those available through conventional doubling of execution resources (average 16% IPC improvement). This improvement comes through better utilization of all available resources with the combination of a trace cache and selective trace optimization. On the other hand, performance advantage of a trace cache alone is limited to wide-machine configurations. No less critical, however, is power awareness. The PARROT microarchitecture delivers the performance increase at a comparable energy level, whereas the conventional path to higher performance consumes an average 70% more energy. Meanwhile, for those designs which can tolerate a higher power budget, PARROT gracefully scales up to use additional execution resources in a uniformly efficient manner. In particular, a PARROT-style doubly-wide machine delivers an average 45% IPC improvement while actually improving the Cubic-MIPS-per-WATT power awareness metric by over 50%.
Archive | 2004
Yoav Almog; Roni Rosner; Ronny Ronen
Archive | 2001
Abraham Mendelson; Roni Rosner; Ronny Ronen
Archive | 2002
Roni Rosner; Abraham Mendelson
Archive | 2004
Satish Narayanasamy; Hong Wang; John Paul Shen; Roni Rosner; Yoav Almog; Naftali Schwartz; Gerolf F. Hoflehner; Daniel M. Lavery; Wei Li; Xinmin Tian; Milind Girkar; Perry H. Wang
Archive | 2001
Roni Rosner; Micha Moffie; Abraham Mendelson