Kyriakos Stavrou | Researchain

Publication

Featured research published by Kyriakos Stavrou.

architectural support for programming languages and operating systems | 2014

Speculative hardware/software co-designed floating-point multiply-add fusion

Marc Lupon; Enric Gibert; Grigorios Magklis; Sridhar Samudrala; Raúl Martínez; Kyriakos Stavrou; David R. Ditzel

A Fused Multiply-Add (FMA) instruction is currently available in many general-purpose processors. It increases performance by reducing the latency of dependent operations and increases precision by computing the result as an indivisible operation with no intermediate rounding. However, since the arithmetic behavior of a single-rounding FMA operation differs from that of an independent FP multiply followed by an FP add, some algorithms require significant revalidation and rewriting effort to work as expected when they are compiled to operate with FMA, a cost that developers may not be willing to pay. As a result, many legacy applications cannot use FMA instructions. In this paper we propose a novel HW/SW collaborative technique that efficiently executes workloads with increased utilization of FMA by adding the option to get the same numerical result as separate FP multiply and FP add pairs. In particular, we extended the host ISA of a HW/SW co-designed processor with a new Combined Multiply-Add (CMA) instruction that performs an FMA operation with an intermediate rounding. This new instruction is used by a transparent dynamic translation software layer that applies a speculative instruction-fusion optimization to transform FP multiply and FP add sequences into CMA instructions. The FMA unit has been slightly modified to support both single-rounding and double-rounding fused instructions without increasing their latency and to provide a conservative fall-back path in case of misspeculation. Evaluation on a cycle-accurate timing simulator showed that CMA improved SPECfp performance by 6.3% and reduced executed instructions by 4.7%.
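The numerical difference that the abstract describes can be reproduced in a few lines. The following is a minimal sketch, not the paper's hardware or translator: the function names `fma_single_rounding` and `cma_double_rounding` are hypothetical, and the single-rounding result is emulated with exact rational arithmetic rather than a real FMA instruction. The operands are chosen so that the exact product, 1 - 2^-60, rounds to 1.0 in double precision.

```python
from fractions import Fraction


def fma_single_rounding(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once.

    Fraction(x) converts a float to its exact rational value, so the only
    rounding happens in the final float() conversion.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def cma_double_rounding(a: float, b: float, c: float) -> float:
    """Emulate a multiply-add with an intermediate rounding.

    The product is rounded to double precision before the addition, which is
    the behavior of a separate FP multiply followed by an FP add.
    """
    product = a * b      # first rounding
    return product + c   # second rounding


# Operands chosen so that the exact product 1 - 2**-60 rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fma_single_rounding(a, b, c)     # keeps the tiny residual -2**-60
combined = cma_double_rounding(a, b, c)  # residual is lost: 0.0
separate = (a * b) + c                   # plain multiply, then add

print(fused, combined, separate)
```

In this sketch the double-rounding variant matches the separate multiply and add, while the single-rounding variant does not. That mismatch is why code validated against separate operations may need revalidation before it is compiled to use FMA.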

symposium on code generation and optimization | 2014

Warm-Up Simulation Methodology for HW/SW Co-Designed Processors

Aleksandar Branković; Kyriakos Stavrou; Enric Gibert; Antonio González

Evaluation techniques in microprocessor design are mostly based on simulating selected application samples using a cycle-accurate simulator. In order to achieve accurate results, microarchitectural structures are warmed up for a few million instructions prior to statistics collection. Unfortunately, this strategy cannot be applied to HW/SW co-designed processors, in which a Transparent Optimization software Layer (TOL) translates and optimizes code on-the-fly from a guest ISA to an internal host custom microarchitecture. We show that the warm-up period in this case needs to be 3-4 orders of magnitude longer than what is needed for traditional microprocessor designs because the TOL state needs to be warmed up as well. In this paper, we propose a novel simulation technique for HW/SW co-designed processors based on adapting the optimization promotion thresholds using high-level application statistics in order to find the best trade-off between accuracy and simulation cost. In particular, the proposed technique reduces the simulation cost by 65x with an average error of just 0.75%. Furthermore, as opposed to other alternatives, the proposed technique satisfies the additional requirement of allowing evaluation using different TOL and microarchitectural configurations.
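As a rough intuition for why promotion thresholds govern warm-up length, consider a toy model in which a code region is promoted to an optimized state once its execution count reaches a threshold. This is an illustrative sketch only and not the technique proposed in the paper: the function `simulate_promotions`, the two-region trace, and the threshold values are all assumptions made for the example.

```python
from collections import Counter


def simulate_promotions(trace, threshold):
    """Return, for each region, the trace index at which it gets promoted.

    A region is promoted the first time its execution count reaches
    `threshold`. Regions that never reach it are absent from the result.
    """
    counts = Counter()
    promoted_at = {}
    for index, region in enumerate(trace):
        counts[region] += 1
        if region not in promoted_at and counts[region] >= threshold:
            promoted_at[region] = index
    return promoted_at


# Hypothetical trace: two hot regions executed alternately.
trace = ["A", "B"] * 1000

original = simulate_promotions(trace, threshold=500)  # slow to warm up
scaled = simulate_promotions(trace, threshold=5)      # warms up quickly

print(original)
print(scaled)
```

With the lower threshold, both regions reach their promoted state after only a handful of executions, so far fewer instructions have to be simulated before the software layer's state resembles steady state. How to pick a threshold that preserves accuracy is the question the paper addresses; the toy model does not attempt to answer it.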

computing frontiers | 2013

Performance analysis and predictability of the software layer in dynamic binary translators/optimizers

Aleksandar Branković; Kyriakos Stavrou; Enric Gibert; Antonio González

Dynamic Binary Translators and Optimizers (DBTOs) have become a common architecture in recent years. They are used in many different systems, such as emulators, instrumentation tools and innovative HW/SW co-designed microarchitectures. Although many researchers have worked on characterizing and reducing the emulation overhead, there are no published results that explain how the DBTO behaves from the microarchitectural perspective and how its behavior may be predicted based on high-level guest application statistics. Such results are important for guiding design decisions and system optimization. In this paper we study the DBTO as an independent application by dividing its functionality into modules. We show that the behavior of the DBTO is not constant at all. The contribution of the different modules to the total overhead, the overhead itself, the microarchitectural interaction with the emulated application and the microarchitectural profile of the different modules change significantly depending on the emulated application. This result contrasts with numerous papers that consider this behavior constant and exclude the DBTO from the simulation. Throughout this paper we detail this variance, quantify it and explain the reasons behind it. The insights presented in this work can be exploited towards the design of more efficient DBTOs and their early performance evaluation.

computing frontiers | 2014

Accurate off-line phase classification for HW/SW co-designed processors

Aleksandar Branković; Kyriakos Stavrou; Enric Gibert; Antonio González

Archive | 2014

IMAGE SIGNAL PROCESSOR WITH A BLOCK CHECKING CIRCUIT

Kyriakos Stavrou; Pedro Marcuello; Grigorios Magklis; Javier Carretero Casado; Juan Fernández; Carlos Madriles; Daniel Ortega; Demos Pavlou

Archive | 2013

Partial commits in dynamic binary translation based systems

Raúl Martínez; Enric Gibert Codina; Marc Lupon; Kyriakos Stavrou

Archive | 2013

MECHANISM FOR FACILITATING DYNAMIC AND EFFICIENT FUSION OF COMPUTING INSTRUCTIONS IN SOFTWARE PROGRAMS

Marc Lupon; Raúl Martínez; Enric Gibert Codina; Kyriakos Stavrou; Grigorios Magklis; Sridhar Samudrala

Archive | 2017

MXCSR control method and apparatus

Grigorios Magklis; Josep M. Codina; Craig B. Zilles; Michael Neilly; Sridhar Samudrala; Alejandro Martinez Vicente; Polychronis Xekalakis; F. Jesús Sánchez; Marc Lupon; Georgios Tournavitis; Enric Gibert Codina; Crispin Gomez Requena; Antonio González; Mirem Hyuseinova; Christos E. Kotselidis; Fernando Latorre; Pedro Lopez; Carlos Madriles Gimeno; Pedro Marcuello; Raúl Martínez; Daniel Ortega; Demos Pavlou; Kyriakos Stavrou

Archive | 2014