Román Hermida
Complutense University of Madrid
Publication
Featured research published by Román Hermida.
Design, Automation, and Test in Europe | 2005
N. Genko; David Atienza; G. De Micheli; José M. Mendías; Román Hermida; Francky Catthoor
Current systems-on-chip (SoC) execute applications that demand extensive parallel processing. Networks-on-chip (NoC) provide a structured way of realizing interconnections on silicon, and obviate the limitations of bus-based solutions. NoCs can have regular or ad hoc topologies, and functional validation is essential to assess their correctness and performance. In this paper, we present a flexible emulation environment implemented on an FPGA that is suitable to explore, evaluate and compare a wide range of NoC solutions with very limited effort. Our experimental results show a speed-up of four orders of magnitude with respect to cycle-accurate HDL simulation, while retaining cycle accuracy. With our emulation framework, designers can explore and optimize a wide range of solutions, as well as quickly characterize performance figures.
IEEE Transactions on Very Large Scale Integration Systems | 2001
Rafael Maestre; Fadi J. Kurdahi; Milagros Fernández; Román Hermida; Nader Bagherzadeh; Hartej Singh
Dynamically reconfigurable architectures are emerging as a viable design alternative to implement a wide range of computationally intensive applications. At the same time, an urgent necessity has arisen for support tool development to automate the design process and achieve optimal exploitation of the architectural features of the system. Task scheduling and context (configuration) management become very critical issues in achieving the high performance that digital signal processing (DSP) and multimedia applications demand. This article proposes a strategy to automate the design process which considers all possible optimizations that can be carried out at compilation time, regarding context and data transfers. This strategy is general in nature and could be applied to different reconfigurable systems. We also discuss the key aspects of the scheduling problem in a reconfigurable architecture such as MorphoSys. In particular, we focus on a task scheduling methodology for DSP and multimedia applications, as well as the context management and scheduling optimizations.
Design, Automation, and Test in Europe | 1999
Rafael Maestre; Fadi J. Kurdahi; Nader Bagherzadeh; Hartej Singh; Román Hermida; Milagros Fernández
Reconfigurable computing is a flexible way of addressing a wide range of applications with a single device at a good level of performance. This area of computing involves different issues and concepts when compared with conventional computing systems. One of these concepts is context loading. The context refers to the coded configuration information needed to implement a particular circuit behaviour. An important problem for reconfigurable computing is the scheduling of a group of kernels (sub-tasks) that constitute a complex application for minimum execution time. In this paper, we show how the different execution orders for these sub-tasks may result in varying levels of performance. We formulate an analytical approach and present a solution to this new problem.
ACM Transactions on Design Automation of Electronic Systems | 2004
Juan de Vicente; Juan Lanchares; Román Hermida
Placement is a key issue in integrated circuit physical design. Some techniques that address this problem, such as Simulated Annealing, are inspired by thermodynamics. In this article, we present a combinatorial optimization method directly derived from both Thermodynamics and Information Theory. In TCO (Thermodynamic Combinatorial Optimization), two kinds of processes are considered: microstate and macrostate transformations. Applying Shannon's definition of entropy to reversible microstate transformations, a probability of acceptance based on Fermi-Dirac statistics is derived. On the other hand, applying thermodynamic laws to macrostate transformations, an efficient annealing schedule is provided. TCO has been compared with a custom Simulated Annealing (SA) tool on a set of benchmark circuits for the FPGA (Field Programmable Gate Array) placement problem. TCO has provided the high-quality results of SA, while inheriting the adaptive properties of Natural Optimization (NO).
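As a rough illustration of what an acceptance probability with a Fermi-Dirac shape looks like, the sketch below evaluates 1 / (1 + exp(ΔC / T)) for a cost change ΔC at temperature T. The abstract does not give the exact expression derived in the article, so this functional form, the function name `fermi_dirac_acceptance`, and its parameters are assumptions made for illustration only.

```python
import math


def fermi_dirac_acceptance(delta_cost: float, temperature: float) -> float:
    """Return 1 / (1 + exp(delta_cost / temperature)).

    Assumed, illustrative form: the article's own derivation may differ.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    x = delta_cost / temperature
    # Evaluate the logistic form without overflowing math.exp.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# A move that leaves the cost unchanged is accepted half of the time;
# improving moves (delta_cost < 0) are accepted with probability above 0.5.
print(fermi_dirac_acceptance(0.0, 1.0))
print(fermi_dirac_acceptance(-2.0, 1.0))
```

Unlike the Metropolis rule commonly used in Simulated Annealing, this form never accepts an improving move with probability exactly 1, and it is symmetric: the probabilities for ΔC and -ΔC sum to 1.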
ACM Transactions on Design Automation of Electronic Systems | 2007
David Atienza; Pablo García Del Valle; Giacomo Paci; Francesco Poletti; Luca Benini; Giovanni De Micheli; José M. Mendías; Román Hermida
New tendencies envisage multiprocessor systems-on-chips (MPSoCs) as a promising solution for the consumer electronics market. MPSoCs are complex to design, as they must execute multiple applications (games, video) while meeting additional design constraints (energy consumption, time-to-market). Moreover, the rise of temperature in the die for MPSoCs can seriously affect their final performance and reliability. In this article, we present a new hardware-software emulation framework that allows designers to fully explore the thermal behavior of final MPSoC designs early in the design flow. The proposed framework uses FPGA emulation as the key element to model hardware components of the considered MPSoC platform at multimegahertz speeds. It automatically extracts detailed system statistics that are used as input to our software thermal library running on a host computer. This library calculates at runtime the temperature of on-chip components, based on the statistics collected from the emulated system and the final floorplan of the MPSoC. This enables fast testing of various thermal management techniques. Our results show speedups of three orders of magnitude compared to cycle-accurate MPSoC simulators.
Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204) | 1998
J. de Vicente; Juan Lanchares; Román Hermida
This work combines the FPGA placement and global routing phases into a single phase, taking advantage of the interrelations between them. The authors have developed rectilinear Steiner regions (RSR), a new fast algorithm to approximate the rectilinear Steiner minimum tree (RSMT) of each multi-terminal net. The search for placement solutions is performed in three simulated annealing optimization phases, guided by different objective functions. The first uses the classic semi-perimeter metric to reduce the length of the nets. The second estimates the length of the nets more precisely with the RSR algorithm. The third measures congestion by performing a fast routing of the RSR regions in each placement iteration. They have also developed an RSR-based global router. This optimization method has been applied to the placement and global routing of a set of benchmark circuits. The layouts obtained require equal or fewer routing tracks per channel segment than those produced by other tools reported in the literature that optimize only the classic semi-perimeter placement cost function.
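The classic semi-perimeter metric mentioned in the abstract is usually computed as half the perimeter of the bounding box that encloses a net's pins. A minimal sketch follows, assuming pins are given as integer (x, y) grid coordinates; the names `half_perimeter` and `placement_cost` are hypothetical, and the RSR algorithm itself is not reproduced here.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


def half_perimeter(net: Iterable[Point]) -> int:
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    pins: List[Point] = list(net)
    if not pins:
        raise ValueError("a net needs at least one pin")
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placement_cost(nets: Iterable[Iterable[Point]]) -> int:
    """Sum of the semi-perimeter estimates over all nets of a placement."""
    return sum(half_perimeter(net) for net in nets)


# Two example nets on a small grid (coordinates are made up).
example_nets = [
    [(0, 0), (3, 4)],
    [(0, 0), (2, 0), (1, 3)],
]
print(placement_cost(example_nets))
```

For nets with two or three pins the semi-perimeter equals the length of a rectilinear Steiner minimum tree; for nets with more pins it is only a lower bound, which is why a more precise estimate can help in later optimization phases.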
International Symposium on Circuits and Systems | 2005
N. Genko; David Atienza; G. De Micheli; Luca Benini; José M. Mendías; Román Hermida; F. Catthoor
Current systems-on-chip execute applications that demand extensive parallel processing. Networks-on-chip (NoC) provide a structured way of realizing interconnections on silicon, and obviate the limitations of bus-based solutions. NoCs can have regular or ad hoc topologies, and functional validation is essential to assess their correctness and performance. In this paper, we present a flexible emulation environment implemented on an FPGA that is suitable to explore, evaluate and compare a wide range of NoC solutions with very limited effort. Our experimental results show a speed-up of four orders of magnitude with respect to cycle-accurate HDL simulation, while retaining cycle accuracy. With our emulation framework, designers can explore and optimize a range of solutions, as well as quickly characterize performance figures.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006
María Molina; Rafael Ruiz-Sautua; José M. Mendías; Román Hermida
Conventional scheduling algorithms try to balance the number of operations of every different type executed per cycle. However, in most cases, a uniform distribution is not reachable, and thus, some hardware (HW) waste appears. This situation becomes worse when heterogeneous specifications (those formed by operations with different data formats and widths) are synthesized. Our proposal is an innovative bit-level algorithm able to minimize this HW waste. In order to obtain uniform distributions of the computational cost of operations among cycles, it successively transforms specification operations into sets of smaller ones, which are then scheduled independently. As a consequence, some specification operations may be executed during a set of nonconsecutive cycles, and over several functional units. In combination with allocation algorithms able to guarantee the bit-level reuse of HW resources, our approach produces circuits with substantially smaller area than conventional implementations. Due to the fragmentation of operations, in the proposed implementations, the type, number, and width of HW resources are, in general, independent of the type, number, and width of both specification operations and variables. Additionally, the clock-cycle length is also reduced in most circuits.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011
A. A. Del Barrio; Seda Ogrenci Memik; María Molina; José M. Mendías; Román Hermida
Speculative functional units (SFUs) are arithmetic functional units that operate using a predictor for the carry signal. The carry prediction helps to shorten the critical path of the functional unit. The average case performance of these units is determined by the hit rate of the prediction. In case of mispredictions, the SFUs need to be coordinated by the datapath control mechanism to perform corrections and to maintain the datapath in the correct state. Devising a control mechanism for correcting mispredictions without adversely impacting overall performance is the most important challenge. In this paper, we present techniques for designing a datapath controller for seamless deployment of SFUs in high level synthesis. We have developed two techniques based on two main control paradigms: centralized and distributed control. The centralized approach stops the execution of the entire datapath for each misprediction and resumes execution once the correct value of the carry is known. The distributed approach decouples the functional unit suffering from the misprediction from the rest of the datapath. Hence, it allows the remainder of the functional units to carry on execution and be at different scheduling states at different times. We tested datapaths utilizing both linear structures and logarithmic structures for speculative arithmetic functional units. Our results show that it is possible to reduce execution time by as much as 38% (33% on average) for linear structures and by as much as 37.2% (25% on average) for logarithmic structures.
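The idea of adding with a predicted carry can be sketched behaviourally as follows. This is only an illustration under stated assumptions: the operand is split into two halves, the carry into the upper half is a fixed prediction supplied by the caller, and a misprediction costs one extra correction cycle. The abstract does not describe the actual predictor or the cycle costs, so `speculative_add`, its parameters, and the one-cycle penalty are hypothetical.

```python
from typing import Tuple


def speculative_add(a: int, b: int, width: int = 16,
                    predicted_carry: int = 0) -> Tuple[int, bool, int]:
    """Add two unsigned `width`-bit values using a predicted mid-point carry.

    Returns (result modulo 2**width, prediction_hit, cycles_used).
    The cycle counts are an assumed model, not measured figures.
    """
    half = width // 2
    mask_lo = (1 << half) - 1
    mask_hi = (1 << (width - half)) - 1

    # Lower half: produces the true carry into the upper half.
    lo_sum = (a & mask_lo) + (b & mask_lo)
    actual_carry = lo_sum >> half

    # Upper half: computed in parallel with the predicted carry.
    a_hi = (a >> half) & mask_hi
    b_hi = (b >> half) & mask_hi
    hi_sum = (a_hi + b_hi + predicted_carry) & mask_hi

    hit = predicted_carry == actual_carry
    cycles = 1
    if not hit:
        # Misprediction: redo the upper half with the correct carry.
        hi_sum = (a_hi + b_hi + actual_carry) & mask_hi
        cycles = 2

    return (hi_sum << half) | (lo_sum & mask_lo), hit, cycles


print(speculative_add(1, 2))            # prediction holds, single cycle
print(speculative_add(0x00FF, 0x0001))  # carry crosses the halves, corrected
```

In this model the centralized scheme described in the abstract would stall every unit during the correction cycle, whereas the distributed scheme would stall only the unit that mispredicted.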
International Symposium on Systems Synthesis | 1999
Rafael Maestre; Milagros Fernández; Román Hermida; Nader Bagherzadeh
Reconfigurable computing is emerging as a viable design alternative to implement a wide range of computationally intensive applications. The scheduling problem becomes a critical issue in achieving the high performance that this kind of application demands. The paper describes the different aspects of the scheduling problem in a reconfigurable architecture. We also propose a general strategy for performing, at compilation time, a scheduling that includes all possible optimizations regarding context (configuration) and data transfers. In particular, we focus on the methodology and mechanisms to solve the context scheduling problem. Some experimental results are presented to validate our assumptions. Finally, the problem of data transfers is formulated, to be addressed in future work.