José Manuel Colmenar
Complutense University of Madrid
Publication
Featured research published by José Manuel Colmenar.
Microprocessors and Microsystems | 2009
José Manuel Colmenar; Oscar Garnica; Juan Lanchares; José Ignacio Hidalgo
Asynchronous systems are attracting the interest of the designer community because of several features that are useful for sub-micron technologies: tolerance to process variation, low power consumption, removal of the clock tree, etc. One of the main problems in simulating these systems is the variable computation delay of their modules, which compute as fast as possible under the actual conditions of the system. This behavior complicates the high-level simulation of such systems and is the main reason for the lack of simulation tools devoted to asynchronous microarchitectures. In this paper we present a modeling method for this kind of system that describes the variable computation delay of an asynchronous circuit by using probability distribution functions. This method is deployed in an architectural simulator of a 64-bit superscalar asynchronous microarchitecture in which the computation delay of each module was characterized through a probability distribution function. The experimental results show that the asynchronous behavior is successfully modeled, and that the architectural simulation of standard benchmarks is affordable in terms of wall-clock simulation time.
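The idea of describing a module's variable delay with a probability distribution can be sketched as follows. This is a minimal illustration, not the simulator from the paper: the distribution `ALU_DELAY_PDF`, its delay values, and the helper names are all hypothetical.

```python
import random

# Hypothetical discrete distribution for one module: delay in ps -> probability.
# The numbers are illustrative only and do not come from the paper.
ALU_DELAY_PDF = {400: 0.6, 550: 0.3, 800: 0.1}

def sample_delay(pdf, rng):
    """Draw one computation delay according to a discrete distribution."""
    delays = list(pdf.keys())
    weights = list(pdf.values())
    return rng.choices(delays, weights=weights, k=1)[0]

def mean_delay(pdf):
    """Expected delay of the module under the distribution."""
    return sum(delay * prob for delay, prob in pdf.items())

# Each simulated computation asks the model for a fresh delay.
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample_delay(ALU_DELAY_PDF, rng) for _ in range(10_000)]
empirical_mean = sum(samples) / len(samples)
```

With enough samples, `empirical_mean` should approach `mean_delay(ALU_DELAY_PDF)`, which is a quick sanity check that the sampler follows the intended distribution.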
International Conference on Parallel Processing | 2006
José Manuel Colmenar; Oscar Garnica; Juan Lanchares; José Ignacio Hidalgo; Guadalupe Miñana; Sonia Martín López
In this paper we present sim-async, an architectural simulator able to model a 64-bit asynchronous superscalar microarchitecture. The aim of this tool is to help designers study different architectural proposals for asynchronous processors. Sim-async models the data-dependent timing of the processor modules by using distribution functions that describe the probability of a given delay being spent on a computation. This idea of characterizing the timing of the modules at the architectural level of abstraction using distribution functions is introduced for the first time in this work. In addition, sim-async models the delays of all the relevant hardware involved in the asynchronous communication between stages. To develop sim-async we modified the source code of SimpleScalar, substituting the simulator's core with our own execution engine, which provides the functionality of a parameterizable microarchitecture adapted to the Alpha ISA. The correctness of sim-async was checked by comparing the outputs of the SPEC2000 benchmarks with SimpleScalar executions, and the asynchronous behavior was successfully tested against a synchronous configuration of sim-async.
IET Computers and Digital Techniques | 2007
Guadalupe Miñana; José Ignacio Hidalgo; Juan Lanchares; José Manuel Colmenar; Oscar Garnica; Sonia Martín López
A hardware technique to reduce static and dynamic power consumption in the functional units of 64-bit high-performance processors is presented here. The instructions that require an adder have been studied, and it can be concluded that in a large percentage of instructions one of the two source operands is always narrow, so a 64-bit adder is not required. Furthermore, by analysing the executed applications, it is feasible to classify their internal operations according to their bit-width requirements and select the appropriate adder type for each instruction. This approach is based on substituting some of the 64-bit power-hungry adders with 32-bit ones, which consume much less power, and modifying the protocol to issue as many instructions as possible to these low power consumption units, while incurring negligible performance penalties. Five different configurations were tested for the execution units. Results indicate that this technique can save up to 50% of the power consumed by the adders and up to 21% of the overall power consumption in the execution unit of high-performance architectures. Moreover, the simulations show good results in terms of power efficiency (IPC/W), and it can be affirmed that the technique could prevent the creation of hot spots in the functional units.
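A rough sketch of the issue decision described above: route an addition to a 32-bit adder when that is safe, otherwise fall back to a 64-bit one. Note the simplifying assumption: the abstract selects the adder from the instruction type, whereas this sketch inspects operand values directly. The names `fits_in`, `select_adder`, `add32` and `add64` are hypothetical.

```python
def fits_in(value, bits):
    """True if a signed integer is representable in `bits` bits (two's complement)."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def select_adder(a, b, narrow_adder_free):
    """Choose an adder for a + b.

    Assumed policy (not taken from the paper): use the low-power 32-bit adder
    only when one is free and both operands and the result fit in 32 bits.
    """
    result = a + b
    if narrow_adder_free and all(fits_in(v, 32) for v in (a, b, result)):
        return "add32"
    return "add64"

# Small operands go to the narrow adder; wide ones need the 64-bit unit.
choice_small = select_adder(5, 7, narrow_adder_free=True)
choice_wide = select_adder(1 << 40, 1, narrow_adder_free=True)
```

Checking the result as well as the operands keeps the sketch conservative: a sum that overflows 32 bits is sent to the wide adder even if both inputs are narrow.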
Digital Systems Design | 2006
Guadalupe Miñana; Oscar Garnica; José Ignacio Hidalgo; Juan Lanchares; José Manuel Colmenar
This paper presents a hardware technique to reduce the static and dynamic power consumption in the functional units of a 64-bit superscalar processor. Our approach is based on substituting some of the 64-bit power-hungry adders with 32-bit adders of lower power consumption, and modifying the protocol in order to issue as many instructions as possible to those low power consumption units while incurring a negligible performance penalty. Our technique saves between 14.7% and 50% of the power consumption in the adders, which corresponds to between 6.1% and 20% of the power consumption in the execution units. This reduction is important because it can avoid the creation of a hot spot in the functional units.
Digital Systems Design | 2006
José Manuel Colmenar; Oscar Garnica; Juan Lanchares; José Ignacio Hidalgo; Guadalupe Miñana; Sonia Martín López
Nowadays, synchronous processor designers have to deal with severe problems related to the distribution of a complex clock network, such as skew reduction, high power consumption, synchronization of clocks, etc. Asynchronous or self-timed architectures are becoming an interesting design alternative because they usually avoid these drawbacks and are able to achieve high performance at a low power consumption cost. However, in the first steps of the design process, evaluating the performance of such architectures through simulation is much more complicated because the data-dependent timing of each system module has to be modeled. The aim of this paper is to evaluate the performance of a 64-bit fully asynchronous superscalar processor microarchitecture with dynamically scheduled instruction flow, out-of-order speculative execution of instructions and advanced branch prediction. To this end, we have described the asynchronous microarchitecture, solving the synchronization between structures through a four-phase handshake protocol. Then, we have used a modification of the SimpleScalar suite to model the asynchronous microarchitecture in order to run Alpha programs on it. Finally, we have compared the performance of this fully asynchronous processor with that of its synchronous counterpart by running architectural simulations of the SPEC2000 benchmarks on both models.
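For readers unfamiliar with the four-phase handshake mentioned above, the sequence of one transfer (request up, acknowledge up, request down, acknowledge down) can be sketched as below. This is a sequential toy model under the assumption of a bundled-data push channel; the class `FourPhaseChannel` and its fields are hypothetical and are not part of the paper's simulator.

```python
class FourPhaseChannel:
    """Toy model of a return-to-zero (four-phase) handshake channel."""

    def __init__(self):
        self.req = 0      # request wire, driven by the sender
        self.ack = 0      # acknowledge wire, driven by the receiver
        self.data = None  # bundled data lines
        self.trace = []   # ordered log of signal transitions

    def transfer(self, value):
        """Run one complete handshake and return the value seen by the receiver."""
        # Phase 1: the sender places the data and raises the request.
        self.data = value
        self.req = 1
        self.trace.append("req+")
        # Phase 2: the receiver latches the data and raises the acknowledge.
        received = self.data
        self.ack = 1
        self.trace.append("ack+")
        # Phase 3: the sender sees the acknowledge and lowers the request.
        self.req = 0
        self.trace.append("req-")
        # Phase 4: the receiver lowers the acknowledge; the channel is idle again.
        self.ack = 0
        self.trace.append("ack-")
        return received

channel = FourPhaseChannel()
value_seen = channel.transfer(42)
```

After each transfer both wires are back at zero, which is what distinguishes the four-phase protocol from a two-phase (transition-signalling) one.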
Power and Timing Modeling, Optimization and Simulation | 2005
Guadalupe Miñana; Oscar Garnica; José Ignacio Hidalgo; Juan Lanchares; José Manuel Colmenar
This paper presents a hardware technique to reduce static and dynamic power consumption in functional units (FUs). This approach entails substituting some of the power-hungry adders of a 64-bit superscalar processor with others of lower power consumption, and modifying the slot protocol in order to issue as many instructions as possible to those low power consumption units while incurring marginal performance penalties. Our proposal saves between 2% and 45% of power-performance in FUs and between 16% and 65% of power consumption in adders.
Power and Timing Modeling, Optimization and Simulation | 2004
Sonia Martín López; Oscar Garnica; José Manuel Colmenar
This paper proposes a new approach for improving the performance of Globally Asynchronous Locally Synchronous (GALS) circuits. This approach takes advantage of the dependence of the delay on the input vectors to classify input data into several classes. Each class has an associated clock period, so that a suitable clock is selected for each input. This technique has been applied to a GALS pipelined RISC processor based on the DLX processor. Several programs were run on this processor with different classifications in order to check the viability of this new approach.
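The class-based clock selection can be illustrated with a small sketch: each input is assigned to the class with the shortest clock period that still covers its delay, and the total time is compared against a single worst-case clock. The periods in `CLOCK_PERIODS_NS`, the example delays and the function names are hypothetical values chosen for illustration, not figures from the paper.

```python
# Hypothetical clock classes, one period (in ns) per class.
CLOCK_PERIODS_NS = [2.0, 3.0, 5.0]

def select_period(delay_ns, periods=CLOCK_PERIODS_NS):
    """Return the shortest available clock period that covers the given delay."""
    for period in sorted(periods):
        if delay_ns <= period:
            return period
    raise ValueError("delay exceeds the slowest available clock period")

def total_time(delays_ns, periods):
    """Total execution time when each input uses the clock of its own class."""
    return sum(select_period(d, periods) for d in delays_ns)

# Compare per-class clocks against a single worst-case clock.
delays = [1.5, 2.5, 4.0]
classified = total_time(delays, CLOCK_PERIODS_NS)
worst_case = total_time(delays, [max(CLOCK_PERIODS_NS)])
```

In this toy example the classified run takes less total time than the worst-case clock because only the slowest input pays for the longest period. How inputs are classified in hardware before their delay is known is outside the scope of the sketch.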
Parallel, Distributed and Network-Based Processing | 2004
José Manuel Colmenar; Oscar Garnica; Sonia Martín López; José Ignacio Hidalgo; Juan Lanchares; Román Hermida
The paper has two aims: on the one hand, to characterize the relationship between the latency of an asynchronous pipeline and the latencies of its stages when the stage latency is data-dependent; on the other hand, to find a closed-form expression that relates the mean latency of the pipeline to the parameters that characterize the behaviour of its constituent stages. To attain these two goals, we have followed an empirical approach: we have developed a model of an asynchronous pipeline with n stages, modelled the latency of the stages using a probability density function, and simulated the pipeline behaviour. From the results, we have defined linear equations that estimate the mean latency of the pipeline without the need for simulation. Finally, we have designed and implemented a 32×32-bit asynchronous pipelined multiplier, and we have tested the estimations on it.
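The empirical step, simulating an n-stage pipeline whose stage latencies are drawn from a distribution, might look like the sketch below. It covers only the simulation, not the fitted linear equations. It assumes that tokens enter as soon as the first stage is free, that a token waits for the next stage to finish its previous token, and that a waiting token does not block the stage behind it. The uniform delay range and all names are hypothetical.

```python
import random

def simulate_pipeline(n_stages, n_tokens, draw_delay, rng):
    """Return the mean per-token latency of an n-stage asynchronous pipeline."""
    prev_done = [0.0] * n_stages  # completion time of the previous token per stage
    total_latency = 0.0
    for _ in range(n_tokens):
        enter = prev_done[0]      # the token enters when stage 0 becomes free
        t = enter
        done = []
        for stage in range(n_stages):
            start = max(t, prev_done[stage])  # wait until the stage is free
            t = start + draw_delay(rng)       # data-dependent stage latency
            done.append(t)
        prev_done = done
        total_latency += t - enter
    return total_latency / n_tokens

# Hypothetical stage latency: uniform between 1 and 3 time units.
rng = random.Random(1)
mean_latency = simulate_pipeline(4, 5_000, lambda r: r.uniform(1.0, 3.0), rng)
```

With a constant stage delay the mean latency is simply the number of stages times that delay. With variable delays the waiting time at busy stages pushes the mean above the sum of the mean stage delays, which is the kind of effect the closed-form estimate has to capture.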
Parallel, Distributed and Network-Based Processing | 2008
José Manuel Colmenar; N. Moron; Oscar Garnica; Juan Lanchares; José Ignacio Hidalgo
Asynchronous systems are attracting the interest of a growing number of designers. However, the lack of simulation tools devoted to asynchronous microarchitectures is a gap that has not yet been closed. One of the main obstacles to the simulation of asynchronous systems is the variable computation delay of their modules, which compute as fast as possible under the actual conditions of the system because there is no clock signal. In this paper we present a modelling method that describes the variable computation delay of an asynchronous circuit by using probability distribution functions that return the probability of a given delay being spent on the computation of a data item. This method was integrated into an architectural simulator of a 64-bit superscalar asynchronous microarchitecture in which the computation delay of each module was characterized through a probability distribution function. The experimental results showed that the asynchronous behavior was successfully modeled, and that the architectural simulations of standard benchmarks were affordable in terms of wall-clock simulation time.
Power and Timing Modeling, Optimization and Simulation | 2006
Guadalupe Miñana; José Ignacio Hidalgo; Oscar Garnica; Juan Lanchares; José Manuel Colmenar; Sonia Martín López
This paper presents a hardware technique to reduce both the static and dynamic power consumption in the functional units of a 64-bit superscalar processor. We have studied the instructions that require an adder and conclude that, in 64-bit processors, many instructions do not require a 64-bit adder, and that knowing the type of operation also tells us which adder type the instruction requires. This is because there are some types of instruction in which one of the two source operands is always narrow. Our approach is based on substituting some of the 64-bit power-hungry adders with 32-bit and 24-bit adders of lower power consumption, and modifying the protocol in order to issue as many instructions as possible to those low power consumption units while incurring a negligible performance penalty. We have tested four different configurations for the execution units in order to find which one obtains the highest reduction in power consumption while preserving the performance of the processor. Our technique saves between 38.8% and 54.1% of the power consumption in the adders, which corresponds to between 16.6% and 23.1% of the power consumption in the execution units. This reduction is important because it can avoid the creation of a hot spot in the functional units.