Pedro Chaparro
Intel
Publications
Featured research published by Pedro Chaparro.
IEEE Transactions on Parallel and Distributed Systems | 2007
Pedro Chaparro; José González; Grigorios Magklis; Qiong Cai; Antonio González
Multicore architectures are becoming the main design paradigm for current and future processors. The main reason is that multicore designs provide an effective way of overcoming instruction-level parallelism (ILP) limitations by exploiting thread-level parallelism (TLP). In addition, they are a power- and complexity-effective way of taking advantage of the huge number of transistors that can be integrated on a chip. On the other hand, today's higher-than-ever power densities have made temperature one of the main limitations of microprocessor evolution. Thermal management in multicore architectures is a fairly new area. Some works have addressed dynamic thermal management in dual- and quad-core architectures. This work provides insight and explores different alternatives for thermal management in multicore architectures with 16 cores. Schemes employing both energy reduction and activity migration are explored, and improvements for thread migration schemes are proposed.
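As a rough illustration of activity migration, the sketch below moves the thread on the hottest core to the coolest core once a temperature threshold is exceeded. It is a minimal sketch under stated assumptions, not the scheme evaluated in the paper: the function name `migrate_hottest`, the list-based core model, and the single-threshold trigger are all hypothetical.

```python
def migrate_hottest(temps_c, assignment, threshold_c):
    """Swap the workloads of the hottest and coolest cores if the hottest
    core is above threshold_c (hypothetical policy, for illustration only).

    temps_c:    list of per-core temperatures in degrees Celsius
    assignment: list where assignment[i] is the thread on core i, or None
    Returns a new assignment list; the input is not modified.
    """
    cores = range(len(temps_c))
    hot = max(cores, key=temps_c.__getitem__)
    cool = min(cores, key=temps_c.__getitem__)
    new_assignment = list(assignment)
    # Migrate only when the hot core is over the limit and actually runs a thread.
    if temps_c[hot] > threshold_c and hot != cool and assignment[hot] is not None:
        new_assignment[hot], new_assignment[cool] = assignment[cool], assignment[hot]
    return new_assignment
```

A real policy would also have to weigh migration cost and the thermal coupling between neighbouring cores, which this sketch ignores.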
International Conference on Parallel Architectures and Compilation Techniques | 2008
Qiong Cai; Jose Gonzalez; Ryan N. Rakvic; Grigorios Magklis; Pedro Chaparro; Antonio González
We present a novel mechanism, called meeting point thread characterization, to dynamically detect critical threads in a parallel region. We define the critical thread as the one with the longest completion time in the parallel region. Knowing the criticality of each thread has many potential applications. In this work, we propose two applications: thread delaying for multi-core systems and thread balancing for simultaneous multi-threaded (SMT) cores. Thread delaying saves energy by running the core containing the critical thread at maximum frequency while scaling down the frequency and voltage of the cores containing non-critical threads. Thread balancing improves overall performance by giving higher priority to the critical thread in the issue queue of an SMT core. Our experiments on a detailed microprocessor simulator with the Recognition, Mining, and Synthesis applications from the Intel research laboratory reveal that thread delaying can achieve energy savings of up to more than 40% with negligible performance loss. Thread balancing can improve performance by 1% to 20%.
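The thread delaying idea can be pictured with per-thread progress counters. The sketch below assumes each thread increments a counter whenever it passes a common point in the parallel region, so the thread with the lowest count is treated as critical. This is an illustration under that assumption, not the paper's implementation; the function names and the proportional frequency rule are hypothetical.

```python
def critical_thread(meeting_counts):
    """Return the id of the thread with the fewest meeting-point visits,
    which this sketch assumes to be the slowest (critical) thread."""
    return min(meeting_counts, key=meeting_counts.get)


def thread_delaying_frequencies(meeting_counts, f_max, f_min):
    """Keep the critical thread's core at f_max and scale the others down.

    Hypothetical rule: a thread that is further ahead gets a proportionally
    lower frequency, but never below f_min.
    """
    slowest = min(meeting_counts.values())
    freqs = {}
    for tid, count in meeting_counts.items():
        if count == 0:
            freqs[tid] = f_max  # no progress information yet
            continue
        freqs[tid] = max(f_min, f_max * slowest / count)
    return freqs
```

For example, with counts `{0: 10, 1: 20, 2: 40}`, thread 0 is critical and stays at `f_max`, while threads 1 and 2 are slowed in proportion to how far ahead they are.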
International Symposium on Microarchitecture | 2009
Jaume Abella; Javier Carretero; Pedro Chaparro; Xavier Vera; Antonio González
Transistors per area unit double in every new technology node. However, the electric field density and power demand grow if Vcc is not scaled. Therefore, Vcc must be scaled in pace with new technology nodes to prevent excessive degradation and keep power demand within reasonable limits. Unfortunately, low Vcc operation exacerbates the effect of variations and decreases noise and stability margins, increasing the likelihood of errors in SRAM memories such as caches. Those errors translate into performance loss and performance variation across different cores, which is especially undesirable in a multi-core processor. This paper presents (i) a novel scheme to tolerate high faulty bit rates in caches by disabling only faulty subblocks, (ii) a dynamic address remapping scheme to reduce performance variation across different cores, which is key for performance predictability, and (iii) a comparison with state-of-the-art techniques for faulty bit tolerance in caches. Results for some typical first level data cache configurations show 15% average performance increase and standard deviation reduction from 3.13% down to 0.55% when compared to cache line disabling schemes.
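To see why disabling only faulty subblocks preserves more capacity than disabling whole lines, the sketch below counts usable subblocks under both policies for a given fault map. It is a simplified model with hypothetical names (`usable_subblocks`, `fault_map`) and does not model the paper's address remapping or performance results.

```python
def usable_subblocks(fault_map):
    """Count usable subblocks under two disabling policies.

    fault_map: list of cache lines, each a list of booleans where True
               marks a subblock that contains at least one faulty bit.
    Returns (usable_with_line_disabling, usable_with_subblock_disabling).
    """
    # Line disabling: a single faulty subblock takes the whole line offline.
    line_disabling = sum(len(line) for line in fault_map if not any(line))
    # Subblock disabling: only the faulty subblocks themselves are lost.
    subblock_disabling = sum(line.count(False) for line in fault_map)
    return line_disabling, subblock_disabling
```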
International Conference on Computer Design | 2004
Pedro Chaparro; Jose Gonzalez; Antonio González
As frequencies and feature size scale faster than operating voltages, power density is increasing in each processor generation. Power density and the cost of removing the heat it generates are increasing at the same rate. Leakage is significantly increasing every process generation and it is expected to be the main source of power in the near future. Moreover, leakage power grows exponentially with temperature. This paper proposes and evaluates several techniques with two goals: reduction of average temperature in order to decrease leakage power, and reduction of peak temperature in order to reduce cooling cost. Combinations of temperature-aware steering techniques and cluster hopping are investigated in a quad-cluster superscalar microarchitecture. Combining cluster hopping with a temperature-aware steering policy results in 30% reduction in leakage power and 8% reduction in average peak temperature at the expense of a slowdown of just 5%.
High-Performance Computer Architecture | 2005
Pedro Chaparro; Grigorios Magklis; Jose Gonzalez; Antonio González
Due to increasing power densities, both on-chip average and peak temperatures are fast becoming a serious bottleneck in processor design. This is due to the cost of removing the heat generated, and the performance impact of dealing with thermal emergencies. So far microarchitectural techniques to control temperature have mainly focused on the processor backend (in particular the execution units), whereas the frontend has not received much attention. However, as the temperature of the backend remains controlled and the processor throughput increases, the heat dissipated by the frontend becomes more significant, and one of the major contributors to the total average temperature. This paper proposes and evaluates a distributed frontend for clustered microarchitectures that is able to reduce power density and temperature. First, a distributed mechanism for renaming and committing instructions is proposed. Second, a sub-banked trace cache with a bank hopping mechanism is presented. Finally, a method to improve the sub-banking is proposed based on a biased mapping function to distribute bank accesses to balance temperature.
International Symposium on Low Power Electronics and Design | 2006
Grigorios Magklis; Pedro Chaparro; Jose Gonzalez; Antonio González
In recent years, globally asynchronous locally synchronous (GALS) designs and dynamic voltage scaling (DVS) have emerged as some of the most popular approaches to address the ever-increasing microprocessor energy consumption. In this work, we propose two on-line algorithms for dynamically and independently adjusting the voltage and frequency of the front-end and back-end domains of a novel two-domain microprocessor. We evaluate our mechanisms for both internal and external voltage regulators, and we present optimal dynamic voltage scaling results for the proposed microarchitecture. Our schemes achieve an average improvement of 12% in the energy-delay metric when using internal voltage regulators.
Parallel, Distributed and Network-Based Processing | 2010
Pedro Chaparro; Jesus Alcober; Janio M. Monteiro; Carlos Miguel Tavares Calafate; Juan-Carlos Cano; Pietro Manzoni
Emerging multimedia applications over mobile devices are becoming very popular, especially over infrastructure wireless networks such as cellular and WLANs. However, providing this kind of service over infrastructure-less networks like ad hoc networks presents many additional problems. One of these problems is how to share resources fairly among the users involved. In this article we propose a QoS framework supporting scalable video streaming in mobile ad hoc networks based on distributed admission control and video traffic awareness. Our framework promotes fairness between video flows in terms of resource consumption. It also guarantees a significant reduction of the idle times experienced by users during periods of network saturation, thus increasing the video playout time in reception for all users. Using the IEEE 802.11e MAC technology as our basis for traffic differentiation, our framework, called DACME-SV (Distributed Admission Control for MANETs - Scalable Video), relies on a periodic probing process to measure the available bandwidth and the end-to-end delay on the path. DACME-SV adopts a cross-layer approach to determine the optimum number of video layers to transmit at any given time, thus avoiding network congestion and guaranteeing an acceptable video quality at the destination. Experimental results show that idle time periods are substantially decreased, while the framework exhibits good overall performance in terms of throughput and delay.
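The layer-selection step can be sketched as a simple fit of cumulative layer bitrates into the bandwidth reported by the latest probe. This is a minimal sketch and not DACME-SV's actual decision logic: the function name, the per-layer bitrate list, and the omission of the end-to-end delay check are assumptions made for illustration.

```python
def select_video_layers(layer_bitrates_kbps, available_kbps):
    """Return how many scalable-video layers fit in the measured bandwidth.

    layer_bitrates_kbps: bitrate of the base layer followed by each
                         enhancement layer, in transmission order
    available_kbps:      bandwidth estimate from the most recent probe
    A result of 0 means even the base layer does not fit.
    """
    total = 0.0
    admitted = 0
    for rate in layer_bitrates_kbps:
        # Layers are cumulative: layer n is only useful if layers 1..n-1 are sent.
        if total + rate > available_kbps:
            break
        total += rate
        admitted += 1
    return admitted
```

Re-running the selection after every probe lets the sender drop enhancement layers as the path saturates instead of stalling the whole flow.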
International Symposium on Computer Architecture | 2009
Javier Carretero; Pedro Chaparro; Xavier Vera; Jaume Abella; Antonio González
While Moore's Law predicts the ability of the semiconductor industry to engineer smaller and more efficient transistors and circuits, there are serious issues not contemplated in that law. One concern is the verification effort of modern computing systems, which has grown to dominate the cost of system design. On the other hand, technology scaling leads to burn-in phase out. As a result, the in-the-field error rate may increase due to both actual errors and latent defects. Whereas data can be protected with arithmetic codes, there is a lack of cost-effective mechanisms for control logic. This paper presents a light-weight microarchitectural mechanism that ensures that data consumed through registers are correct. The structures protected include the issue queue logic and its associated data (i.e., tags and control signals), input multiplexors, rename data, replay logic, register free-list and release logic, and register file logic. Our results show a coverage of around 90 percent for the targeted structures with a cost in power and area of about four percent, and without impact on performance.
International Symposium on Low Power Electronics and Design | 2009
Pedro Chaparro; Jose Gonzalez; Qiong Cai; Greg Chrysler
Multi-core architectures require Dynamic Thermal Management (DTM) mechanisms to handle (1) multiple hotspots and (2) the global chip heating effect while finding the best trade-off between performance and thermal control. In that scenario, Thin-Film Thermoelectric Cooling devices (TFTECs) can be used to mitigate both effects since they provide on-die localized cooling with a dynamic and heterogeneous effect. This work proposes controlling TFTECs from the microarchitecture for enhanced Dynamic Thermal Management in multi-core architectures. We show that with our TFTEC-based proposals, performance is within 8% of that of a thermally unconstrained processor.
International On-Line Testing Symposium | 2008
Jaume Abella; Pedro Chaparro; Xavier Vera; Javier Carretero; Antonio González
Technology scaling leads to burn-in phase out and increasing post-silicon test complexity, which increases the in-the-field error rate due to both latent defects and actual errors. As a consequence, there is an increasing need for continuous on-line testing techniques to cope with hard errors in the field. Similarly, those techniques are needed for detecting soft errors in logic, whose error rate is expected to rise in future technologies. Cache memories, which occupy most of the area of the chip, are typically protected with parity or ECC, but most of the wires as well as some combinational blocks remain unprotected against both soft and hard errors. This paper presents a set of techniques to detect and confine hard and soft errors in cache memories in combination with parity/ECC at very low cost. By means of hard signatures in data rows and error tracking, faults can be detected, classified properly and confined for hardware reconfiguration.
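For readers unfamiliar with the parity protection mentioned above, the sketch below shows plain even parity on a stored word: a single flipped bit is detected, while a double flip goes unnoticed. It illustrates only the baseline parity check, with hypothetical function names; the hard signatures and error tracking proposed in the paper are not modeled here.

```python
def parity_bit(word):
    """Even-parity bit: 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2


def store(word):
    """Model a cache row as the data word plus its parity bit."""
    return (word, parity_bit(word))


def check(stored):
    """Return True if the stored word still matches its parity bit."""
    word, parity = stored
    return parity_bit(word) == parity
```

Because an even number of flipped bits leaves the parity unchanged, parity alone cannot classify or locate a fault, which is part of the motivation for combining it with additional tracking.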