Is this you? Create Your Porfile

Salvador Petit

Polytechnic University of Valencia

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Salvador Petit is active.

Explore More

Publication

Featured researches published by Salvador Petit.

symposium on computer architecture and high performance computing | 2007

Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors

Rafael Ubal; Julio Sahuquillo; Salvador Petit; Pedro López

Current microprocessors are based in complex designs, integrating different components on a single chip, such as hardware threads, processor cores, memory hierarchy or interconnection networks. The permanent need of evaluating new designs on each of these components motivates the development of tools which simulate the system working as a whole. In this paper, we present the Multi2Sim simulation framework, which models the major components of incoming systems, and is intended to cover the limitations of existing simulators. A set of simulation examples is also included for illustrative purposes.

international parallel and distributed processing symposium | 2008

A simple power-aware scheduling for multicore systems when running real-time applications

Diana Bautista; Julio Sahuquillo; Houcine Hassan; Salvador Petit; José Duato

High-performance microprocessors, e.g., multithreaded and multicore processors, are being implemented in embedded real-time systems because of the increasing computational requirements. These complex microprocessors have two major drawbacks when they are used for real-time purposes. First, their complexity difficults the calculation of the WCET (worst case execution time). Second, power consumption requirements are much larger, which is a major concern in these systems. In this paper we propose a novel soft power-aware real-time scheduler for a state-of-the-art multicore multithreaded processor, which implements dynamic voltage scaling techniques. The proposed scheduler reduces the energy consumption while satisfying the constraints of soft real-time applications. Different scheduling alternatives have been evaluated, and experimental results show that using a fair scheduling policy, the proposed algorithm provides, on average, energy savings ranging from 34% to 74%.

computing frontiers | 2005

Exploiting temporal locality in drowsy cache policies

Salvador Petit; Julio Sahuquillo; Jose M. Such; David R. Kaeli

Technology projections indicate that static power will become a major concern in future generations of high-performance microprocessors. Caches represent a significant percentage of the overall microprocessor die area. Therefore, recent research has concentrated on the reduction of leakage current dissipated by caches. The variety of techniques to control current leakage can be classified as non-state preserving or state preserving. Non-state preserving techniques power off selected cache lines while state preserving place selected lines into a low-power state. Drowsy caches are a recently proposed state-preserving technique. In order to introduce low performance overhead, drowsy caches must be very selective on which cache lines are moved to a drowsy statePast research on cache organization has focused on how best to exploit the temporal locality present in the data stream. In this paper we propose a novel drowsy cache policy called Reuse Most Recently used On (RMRO), which makes use of reuse information to trade off performance versus energy consumption. Our proposal improves the hit ratio for drowsy lines by about 67%, while reducing the power consumption by about 11.7% (assuming 70nm technology) with respect to previously proposed drowsy cache policies.

international symposium on microarchitecture | 2009

An hybrid eDRAM/SRAM macrocell to implement first-level data caches

Alejandro Valero; Julio Sahuquillo; Salvador Petit; Vicente Lorente; Ramon Canal; Pedro López; José Duato

SRAM and DRAM cells have been the predominant technologies used to implement memory cells in computer systems, each one having its advantages and shortcomings. SRAM cells are faster and require no refresh since reads are not destructive. In contrast, DRAM cells provide higher density and minimal leakage energy since there are no paths within the cell from Vdd to ground. Recently, DRAM cells have been embedded in logic-based technology, thus overcoming the speed limit of typical DRAM cells. In this paper we propose an n-bit macrocell that implements one static cell, and n-1 dynamic cells. This cell is aimed at being used in an n-way set-associative first-level data cache. Our study shows that in a four-way set-associative cache with this macrocell compared to an SRAM based with the same capacity, leakage is reduced by about 75% and area more than half with a minimal impact on performance. Architectural mechanisms have also been devised to avoid refresh logic. Experimental results show that no performance is lost when the retention time is larger than 50 K processor cycles. In addition, the proposed delayed writeback policy that avoids refreshing performs a similar amount of writebacks than a conventional cache with the same organization, so no power wasting is incurred.

Concurrency and Computation: Practice and Experience | 2013

Power‐aware scheduling with effective task migration for real‐time multicore embedded systems

José Luis March; Julio Sahuquillo; Salvador Petit; Houcine Hassan; José Duato

A major design issue in embedded systems is reducing the power consumption because batteries have a limited energy budget. For this purpose, several techniques such as dynamic voltage and frequency scaling (DVFS) or task migration are being used. DVFS allows reducing power by selecting the optimal voltage supply, whereas task migration achieves this effect by balancing the workload among cores. This paper focuses on power‐aware scheduling allowing task migration to reduce energy consumption in multicore embedded systems implementing DVFS capabilities. To address energy savings, the devised schedulers follow two main rules: migrations are allowed at specific points of time and only one task is allowed to migrate each time. Two algorithms have been proposed working under real‐time constraints. The simpler algorithm, namely, single option migration (SOM) only checks just one target core before performing a migration. In contrast, the multiple option migration (MOM) searches the optimal target core. In general, the MOM algorithm achieves better energy savings than the SOM algorithm, although differences are wider for a reduced number of cores and frequency/voltage levels. Moreover, the MOM algorithm reduces energy consumption as much as 40% over the worst fit algorithm. Copyright

The Computer Journal | 2011

A New Energy-Aware Dynamic Task Set Partitioning Algorithm for Soft and Hard Embedded Real-Time Systems

José Luis March; Julio Sahuquillo; Houcine Hassan; Salvador Petit; José Duato

Power consumption is a major design concern in current embedded systems. To deal with consumption, many systems apply dynamic voltage scaling (DVS) techniques which dynamically change the system speed depending on the workload characteristics. DVS costs in a multicore system can be reduced by sharing the same DVS regulator among the cores. In this context, to handle energy efficiently, the workload must be properly balanced among the cores. This paper proposes a new heuristic algorithm to balance the workload in an embedded system with a coarse-grain multithreaded multicore processor. This heuristic is aimed at improving the overlapping time between the memory and the processor while keeping balanced core utilizations. To this end, the heuristic dynamically drives the frequency/voltage level to guarantee deadline fulfillment of the hard real-time tasks as well as to achieve a good trade-off between deadline losses and energy savings of the soft real-time tasks. The proposed technique has been evaluated on a model of a contemporary high-end ARM embedded microprocessor executing a set of standard embedded benchmarks. Energy savings depend on the range of frequency/voltage levels that the DVS regulator implements. Experimental results show that with the proposed heuristic, when working with hard real-time tasks, the energy consumption is about 33% the energy dissipated by a system without DVS regulator and balancing heuristic. Moreover, when soft real-time tasks are also considered, the normalized consumption presents values ranging in between 8 and 70% depending on the scheduler aggressiveness.

IEEE Transactions on Parallel and Distributed Systems | 2014

Cache-Hierarchy Contention-Aware Scheduling in CMPs

Josue Feliu; Salvador Petit; Julio Sahuquillo; José Duato

To improve chip multiprocessor (CMP) performance, recent research has focused on scheduling strategies to mitigate main memory bandwidth contention. Nowadays, commercial CMPs implement multilevel cache hierarchies that are shared by several multithreaded cores. In this microprocessor design, contention points may appear along the whole memory hierarchy. Moreover, this problem is expected to aggravate in future technologies, since the number of cores and hardware threads, and consequently the size of the shared caches increase with each microprocessor generation. This paper characterizes the impact on performance of the different contention points that appear along the memory subsystem. The analysis shows that some benchmarks are more sensitive to contention in higher levels of the memory hierarchy (e.g., shared L2) than to main memory contention. In this paper, we propose two generic scheduling strategies for CMPs. The first strategy takes into account the available bandwidth at each level of the cache hierarchy. The strategy selects the processes to be coscheduled and allocates them to cores to minimize contention effects. The second strategy also considers the performance degradation each process suffers due to contention-aware scheduling. Both proposals have been implemented and evaluated in a commercial single-threaded quad-core processor with a relatively small two-level cache hierarchy. The proposals reach, on average, a performance improvement by 5.38 and 6.64 percent when compared with the Linux scheduler, while this improvement is by 3.61 percent for an state-of-the-art memory contention-aware scheduler under the evaluated mixes.

international conference on parallel architectures and compilation techniques | 2013

L1-bandwidth aware thread allocation in multicore SMT processors

Josue Feliu; Julio Sahuquillo; Salvador Petit; José Duato

Improving the utilization of shared resources is a key issue to increase performance in SMT processors. Recent work has focused on resource sharing policies to enhance the processor performance, but their proposals mainly concentrate on novel hardware mechanisms that adapt to the dynamic resource requirements of the running threads. This work addresses the L1 cache bandwidth problem in SMT processors experimentally on real hardware. Unlike previous work, this paper concentrates on thread allocation, by selecting the proper pair of co-runners to be launched to the same core. The relation between L1 bandwidth requirements of each benchmark and its performance (IPC) is analyzed. We found that for individual benchmarks, performance is strongly connected to L1 bandwidth consumption, and this observation remains valid when several co-runners are launched to the same SMT core. Based on these findings we propose two L1 bandwidth aware thread to core (t2c) allocation policies, namely Static and Dynamic t2c allocation, respectively. The aim of these policies is to properly balance L1 bandwidth requirements of the running threads among the processor cores. Experiments on a Xeon E5645 processor show that the proposed policies significantly improve the performance of the Linux OS kernel regardless the number of cores considered.

international conference on intelligent pervasive computing | 2007

Leakage Current Reduction in Data Caches on Embedded Systems

Rafael Ubal; Julio Sahuquillo; Salvador Petit; Houcine Hassan; Pedro López

Nowadays, embedded systems can be found in a wide range of pervasive devices (e.g., smart phones, PDAs, or video/digital cameras). These devices contain large cache memories, whose power consumption can reach about 50% of the total spent energy, from which leakage energy is the predominant fraction in current technologies. This paper proposes a technique to reduce leakage energy consumption in data caches on embedded systems, which is based on the fact that most stored bits take a logical value of zero. The proposed technique has been evaluated on a model of a contemporary high-end embedded microprocessor, namely the ARM Cortex A8 processor, executing a set of standard embedded benchmarks. Experimental results show that leakage energy savings reach about 40% with no IPC loss.

IEEE Transactions on Computers | 2015

Design of Hybrid Second-Level Caches

Alejandro Valero; Julio Sahuquillo; Salvador Petit; Pedro López; José Duato

In recent years, embedded dynamic random-access memory (eDRAM) technology has been implemented in last-level caches due to its low leakage energy consumption and high density. However, the fact that eDRAM presents slower access time than static RAM (SRAM) technology has prevented its inclusion in higher levels of the cache hierarchy. This paper proposes to mingle SRAM and eDRAM banks within the data array of second-level (L2) caches. The main goal is to achieve the best trade-off among performance, energy, and area. To this end, two main directions have been followed. First, this paper explores the optimal percentage of banks for each technology. Second, the cache controller is redesigned to deal with performance and energy. Performance is addressed by keeping the most likely accessed blocks in fast SRAM banks. In addition, energy savings are further enhanced by avoiding unnecessary destructive reads of eDRAM blocks. Experimental results show that, compared to a conventional SRAM L2 cache, a hybrid approach requiring similar or even lower area speedups the performance on average by 5.9 percent, while the total energy savings are by 32 percent. For a 45 nm technology node, the energy-delay-area product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless of the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is an eighth of the total number of cache banks.

Explore More