Miquel Moreto

Polytechnic University of Catalonia

Publication

Featured research published by Miquel Moreto.

international symposium on microarchitecture | 2008

Multicore Resource Management

Kyle J. Nesbit; Miquel Moreto; Francisco J. Cazorla; Alex Ramirez; Mateo Valero; James E. Smith

Current resource management mechanisms and policies are inadequate for future multicore systems. Instead, a hardware/software interface based on the virtual private machine abstraction would allow software policies to explicitly manage microarchitecture resources. VPM policies, implemented primarily in software, translate application and system objectives into VPM resource assignments. Then, VPM mechanisms securely multiplex, arbitrate, or distribute hardware resources to satisfy the VPM assignments.

international symposium on computer architecture | 2013

A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness

Henry Cook; Miquel Moreto; Sarah Bird; Khanh Dao; David A. Patterson; Krste Asanovic

Computing workloads often contain a mix of interactive, latency-sensitive foreground applications and recurring background computations. To guarantee responsiveness, interactive and batch applications are often run on disjoint sets of resources, but this incurs additional energy, power, and capital costs. In this paper, we evaluate the potential of hardware cache partitioning mechanisms and policies to improve efficiency by allowing background applications to run simultaneously with interactive foreground applications, while avoiding degradation in interactive responsiveness. We evaluate these tradeoffs using commercial x86 multicore hardware that supports cache partitioning, and find that real hardware measurements with full applications provide different observations than past simulation-based evaluations. Co-scheduling applications without LLC partitioning leads to a 10% energy improvement and average throughput improvement of 54% compared to running tasks separately, but can result in foreground performance degradation of up to 34% with an average of 6%. With optimal static LLC partitioning, the average energy improvement increases to 12% and the average throughput improvement to 60%, while the worst case slowdown is reduced noticeably to 7% with an average slowdown of only 2%. We also evaluate a practical low-overhead dynamic algorithm to control partition sizes, and are able to realize the potential performance guarantees of the optimal static approach, while increasing background throughput by an additional 19%.

IEEE Transactions on Computers | 2008

Modeling Toroidal Networks with the Gaussian Integers

Carmen Martínez; Ramón Beivide; Esteban Stafford; Miquel Moreto; Ernst M. Gabidulin

In this paper we consider a broad family of toroidal networks, denoted as Gaussian networks, which include many previously proposed and used topologies. We will define such networks by means of the Gaussian integers, the subset of the complex numbers with integer real and imaginary parts. Nodes in Gaussian networks are labeled by Gaussian integers, which confer these topologies an algebraic structure based on quotient rings of the Gaussian integers. In this sense, Gaussian integers reveal themselves as the appropriate tool for analyzing and exploiting any type of toroidal network. Using this algebraic approach, we can characterize the main distance-related properties of Gaussian networks, providing closed expressions for their diameter and average distance. In addition, we solve some important applications, like unicast and broadcast packet routing or the perfect placement of resources over these networks.
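As a rough illustration of the algebraic view described above, the following Python sketch builds the Gaussian network generated by a + bi (nodes are residue classes of the Gaussian integers modulo a + bi, and two nodes are adjacent when they differ by ±1 or ±i) and measures its diameter by breadth-first search. The function names are hypothetical, and the brute-force search is only a check for small cases; it is not the closed-form expressions derived in the paper.

```python
from collections import deque


def reduce_mod(x, y, a, b):
    """Canonical representative of x + yi modulo a + bi (exact integer arithmetic)."""
    n = a * a + b * b
    # (x + yi) * conj(a + bi) = (x*a + y*b) + (y*a - x*b)i
    w_re, w_im = x * a + y * b, y * a - x * b
    # Round each component of w / n to the nearest integer (ties round up).
    q_re = (2 * w_re + n) // (2 * n)
    q_im = (2 * w_im + n) // (2 * n)
    # Subtract q * (a + bi) = (q_re*a - q_im*b) + (q_re*b + q_im*a)i.
    return x - (q_re * a - q_im * b), y - (q_re * b + q_im * a)


def distances_from_zero(a, b):
    """Breadth-first search from node 0; returns {node: hop distance}."""
    start = (0, 0)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            node = reduce_mod(x + dx, y + dy, a, b)
            if node not in dist:
                dist[node] = dist[(x, y)] + 1
                queue.append(node)
    return dist


def diameter(a, b):
    """The graph is vertex-transitive, so the eccentricity of node 0 is the diameter."""
    return max(distances_from_zero(a, b).values())
```

For example, `diameter(4, 0)` describes the ordinary 4 x 4 torus, while `diameter(3, 2)` describes a 13-node network that no rectangular torus can realize.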

Operating Systems Review | 2009

FlexDCP: a QoS framework for CMP architectures

Miquel Moreto; Francisco J. Cazorla; Alex Ramirez; Rizos Sakellariou; Mateo Valero

Current multicore architectures offer high throughput by increasing hardware resource utilization. As the number of cores in a multicore system increases, providing Quality of Service (QoS) to applications in addition to throughput is becoming an important problem. In this work, we present FlexDCP, a framework that allows the Operating System (OS) to guarantee a QoS for each application running in a chip multiprocessor. FlexDCP directly estimates the performance of applications for different cache configurations instead of using indirect measures of performance like the number of misses. This information allows the OS to convert QoS requirements into resource assignments. Consequently, it offers more flexibility to the OS as it can optimize different QoS metrics like per-application performance or global performance metrics such as fairness, weighted speed up or throughput. Our results show that FlexDCP is able to force applications in a workload to run at a certain percentage of their maximum performance in 94% of the cases considered, being on average 1.48% under the objective for remaining cases. When optimizing a global QoS metric like fairness, FlexDCP consistently outperforms traditional eviction policies like LRU, pseudo LRU and previous dynamic cache partitioning proposals for two-, four- and eight-core configurations. In an eight-core architecture FlexDCP obtains a fairness improvement of 10.1% over Fair, the best policy in the literature optimizing fairness.
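The step of converting QoS requirements into resource assignments can be pictured with a small Python sketch. It assumes that a per-application performance estimate (here, a predicted IPC for each possible number of cache ways) is already available; the data layout, function names, and the simple "smallest allocation that meets the target" rule are all hypothetical and are not taken from FlexDCP itself.

```python
def min_ways_for_target(ipc_by_ways, target_fraction):
    """Smallest number of ways whose predicted IPC reaches the target.

    ipc_by_ways[k - 1] is the (assumed) predicted IPC with k cache ways;
    target_fraction is the required share of the best predicted IPC.
    """
    goal = target_fraction * max(ipc_by_ways)
    for ways, ipc in enumerate(ipc_by_ways, start=1):
        if ipc >= goal:
            return ways
    return len(ipc_by_ways)  # only reached if target_fraction > 1


def assign_ways(predictions, targets, total_ways):
    """Map each application to a way count, or None if the targets do not fit."""
    needs = {
        app: min_ways_for_target(curve, targets[app])
        for app, curve in predictions.items()
    }
    if sum(needs.values()) > total_ways:
        return None  # the requested QoS levels cannot all be met
    return needs


# Made-up prediction curves for two applications sharing an 8-way cache.
predictions = {
    "A": [0.5, 0.9, 1.2, 1.4, 1.5, 1.5, 1.5, 1.5],
    "B": [1.0, 1.1, 1.15, 1.2, 1.2, 1.2, 1.2, 1.2],
}
targets = {"A": 0.9, "B": 0.9}
assignment = assign_ways(predictions, targets, total_ways=8)
```

Working from predicted performance rather than miss counts is what lets a per-application target such as "90% of maximum performance" be checked directly against each candidate allocation.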

design automation conference | 2013

Tessellation: refactoring the OS around explicit resource containers with continuous adaptation

Juan A. Colmenares; Gage Eads; Steven A. Hofmeyr; Sarah Bird; Miquel Moreto; David Chou; Brian Gluzman; Eric Roman; Davide B. Bartolini; Nitesh Mor; Krste Asanovic; John Kubiatowicz

Adaptive Resource-Centric Computing (ARCC) enables a simultaneous mix of high-throughput parallel, real-time, and interactive applications through automatic discovery of the correct mix of resource assignments necessary to achieve application requirements. This approach, embodied in the Tessellation manycore operating system, distributes resources to QoS domains called cells. Tessellation separates global decisions about the allocation of resources to cells from application-specific scheduling of resources within cells. We examine the implementation of ARCC in the Tessellation OS, highlight Tessellation's ability to provide predictable performance, and investigate the performance of Tessellation services within cells.

IEEE Transactions on Parallel and Distributed Systems | 2010

Twisted Torus Topologies for Enhanced Interconnection Networks

José María Cámara; Miquel Moreto; Enrique Vallejo; Ramón Beivide; José Miguel-Alonso; Carmen Martínez; Javier Navaridas

Many current parallel computers are built around a torus interconnection network. Machines from Cray, HP, and IBM, among others, make use of this topology. In terms of topological advantages, square (2D) or cubic (3D) tori would be the topologies of choice. However, for different practical reasons, 2D and 3D tori with different numbers of nodes per dimension have been used. These mixed-radix topologies are not edge symmetric, which translates into poor performance due to an unbalanced use of network resources. In this work, we analyze twisted 2D and 3D mixed-radix tori that remove the network bottlenecks present in nontwisted ones. Such topologies recover edge symmetry, and consequently, balance the utilization of their links. The distance-related properties of twisted tori together with a full characterization of their bisection bandwidth are described in this paper. A simulation-based performance evaluation has been carried out to assess the network performance under synthetic and trace-driven workloads. The obtained results show noticeable and consistent performance gains (up to an increase of 74 percent in accepted load). In addition, we propose scalable and practicable packet routing mechanisms and wiring layouts for these interconnection systems. The complexity of the architectural proposals is similar to the one exhibited by routing and folding mechanisms in standard tori.
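The effect of twisting on distances can be checked on a small case with a breadth-first search. The Python sketch below assumes one common construction of a 2D twisted torus: a 2a x a grid in which the wrap-around links of the short (Y) dimension are shifted by `twist` positions along X, with `twist = 0` giving the standard mixed-radix torus. The function name and this exact wiring convention are assumptions made for illustration and may differ from the layouts proposed in the paper.

```python
from collections import deque


def torus_diameter(a, twist):
    """Diameter of a 2a x a torus whose Y wrap-around links are shifted by `twist` in X."""
    width, height = 2 * a, a

    def neighbors(x, y):
        # X dimension: ordinary wrap-around ring.
        yield (x + 1) % width, y
        yield (x - 1) % width, y
        # Y dimension: crossing the top or bottom edge also shifts X by the twist.
        if y + 1 == height:
            yield (x + twist) % width, 0
        else:
            yield x, y + 1
        if y == 0:
            yield (x - twist) % width, height - 1
        else:
            yield x, y - 1

    # Both variants are vertex-transitive, so a search from one node suffices.
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(*node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


standard = torus_diameter(4, twist=0)  # 8 x 4 standard torus
twisted = torus_diameter(4, twist=4)   # 8 x 4 torus with a twist of a = 4
```

Under this convention the 8 x 4 standard torus has diameter 6, while the twisted variant has diameter 4, which matches the intuition that the twist shortens the longest paths along the long dimension.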

architectural support for programming languages and operating systems | 2012

Optimal task assignment in multithreaded processors: a statistical approach

Petar Radojković; Vladimir Cakarevic; Miquel Moreto; Javier Verdú; Alex Pajuelo; Francisco J. Cazorla; Mario Nemirovsky; Mateo Valero

The introduction of massively multithreaded (MMT) processors, comprised of a large number of cores with many shared resources, has made task scheduling, in particular task to hardware thread assignment, one of the most promising ways to improve system performance. However, finding an optimal task assignment for a workload running on MMT processors is an NP-complete problem. Due to the fact that the performance of the best possible task assignment is unknown, the room for improvement of current task-assignment algorithms cannot be determined. This is a major problem for the industry because it could lead to: (1) a waste of resources if excessive effort is devoted to improving a task assignment algorithm that already provides a performance that is close to the optimal one, or (2) significant performance loss if insufficient effort is devoted to improving poorly-performing task assignment algorithms. In this paper, we present a method based on Extreme Value Theory that allows the prediction of the performance of the optimal task assignment in MMT processors. We further show that executing a sample of several hundred or several thousand random task assignments is enough to obtain, with very high confidence, an assignment with a performance that is close to the optimal one. We validate our method with an industrial case study for a set of multithreaded network applications running on an UltraSPARC T2 processor.
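The claim that a few hundred random assignments are often enough can be motivated with an elementary sampling argument. If assignments are drawn independently and uniformly at random, the probability that at least one of n samples falls in the best fraction p of all assignments is 1 - (1 - p)^n. The Python sketch below evaluates this bound; it is only this basic argument with hypothetical function names, and it does not reproduce the paper's Extreme Value Theory estimate of the optimal performance.

```python
import math


def prob_best_in_top(n, p):
    """Probability that the best of n uniform random samples lies in the top fraction p."""
    return 1.0 - (1.0 - p) ** n


def samples_needed(p, confidence):
    """Smallest n with prob_best_in_top(n, p) >= confidence (for 0 < p, confidence < 1)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Samples required to land in the best 1% of assignments with 99% confidence.
n = samples_needed(0.01, 0.99)
```

With these inputs `n` is 459, which is consistent with the order of magnitude quoted in the abstract. Note that landing in the top 1% of assignments by rank says nothing by itself about how far the sampled performance is from the optimum, which is the gap the statistical method in the paper addresses.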

International Journal of Parallel Programming | 2006

Dense Gaussian networks: suitable topologies for on-chip multiprocessors

Carmen Martínez; Enrique Vallejo; Ramón Beivide; Cruz Izu; Miquel Moreto

This paper explores the suitability of dense circulant graphs of degree four for the design of on-chip interconnection networks. Networks based on these graphs reduce the Torus diameter in a factor

high performance embedded architectures and compilers | 2008