Vinicius Petrucci | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Vinicius Petrucci is active.

Explore More

Publication

Featured researches published by Vinicius Petrucci.

architectural support for programming languages and operating systems | 2015

Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers

Johann Hauswald; Michael A. Laurenzano; Yunqi Zhang; Cheng Li; Austin Rovinski; Arjun Khurana; Ronald G. Dreslinski; Trevor N. Mudge; Vinicius Petrucci; Lingjia Tang; Jason Mars

As user demand scales for intelligent personal assistants (IPAs) such as Apples Siri, Googles Google Now, and Microsofts Cortana, we are approaching the computational limits of current datacenter architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this paper, we present the design of Sirius, an open end-to-end IPA web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of 7 benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 10x and 16x. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of datacenters by 2.6x and 1.4x, respectively.

energy efficient computing and networking | 2010

A dynamic optimization model for power and performance management of virtualized clusters

Vinicius Petrucci; Orlando Loques; Daniel Mossé

An increasing number of large-scale server clusters are being deployed in data centers for supporting many different web-based application services in a seamless fashion. In this scenario, the rising energy costs for keeping up those web clusters is becoming an important concern for many business. In this paper we present an optimization solution for power and performance management in a platform running multiple independent web applications. Our approach assumes a virtualized server environment and includes an optimization model and strategy to dynamically manage the cluster power consumption, while meeting the applications workload demands.

Expert Systems With Applications | 2013

An iterated local search heuristic for multi-capacity bin packing and machine reassignment problems

Renaud Masson; Thibaut Vidal; Julien Michallet; Puca Huachi Vaz Penna; Vinicius Petrucci; Anand Subramanian; Hugues Dubedout

This paper proposes an efficient Multi-Start Iterated Local Search for Packing Problems (MS-ILS-PPs) metaheuristic for Multi-Capacity Bin Packing Problems (MCBPP) and Machine Reassignment Problems (MRP). The MCBPP is a generalization of the classical bin-packing problem in which the machine (bin) capacity and task (item) sizes are given by multiple (resource) dimensions. The MRP is a challenging and novel optimization problem, aimed at maximizing the usage of available machines by reallocating tasks/processes among those machines in a cost-efficient manner, while fulfilling several capacity, conflict, and dependency-related constraints. The proposed MS-ILS-PP approach relies on simple neighborhoods as well as problem-tailored shaking procedures. We perform computational experiments on MRP benchmark instances containing between 100 and 50,000 processes. Near-optimum multi-resource allocation and scheduling solutions are obtained while meeting specified processing-time requirements (on the order of minutes). In particular, for 9/28 instances with more than 1000 processes, the gap between the solution value and a lower bound measure is smaller than 0.1%. Our optimization method is also applied to solve classical benchmark instances for the MCBPP, yielding the best known solutions and optimum ones in most cases. In addition, several upper bounds for non-solved problems were improved.

high-performance computer architecture | 2015

Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers

Vinicius Petrucci; Michael A. Laurenzano; John Doherty; Yunqi Zhang; Daniel Mossé; Jason Mars; Lingjia Tang

Heterogeneous multicore architectures have the potential to improve energy efficiency by integrating power-efficient wimpy cores with high-performing brawny cores. However, it is an open question as how to deliver energy reduction while ensuring the quality of service (QoS) of latency-sensitive web-services running on such heterogeneous multicores in warehouse-scale computers (WSCs). In this work, we first investigate the implications of heterogeneous multicores in WSCs and show that directly adopting heterogeneous multicores without re-designing the software stack to provide QoS management leads to significant QoS violations. We then present Octopus-Man, a novel QoS-aware task management solution that dynamically maps latency-sensitive tasks to the least power-hungry processing resources that are sufficient to meet the QoS requirements. Using carefully-designed feedback-control mechanisms, Octopus-Man addresses critical challenges that emerge due to uncertainties in workload fluctuations and adaptation dynamics in a real system. Our evaluation using web-search and memcached running on a real-system Intel heterogeneous prototype demonstrates that Octopus-Man improves energy efficiency by up to 41% (CPU power) and up to 15% (system power) over an all-brawny WSC design while adhering to specified QoS targets.

acm symposium on applied computing | 2009

A framework for dynamic adaptation of power-aware server clusters

Vinicius Petrucci; Orlando Loques; Daniel Mossé

This paper presents a framework to support dynamic adaptation of applications, which consists of a reusable infrastructure with standard elements to monitor and adapt running applications, and a contract-based adaptation language to enable one to express high-level adaptation policies. The proposed framework is used to introduce dynamic adaptation capabilities into a server cluster infrastructure, intended to address power and performance management concerns. By experimental evaluation, we demonstrate that our approach is useful and effective in providing the required support for describing and deploying typical power management contracts.

acm symposium on applied computing | 2010

Dynamic optimization of power and performance for virtualized server clusters

Vinicius Petrucci; Orlando Loques; Daniel Mossé

In this paper we present an optimization solution for power and performance management in a platform running multiple independent applications. Our approach assumes a virtualized server environment and includes an optimization model and strategy to dynamically control the cluster power consumption, while meeting the applications workload demands.

ACM Transactions in Embedded Computing Systems | 2015

Energy-Efficient Thread Assignment Optimization for Heterogeneous Multicore Systems

Vinicius Petrucci; Orlando Loques; Daniel Mossé; Rami G. Melhem; Neven M. Abou Gazala; Sameh Gobriel

The current trend to move from homogeneous to heterogeneous multicore systems provides compelling opportunities for achieving performance and energy efficiency goals. Running multiple threads in multicore systems poses challenges on meeting limited shared resources, such as memory bandwidth. We propose an optimization approach that includes an Integer Linear Programming (ILP) optimization model and a scheme to dynamically determine thread-to-core assignment. We present simulation analysis that shows energy savings and performance gains for a variety of workloads compared to state-of-the-art schemes. We implemented and evaluated a prototype of our thread assignment approach at user level, leveraging Linux scheduling and performance-monitoring capabilities.

Computers & Industrial Engineering | 2012

A column generation approach for power-aware optimization of virtualized heterogeneous server clusters

Hugo Harry Kramer; Vinicius Petrucci; Anand Subramanian; Eduardo Uchoa

Increasingly, clusters of servers have been deployed in large data centers to support the development and implementation of many kinds of services, having distinct workload demands that vary over time, in a scalable and efficient computing environment. Emerging trends are utility/cloud computing platforms, where many network services, implemented and supported using server virtualization techniques, are hosted on a shared cluster infrastructure of physical servers. The energy consumed to maintain these large server clusters became a very important concern, which in turn, requires major investigation of optimization techniques to improve the energy efficiency of their computing infrastructure. In this work, we propose an efficient approach to solve a relevant cluster optimization problem which, in practice, can be used as an embedded module to implement an integrated power and performance management solution in a real server cluster. The optimization approach simultaneously deals with (i) CPU power-saving techniques combined with server switching on/off mechanisms, (ii) the case of server heterogeneity, (iii) virtualized server environments, (iv) an efficient optimization method, which is based on column generation techniques. The key aspects of our approach are the basis on rigorous and robust optimization techniques, given by high quality solutions in short amount of processing time, and experimental results on the cluster configuration problem for large-scale heterogeneous server clusters that can make use of virtualization techniques.

embedded software | 2013

Energy-aware thread co-location in heterogeneous multicore processors

Rajiv Nishtala; Daniel Mossé; Vinicius Petrucci

Given the wide variety of performance demands for various workloads, the trend in embedded systems is shifting from homogeneous to heterogeneous processors, which have been shown to yield performance and energy saving benefits. A typical heterogeneous processor has cores with different performance and power characteristics, that is, high performance and power hungry (“big”) cores, and low power and performance (“small”) cores. In order to satisfy the memory bandwidth and computation demands of various threads, it is important (albeit challenging) to map threads to cores. Such assignment should take into account that threads could potentially be harmful to each other in the usage of shared resources (e.g., cache, memory). We propose a scheme for dynamic energy-efficient assignment of threads to big/small cores, DIO-E (Distributed Intensity Online-Energy), which is an enhancement of the previously proposed DIO. In contrast to DIO, we take into account both CPU and memory demands of threads to characterize the performance of threads when co-running on the same core at run-time. Our results show that DIO-E improves the energy-delay-squared product (ED2) by 9% (average) over DIO, running on a performance-asymmetric multicore system. Both DIO and DIO-E show about 50% improvement in ED2 over a state-of-the-art solution.

high-performance computer architecture | 2017

Hipster: Hybrid Task Manager for Latency-Critical Cloud Workloads

Rajiv Nishtala; Paul M. Carpenter; Vinicius Petrucci; Xavier Martorell

In 2013, U. S. data centers accounted for 2.2% of the countrys total electricity consumption, a figure that is projected to increase rapidly over the next decade. Many important workloads are interactive, and they demand strict levels of quality-of-service (QoS) to meet user expectations, making it challenging to reduce power consumption due to increasing performance demands. This paper introduces Hipster, a technique that combines heuristics and reinforcement learning to manage latency-critical workloads. Hipsters goal is to improve resource efficiency in data centers while respecting the QoS of the latency-critical workloads. Hipster achieves its goal by exploring heterogeneous multi-cores and dynamic voltage and frequency scaling (DVFS). To improve data center utilization and make best usage of the available resources, Hipster can dynamically assign remaining cores to batch workloads without violating the QoS constraints for the latency-critical workloads. We perform experiments using a 64-bit ARM big.LITTLE platform, and show that, compared to prior work, Hipster improves the QoS guarantee for Web-Search from 80% to 96%, and for Memcached from 92% to 99%, while reducing the energy consumption by up to 18%.

Explore More