
Publication


Featured research published by Leonardo Piga.


IEEE International Symposium on Workload Characterization | 2015

A Taxonomy of GPGPU Performance Scaling

Abhinandan Majumdar; Gene Y. Wu; Kapil Dev; Joseph L. Greathouse; Indrani Paul; Wei Huang; Arjun-Karthik Venugopal; Leonardo Piga; Chip Freitag; Sooraj Puthoor

Graphics processing units (GPUs) range from small, embedded designs to large, high-powered discrete cards. While the performance of graphics workloads is generally understood, there has been little study of the performance of GPGPU applications across a variety of hardware configurations. This work presents performance scaling data gathered for 267 GPGPU kernels from 97 programs run on 891 hardware configurations of a modern GPU. We study the performance of these kernels across a 5× change in core frequency, 8.3× change in memory bandwidth, and 11× difference in compute units. We illustrate that many kernels scale in intuitive ways, such as those that scale directly with added computational capabilities or memory bandwidth. We also find a number of kernels that scale in non-obvious ways, such as losing performance when more processing units are added or plateauing as frequency and bandwidth are increased. In addition, we show that a number of current benchmark suites do not scale to modern GPU sizes, implying that either new benchmarks or new inputs are warranted.
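
The scaling taxonomy described above can be illustrated with a small classifier over measured runtimes. The sketch below is a minimal, hypothetical Python example, not the paper's methodology: the thresholds, sample data, and the classify_scaling helper are assumptions chosen for illustration.

    # Hypothetical sketch: label how a GPGPU kernel's runtime scales along one
    # hardware dimension (e.g., number of compute units). The thresholds and
    # sample data are illustrative assumptions, not the paper's measurements.

    def classify_scaling(resources, runtimes, tol=0.15):
        """Classify scaling behavior from runtimes measured at increasing resource amounts."""
        base_res, base_time = resources[0], runtimes[0]
        speedups = [base_time / t for t in runtimes]
        ideal = [r / base_res for r in resources]

        if runtimes[-1] > runtimes[0]:                 # slower with more resources
            return "negative scaling"
        if speedups[-1] >= (1 - tol) * ideal[-1]:      # speedup tracks the resource ratio
            return "near-linear scaling"
        if speedups[-1] <= (1 + tol) * speedups[len(speedups) // 2]:
            return "plateau"                           # little gain over the upper half
        return "sub-linear scaling"

    if __name__ == "__main__":
        compute_units = [8, 16, 32, 64]
        runtimes = [4.0, 2.3, 1.9, 1.8]                # a bandwidth-bound kernel flattens out
        print(classify_scaling(compute_units, runtimes))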


IEEE International Symposium on Workload Characterization | 2011

Empirical Web server power modeling and characterization

Leonardo Piga; Reinaldo A. Bergamaschi; Felipe Klein; Rodolfo Azevedo; Sandro Rigo

Commodity processors, which are prevalent in Internet-based data centers, do not have internal sensors for monitoring energy consumption. Such processors usually feature performance counters, which can be used to indirectly estimate power consumption [1]. The usual approach in such studies is to derive linear power models from usage numbers collected for processor sub-components such as caches and the branch predictor. These models usually target CPU-bound applications, which exercise more CPU performance counter parameters and display high CPU usage most of the time. In a Web server environment, the applications are mostly I/O-bound, which creates non-linear effects between server performance statistics and power and makes these models less suitable for Web servers. This paper presents a new approach to power modeling for Web servers, based on ranges of CPU usage values and server performance statistics. The new method softens the non-linear relationship between server statistics and power consumption in linear power models, improving their accuracy.
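
A rough reading of the range-based idea is sketched below: instead of one linear model over all samples, fit a separate least-squares model per CPU-utilization bucket. The bucket edges, feature layout, and function names are assumptions for illustration, not the models from the paper.

    # Illustrative sketch of a range-based linear power model: one least-squares
    # fit per CPU-utilization bucket. Bucket edges and features are assumptions.
    import numpy as np

    def fit_piecewise_power_model(cpu_util, features, power, edges=(0.0, 0.3, 0.7, 1.0)):
        """Fit one linear model (weights plus intercept) per CPU-utilization range."""
        models = {}
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (cpu_util >= lo) & (cpu_util < hi)
            if hi == edges[-1]:
                mask |= cpu_util == hi            # include the upper edge in the top bucket
            if mask.sum() < features.shape[1] + 1:
                continue                          # too few samples to fit this range
            X = np.column_stack([features[mask], np.ones(mask.sum())])
            coeffs, *_ = np.linalg.lstsq(X, power[mask], rcond=None)
            models[(lo, hi)] = coeffs
        return models

    def predict_power(models, util, feature_row):
        """Predict power for one sample using the model of its utilization range."""
        for (lo, hi), coeffs in models.items():
            if lo <= util < hi or util == hi == 1.0:
                return float(np.dot(np.append(feature_row, 1.0), coeffs))
        raise ValueError("utilization outside the modeled ranges")

    # Synthetic example: two counter-derived features plus measured power samples.
    rng = np.random.default_rng(1)
    util = rng.random(300)
    feats = rng.random((300, 2))
    power = 80 + 40 * util + 15 * feats[:, 0] + rng.normal(0, 2, 300)
    models = fit_piecewise_power_model(util, feats, power)
    print(predict_power(models, 0.5, np.array([0.4, 0.7])))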


Southern Conference on Programmable Logic | 2009

Comparing RTL and high-level synthesis methodologies in the design of a theora video decoder IP core

Leonardo Piga; Sandro Rigo

An important share of the consumer electronics market is focused on devices capable of running multimedia applications, such as audio and video decoders. To achieve the performance level demanded by these applications, it is important to develop specialized hardware IPs to handle the most computationally intensive parts. Nowadays, designers face the challenge of integrating several components, including processors, memory, and specialized IP cores, into a single chip, giving rise to so-called systems-on-chip (SoCs). The high complexity of such systems and the strict time-to-market of the electronics industry have motivated the introduction of new design methodologies in recent years. This work presents a comparison between two hardware development methodologies used to design a Theora video decoder IP core from algorithm down to FPGA. We first implemented it in hand-written RTL code using VHDL, achieving a 56% reduction in decoding time compared to a software library. The second methodology implements the same hardware using SystemC and behavioral synthesis. The second IP core was developed in 70% less time with satisfactory results. We compare the two approaches in terms of area and latency.


Symposium on Computer Architecture and High Performance Computing | 2012

Cloud Workload Analysis with SWAT

Mauricio Breternitz; Keith Lowery; Anton Charnoff; Patryk Kaminski; Leonardo Piga

This note describes the Synthetic Workload Application Toolkit (SWAT) and presents the results from a set of experiments on some key cloud workloads. SWAT is a software platform that automates the creation, deployment, provisioning, execution, and (most importantly) data gathering of synthetic compute workloads on clusters of arbitrary size. SWAT collects and aggregates data from application execution logs, operating system call interfaces, and microarchitecture-specific performance counters. The data collected by SWAT are used to characterize the effects of network traffic, file I/O, and computation on program performance, and the output is analyzed to provide insight into the design and deployment of cloud workloads and systems. Each workload is characterized according to its scalability with the number of server nodes and Hadoop server jobs, its sensitivity to network characteristics (bandwidth, latency, packet size statistics), and its computation vs. I/O intensity as these values are adjusted via workload-specific parameters. (In the future, we will use SWAT's benchmark synthesizer capability.) We also report micro-architectural characteristics that give insight into processor designs better suited to this class of workloads. We contrast our results with prior work on Cloud Suite [5], validating some conclusions and providing further insight into others. This illustrates SWAT's data collection capabilities and its usefulness for gaining insight into cloud applications and systems.
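
As a rough illustration of the kind of aggregation described above (not SWAT's actual schema or API), the sketch below combines hypothetical per-node measurements into a workload-level compute vs. I/O summary; the record format is an assumption for illustration.

    # Hypothetical aggregation of per-node measurements into a workload summary.
    # The NodeSample fields are assumed for illustration, not SWAT's data model.
    from dataclasses import dataclass

    @dataclass
    class NodeSample:
        node: str
        cpu_seconds: float      # time spent computing
        net_bytes: int          # network traffic observed
        io_bytes: int           # file I/O volume

    def summarize(samples):
        cpu = sum(s.cpu_seconds for s in samples)
        net = sum(s.net_bytes for s in samples)
        io = sum(s.io_bytes for s in samples)
        # Simple compute-vs-I/O intensity indicator: CPU seconds per GiB moved.
        moved_gib = max((net + io) / 2**30, 1e-9)
        return {"cpu_seconds": cpu, "net_bytes": net, "io_bytes": io,
                "cpu_per_gib": cpu / moved_gib}

    samples = [NodeSample("node0", 120.0, 3 * 2**30, 1 * 2**30),
               NodeSample("node1", 115.0, 2 * 2**30, 2 * 2**30)]
    print(summarize(samples))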


International Conference on Cluster Computing | 2016

A Case for Criticality Models in Exascale Systems

Brian Kocoloski; Leonardo Piga; Wei Huang; Indrani Paul; John R. Lange

Performance variation is a significant problem for large-scale HPC systems and will increase on future exascale systems. In this work, we show that performance variation impacts the performance and energy efficiency of contemporary large-scale computing systems in highly temporally inconsistent ways. We thus present a case for criticality models, a learning-based mechanism that allows a system to generate holistic models of performance variation as it occurs during application runtime. Criticality models are designed to provide a mechanism by which applications can detect performance variation at runtime and take action to mitigate its effects. We present a promising preliminary analysis of criticality models on a small-scale cluster. Our results demonstrate that models based on logistic regression can accurately model criticality at this scale.
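
A minimal sketch of the logistic-regression flavor of criticality modeling is shown below. The features, labels, and data are synthetic assumptions for illustration; the paper's models are trained on real runtime metrics.

    # Sketch: learn whether a node is likely to be the critical (slowest) one
    # from per-node runtime metrics. Features and data are synthetic assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic training data: [normalized frequency, memory stall fraction, OS noise events]
    X = rng.random((200, 3))
    # Label a node "critical" when it is slow: low frequency plus high stalls/noise.
    y = ((1 - X[:, 0]) + X[:, 1] + 0.5 * X[:, 2] > 1.4).astype(int)

    model = LogisticRegression().fit(X, y)

    # At runtime, each node would evaluate the model on its own recent metrics.
    recent_metrics = np.array([[0.55, 0.80, 0.30]])
    print("critical" if model.predict(recent_metrics)[0] else "not critical")
    print("probability:", model.predict_proba(recent_metrics)[0, 1])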


High-Performance Computer Architecture | 2017

Dynamic GPGPU Power Management Using Adaptive Model Predictive Control

Abhinandan Majumdar; Leonardo Piga; Indrani Paul; Joseph L. Greathouse; Wei Huang; David H. Albonesi

Modern processors can greatly increase energy efficiency through techniques such as dynamic voltage and frequency scaling. Traditional predictive schemes are limited in their effectiveness by their inability to plan for the performance and energy characteristics of upcoming phases. To date, there has been little research exploring more proactive techniques that account for expected future behavior when making decisions. This paper proposes using Model Predictive Control (MPC) to attempt to maximize the energy efficiency of GPU kernels without compromising performance. We develop performance and power prediction models for a recent CPU-GPU heterogeneous processor. Our system then dynamically adjusts hardware states based on recent execution history, the pattern of upcoming kernels, and the predicted behavior of those kernels. We also dynamically trade off the performance overhead and the effectiveness of MPC in finding the best configuration by adapting the horizon length at runtime. Our MPC technique limits performance loss by proactively spending energy on the kernel iterations that will gain the most performance from that energy. This energy can then be recovered in future iterations that are less performance sensitive. Our scheme also avoids wasting energy on low-throughput phases when it foresees future high-throughput kernels that could better use that energy. Compared to state-of-the-practice schemes, our approach achieves 24.8% energy savings with a performance loss (including MPC overheads) of 1.8%. Compared to state-of-the-art history-based schemes, our approach achieves 6.6% chip-wide energy savings while simultaneously improving performance by 9.6%.
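
The receding-horizon control loop can be sketched as follows. This is a minimal illustration under assumed DVFS states and toy performance/power prediction functions with a fixed horizon, not the adaptive models or hardware interface used in the paper.

    # Hedged sketch of a model-predictive DVFS loop: plan over a short horizon
    # of predicted upcoming kernels, apply only the first decision, then repeat.
    # DVFS states, prediction functions, and the slowdown bound are assumptions.
    from itertools import product

    DVFS_STATES = [0.6, 0.8, 1.0]      # normalized frequency settings (assumed)
    MAX_SLOWDOWN = 0.05                # allowed performance loss vs. running at 1.0

    def predict_time(kernel, freq):
        # Assumed model: the compute-bound part scales with frequency, the memory part does not.
        return kernel["compute"] / freq + kernel["memory"]

    def predict_energy(kernel, freq):
        # Assumed model: dynamic power grows roughly with freq**3, times runtime.
        return (0.3 + 0.7 * freq**3) * predict_time(kernel, freq)

    def mpc_step(upcoming, horizon=2):
        """Pick the DVFS state for the next kernel by planning over a short horizon."""
        window = upcoming[:horizon]
        best_plan, best_energy = None, float("inf")
        baseline = sum(predict_time(k, 1.0) for k in window)
        for plan in product(DVFS_STATES, repeat=len(window)):
            time = sum(predict_time(k, f) for k, f in zip(window, plan))
            energy = sum(predict_energy(k, f) for k, f in zip(window, plan))
            if time <= (1 + MAX_SLOWDOWN) * baseline and energy < best_energy:
                best_plan, best_energy = plan, energy
        return best_plan[0] if best_plan else 1.0  # apply only the first decision

    kernels = [{"compute": 2.0, "memory": 1.0}, {"compute": 0.5, "memory": 2.0}]
    print("next frequency:", mpc_step(kernels))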


International Conference on Performance Engineering | 2013

Assessing computer performance with SToCS

Leonardo Piga; Gabriel Ferreira Teles Gomes; Rafael Auler; Bruno Rosa; Sandro Rigo; Edson Borin

Several aspects of a computer system cause performance measurements to include random errors. Moreover, these systems are typically composed of a non-trivial combination of individual components that may cause one system to perform better or worse than another depending on the workload. Hence, properly measuring and comparing computer system performance are non-trivial tasks. The majority of work published at recent major computer architecture conferences does not report the random errors measured in its experiments, and the remaining authors use only confidence intervals or standard deviations to quantify and factor out random errors. Recent publications claim that this approach can still lead to misleading conclusions. In this work, we reproduce and discuss the results obtained in a previous study. Finally, we propose SToCS, a tool that integrates several statistical frameworks and facilitates the analysis of computer science experiments.
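
As a small example of the style of analysis such a tool supports (this is not SToCS itself), the sketch below reports a bootstrap confidence interval for the speedup of one system over another instead of a bare ratio of means; the measurements are hypothetical.

    # Illustration: compare two systems with a confidence interval rather than
    # bare means, so random measurement error stays visible in the conclusion.
    import numpy as np

    def bootstrap_speedup_ci(times_a, times_b, n_boot=10000, alpha=0.05, seed=0):
        """Confidence interval for mean(times_a) / mean(times_b) (speedup of B over A)."""
        rng = np.random.default_rng(seed)
        a, b = np.asarray(times_a), np.asarray(times_b)
        ratios = []
        for _ in range(n_boot):
            ra = rng.choice(a, size=a.size, replace=True)
            rb = rng.choice(b, size=b.size, replace=True)
            ratios.append(ra.mean() / rb.mean())
        lo, hi = np.quantile(ratios, [alpha / 2, 1 - alpha / 2])
        return lo, hi

    # Hypothetical repeated runtime measurements (seconds) for two systems.
    system_a = [10.2, 10.5, 9.9, 10.8, 10.1]
    system_b = [9.6, 9.9, 10.4, 9.7, 9.8]
    lo, hi = bootstrap_speedup_ci(system_a, system_b)
    print(f"speedup of B over A: 95% CI [{lo:.3f}, {hi:.3f}]")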


Sustainable Computing: Informatics and Systems | 2012

Data center power and performance optimization through global selection of P-states and utilization rates

Reinaldo A. Bergamaschi; Leonardo Piga; Sandro Rigo; Rodolfo Azevedo; Guido Araujo


Cluster Computing | 2014

Empirical and analytical approaches for web server power modeling

Leonardo Piga; Reinaldo A. Bergamaschi; Sandro Rigo


The Journal of Supercomputing | 2014

Adaptive global power optimization for Web servers

Leonardo Piga; Reinaldo A. Bergamaschi; Mauricio Breternitz; Sandro Rigo

Collaboration


Dive into Leonardo Piga's collaborations.

Top Co-Authors

Sandro Rigo, State University of Campinas
Wei Huang, Advanced Micro Devices
Rodolfo Azevedo, State University of Campinas