
Publication


Featured research published by Edwin V. Bonilla.


Symposium on Code Generation and Optimization | 2006

Using Machine Learning to Focus Iterative Optimization

Felix Agakov; Edwin V. Bonilla; John Cavazos; Björn Franke; Grigori Fursin; Michael F. P. O'Boyle; John Thomson; Marc Toussaint; Christopher K. I. Williams

Iterative compiler optimization has been shown to outperform static approaches. This, however, comes at the cost of a large number of evaluations of the program. This paper develops a new methodology to reduce this number and hence speed up iterative optimization. It uses predictive modelling from the domain of machine learning to automatically focus the search on those areas likely to give the greatest performance. This approach is independent of search algorithm, search space or compiler infrastructure and scales gracefully with the size of the compiler optimization space. Off-line, a training set of programs is iteratively evaluated, and the shape of the spaces and program features are modelled. These models are learnt and used to focus the iterative optimization of a new program. We evaluate two learnt models, an independent model and a Markov model, on two embedded platforms, the Texas Instruments C6713 and the AMD Au1500. We show that such learnt models can speed up iterative search on large spaces by an order of magnitude. This translates into an average speedup of 1.22 on the TI C6713 and 1.27 on the AMD Au1500 in just 2 evaluations.
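The "independent model" mentioned in the abstract can be sketched in a few lines: estimate, per optimization flag, how often it is enabled in good settings seen during training, then rank candidate settings by the resulting probability and evaluate only the top-ranked few. Everything below (the 3-flag space, the `measure` function standing in for an actual program run) is invented for illustration, not the paper's setup.

```python
import random

random.seed(0)

# Synthetic stand-in for compiling and timing the program under a setting.
def measure(setting):
    return sum(setting) + 0.1 * setting[0] * setting[2]

settings = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
# "Training data": settings observed to perform well on training programs.
training_best = [s for s in settings if measure(s) >= 2]

# Independent model: per-flag probability of being enabled in a good setting.
def fit_independent(good_settings):
    n = len(good_settings)
    return [sum(s[i] for s in good_settings) / n for i in range(3)]

def score(setting, probs):
    p = 1.0
    for bit, q in zip(setting, probs):
        p *= q if bit else (1 - q)
    return p

probs = fit_independent(training_best)
# Focus the search: evaluate only the top-2 model-ranked candidates
# instead of all 8 settings.
ranked = sorted(settings, key=lambda s: score(s, probs), reverse=True)
best = max(ranked[:2], key=measure)
print(best)
```

The point of the technique is the last three lines: the expensive `measure` call runs on 2 settings rather than the whole space.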


Symposium on Code Generation and Optimization | 2007

Rapidly Selecting Good Compiler Optimizations using Performance Counters

John Cavazos; Grigori Fursin; Felix Agakov; Edwin V. Bonilla; Michael F. P. O'Boyle; Olivier Temam

Applying the right compiler optimizations to a particular program can have a significant impact on program performance. Due to the non-linear interaction of compiler optimizations, however, determining the best setting is non-trivial. There have been several proposed techniques that search the space of compiler options to find good solutions; however, such approaches can be expensive. This paper proposes a different approach using performance counters as a means of determining good compiler optimization settings. This is achieved by learning a model off-line which can then be used to determine good settings for any new program. We show that such an approach outperforms the state-of-the-art and is two orders of magnitude faster on average. Furthermore, we show that our performance counter-based approach outperforms techniques based on static code features. Using our technique we achieve a 17% improvement over the highest optimization setting of the commercial PathScale EKOPath 2.3.1 optimizing compiler on the SPEC benchmark suite on an AMD Athlon 64 3700+ platform.
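The core idea above, characterising a program by its performance-counter profile and recommending settings learnt off-line, can be sketched as a nearest-neighbour lookup in counter space. The counter vectors, the counter choice (cache-miss rate, branch-miss rate, IPC), and the flag strings below are all hypothetical; the paper's model is more sophisticated than this sketch.

```python
import math

# Hypothetical offline training set: a performance-counter vector per
# program, paired with the best flag setting found for it by offline search.
train = [
    ((0.20, 0.05, 1.1), "-O3 -funroll-loops"),
    ((0.02, 0.01, 2.3), "-O2"),
    ((0.15, 0.12, 0.9), "-O3 -fprefetch-loop-arrays"),
]

def predict_flags(counters):
    """Recommend the flags of the nearest training program in counter space."""
    _, flags = min(train, key=lambda t: math.dist(t[0], counters))
    return flags

# A new program, profiled once; its counters resemble the cache-bound case.
print(predict_flags((0.18, 0.06, 1.0)))
```

A single profiled run of the new program suffices at deployment time, which is what makes the approach two orders of magnitude faster than search.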


International Symposium on Microarchitecture | 2009

Portable compiler optimisation across embedded programs and microarchitectures using machine learning

Christophe Dubach; Timothy M. Jones; Edwin V. Bonilla; Grigori Fursin; Michael F. P. O'Boyle

Building an optimising compiler is a difficult and time-consuming task which must be repeated for each generation of a microprocessor. As the underlying microarchitecture changes from one generation to the next, the compiler must be retuned to optimise specifically for that new system. It may take several releases of the compiler to effectively exploit a processor's performance potential, by which time a new generation has appeared and the process starts again. We address this challenge by developing a portable optimising compiler. Our approach employs machine learning to automatically learn the best optimisations to apply for any new program on a new microarchitectural configuration. It achieves this by learning a model off-line which maps a microarchitecture description plus the hardware counters from a single run of the program to the best compiler optimisation passes. Our compiler gains 67% of the maximum speedup obtainable by an iterative compiler search using 1000 evaluations. We obtain, on average, a 1.16x speedup over the highest default optimisation level across an entire microarchitecture configuration space, achieving a 4.3x speedup in the best case. We demonstrate the robustness of this technique by applying it to an extended microarchitectural space where we achieve comparable performance.


International Journal of Parallel Programming | 2011

Milepost GCC: Machine Learning Enabled Self-tuning Compiler

Grigori Fursin; Yuriy Kashnikov; Abdul Wahid Memon; Zbigniew Chamski; Olivier Temam; Mircea Namolaru; Elad Yom-Tov; Bilha Mendelson; Ayal Zaks; Eric Courtois; François Bodin; Phil Barnard; Elton Ashton; Edwin V. Bonilla; John Thomson; Christopher K. I. Williams; Michael O’Boyle

Tuning compiler optimizations for rapidly evolving hardware makes porting and extending an optimizing compiler for each new platform extremely challenging. Iterative optimization is a popular approach to adapting programs to a new architecture automatically using feedback-directed compilation. However, the large number of evaluations required for each program has prevented iterative compilation from widespread take-up in production compilers. Machine learning has been proposed to tune optimizations across programs systematically but is currently limited to a few transformations, long training phases and critically lacks publicly released, stable tools. Our approach is to develop a modular, extensible, self-tuning optimization infrastructure to automatically learn the best optimizations across multiple programs and architectures based on the correlation between program features, run-time behavior and optimizations. In this paper we describe Milepost GCC, the first publicly-available open-source machine learning-based compiler. It consists of an Interactive Compilation Interface (ICI) and plugins to extract program features and exchange optimization data with the cTuning.org open public repository. It automatically adapts the internal optimization heuristic at function-level granularity to improve execution time, code size and compilation time of a new program on a given architecture. Part of the MILEPOST technology together with low-level ICI-inspired plugin framework is now included in the mainline GCC. We developed machine learning plugins based on probabilistic and transductive approaches to predict good combinations of optimizations. Our preliminary experimental results show that it is possible to automatically reduce the execution time of individual MiBench programs, some by more than a factor of 2, while also improving compilation time and code size. 
On average we are able to reduce the execution time of the MiBench benchmark suite by 11% for the ARC reconfigurable processor. We also present a realistic multi-objective optimization scenario for the Berkeley DB library using Milepost GCC and improve execution time by approximately 17%, while reducing compilation time and code size by 12% and 7% respectively on an Intel Xeon processor.


Compilers, Architecture, and Synthesis for Embedded Systems | 2006

Automatic performance model construction for the fast software exploration of new hardware designs

John Cavazos; Christophe Dubach; Felix Agakov; Edwin V. Bonilla; Michael F. P. O'Boyle; Grigori Fursin; Olivier Temam

Developing an optimizing compiler for a newly proposed architecture is extremely difficult when there is only a simulator of the machine available. Designing such a compiler requires running many experiments in order to understand how different optimizations interact. Given that simulators are orders of magnitude slower than real processors, such experiments are highly restricted. This paper develops a technique to automatically build a performance model for predicting the impact of program transformations on any architecture, based on a limited number of automatically selected runs. As a result, the time for evaluating the impact of any compiler optimization in early design stages can be drastically reduced such that all selected potential compiler optimizations can be evaluated. This is achieved by first evaluating a small set of sample compiler optimizations on a prior set of benchmarks in order to train a model, followed by a very small number of evaluations, or probes, of the target program. We show that by training on less than 0.7% of all possible transformations (640 samples collected from 10 benchmarks out of 880,000 possible samples, 88,000 per training benchmark) and probing the new program on only 4 transformations, we can predict the performance of all program transformations with an error of just 7.3% on average. As each prediction takes almost no time to generate, this scheme provides an accurate method of evaluating compiler performance, which is several orders of magnitude faster than current approaches.
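The probing idea can be illustrated with a drastic simplification of the paper's model: treat each training benchmark's full "reaction" vector (speedup per transformation) as a template, and predict the new program's full vector as a similarity-weighted average of templates, where similarity is measured only on the few probed transformations. The 5-transformation space and all numbers below are synthetic.

```python
# Hypothetical full reaction vectors for three training benchmarks:
# entry j is the speedup of transformation j on that benchmark.
train = [
    [1.0, 1.3, 0.9, 1.1, 1.6],
    [1.0, 0.8, 1.2, 1.0, 0.7],
    [1.0, 1.2, 0.9, 1.2, 1.5],
]
probe_ids = [1, 4]          # transformations actually run on the new program
probe_vals = [1.25, 1.55]   # measured speedups from those probe runs

def predict_full_vector(train, probe_ids, probe_vals):
    # Weight each training benchmark by inverse squared distance on probes.
    weights = []
    for vec in train:
        d = sum((vec[i] - v) ** 2 for i, v in zip(probe_ids, probe_vals))
        weights.append(1.0 / (1e-6 + d))
    total = sum(weights)
    n = len(train[0])
    # Weighted average of the training vectors gives the full prediction.
    return [sum(w * vec[j] for w, vec in zip(weights, train)) / total
            for j in range(n)]

pred = predict_full_vector(train, probe_ids, probe_vals)
```

Two probe runs thus yield a prediction for all five transformations; in the paper, 4 probes predict an 88,000-point space.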


Design, Automation, and Test in Europe | 2012

Predicting best design trade-offs: a case study in processor customization

Marcela Zuluaga; Edwin V. Bonilla; Nigel P. Topham

Given the high level description of a task, many different hardware modules may be generated while meeting its behavioral requirements. The characteristics of the generated hardware can be tailored to favor energy efficiency, performance, accuracy or die area. The inherent trade-offs between such metrics need to be explored in order to choose a solution that meets design and cost expectations. We address the generic problem of automatically deriving a hardware implementation from a high-level task description. In this paper we present a novel technique that exploits previously explored implementation design spaces in order to find optimal trade-offs for new high-level descriptions. This technique is generalizable to a range of high-level synthesis problems in which trade-offs can be exposed by changing the parameters of the hardware generation tool. Our strategy, based upon machine learning techniques, models the impact of the parameterization of the tool on the target objectives, given the characteristics of the input. Thus, a predictor is able to suggest a subset of parameters that are likely to lead to optimal hardware implementations. The proposed method is evaluated on a resource sharing problem which is typical in high level synthesis, where the trade-offs between area and performance need to be explored. In this case study, we show that the technique can reduce by two orders of magnitude the number of design points that need to be explored in order to find the Pareto optimal solutions.
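The Pareto-optimal solutions the abstract refers to are the design points not dominated on any objective. A minimal filter for a two-objective case (say, area and delay, both to be minimised; the design points below are invented) looks like this:

```python
def pareto_front(points):
    """Keep points not dominated by any other (minimising both objectives)."""
    front = []
    for p in points:
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points):
            front.append(p)
    return front

# Hypothetical (area, delay) results from a hardware-generation tool sweep.
designs = [(4.0, 10.0), (5.0, 8.0), (6.0, 8.5), (7.0, 6.0), (5.5, 12.0)]
print(pareto_front(designs))
```

The paper's contribution is a predictor that suggests which tool parameterisations are likely to land on this front, so far fewer candidate designs need to be generated and measured before filtering.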


International World Wide Web Conference | 2012

New objective functions for social collaborative filtering

Joseph Noel; Scott Sanner; Khoi-Nguyen Tran; Peter Christen; Lexing Xie; Edwin V. Bonilla; Ehsan Abbasnejad; Nicolás Della Penna

This paper examines the problem of social collaborative filtering (CF) to recommend items of interest to users in a social network setting. Unlike standard CF algorithms using relatively simple user and item features, recommendation in social networks poses the more complex problem of learning user preferences from a rich and complex set of user profile and interaction information. Many existing social CF methods have extended traditional CF matrix factorization, but have overlooked important aspects germane to the social setting. We propose a unified framework for social CF matrix factorization by introducing novel objective functions for training. Our new objective functions have three key features that address main drawbacks of existing approaches: (a) we fully exploit feature-based user similarity, (b) we permit direct learning of user-to-user information diffusion, and (c) we leverage co-preference (dis)agreement between two users to learn restricted areas of common interest. We evaluate these new social CF objectives, comparing them to each other and to a variety of (social) CF baselines, and analyze user behavior on live user trials in a custom-developed Facebook App involving data collected over five months from over 100 App users and their 37,000+ friends.
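The flavour of objective (a), exploiting feature-based user similarity, can be sketched as a matrix-factorisation loss with a social regulariser that pulls the latent vectors of similar users together. This is a generic social-MF form, not the paper's exact objective, and all data below is made up.

```python
# U: per-user latent factors, V: per-item latent factors.
# ratings maps (user, item) -> observed rating; sim maps (user, user) ->
# feature-based similarity weight.
def social_mf_loss(ratings, U, V, sim, lam=0.1, mu=0.1):
    # Squared reconstruction error on observed ratings.
    err = sum((r - sum(ui * vj for ui, vj in zip(U[u], V[i]))) ** 2
              for (u, i), r in ratings.items())
    # Standard L2 regularisation on all latent factors.
    reg = lam * sum(x * x for vec in U + V for x in vec)
    # Social term: similar users should have similar latent vectors.
    social = mu * sum(s * sum((a - b) ** 2 for a, b in zip(U[u], U[v]))
                      for (u, v), s in sim.items())
    return err + reg + social

ratings = {(0, 0): 5.0, (1, 0): 4.0}
U = [[1.0, 0.5], [0.9, 0.6]]      # two users' latent factors
V = [[2.0, 1.0]]                  # one item's latent factors
sim = {(0, 1): 0.8}               # feature-based similarity of users 0 and 1
print(social_mf_loss(ratings, U, V, sim))
```

Training would minimise this loss by gradient descent on U and V; objectives (b) and (c) in the paper add further terms for diffusion and co-preference.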


International Symposium on Microarchitecture | 2010

A Predictive Model for Dynamic Microarchitectural Adaptivity Control

Christophe Dubach; Timothy M. Jones; Edwin V. Bonilla; Michael F. P. O'Boyle

Adaptive microarchitectures are a promising solution for designing high-performance, power-efficient microprocessors. They offer the ability to tailor computational resources to the specific requirements of different programs or program phases. They have the potential to adapt the hardware cost-effectively at runtime to any application's needs. However, one of the key challenges is how to dynamically determine the best architecture configuration at any given time, for any new workload. This paper proposes a novel control mechanism based on a predictive model for microarchitectural adaptivity control. This model is able to efficiently control adaptivity by monitoring the behaviour of an application's different phases at runtime. We show that using this model on SPEC 2000, we double the energy/performance efficiency of the processor when compared to the best static configuration tuned for the whole benchmark suite. This represents 74% of the improvement available if we knew the best microarchitecture for each program phase ahead of time. In addition, we show that the overheads associated with the implementation of our scheme have a negligible impact on performance and power.


ACM Transactions on Architecture and Code Optimization | 2014

Automatic feature generation for machine learning-based optimising compilation

Hugh Leather; Edwin V. Bonilla; Michael F. P. O'Boyle

Recent work has shown that machine learning can automate and in some cases outperform handcrafted compiler optimisations. Central to such an approach is that machine learning techniques typically rely upon summaries or features of the program. The quality of these features is critical to the accuracy of the resulting machine-learned algorithm; no machine learning method will work well with poorly chosen features. However, due to the size and complexity of programs, theoretically there are an infinite number of potential features to choose from. The compiler writer now has to expend effort in choosing the best features from this space. This article develops a novel mechanism to automatically find those features that most improve the quality of the machine-learned heuristic. The feature space is described by a grammar and is then searched with genetic programming and predictive modelling. We apply this technique to loop unrolling in GCC 4.3.1 and evaluate our approach on a Pentium 6. On a benchmark suite of 57 programs, GCC's hard-coded heuristic achieves only 3% of the maximum performance available, whereas a state-of-the-art machine learning approach with hand-coded features obtains 59%. Our feature generation technique is able to achieve 76% of the maximum available speedup, outperforming existing approaches.
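The grammar-plus-search idea can be sketched in miniature: features are expressions drawn from a tiny grammar over basic program counts, and candidates are scored by how well they track a target heuristic value. Real genetic programming would crossover and mutate expressions over generations; this toy version only samples randomly, and the grammar, counts, and target are all synthetic.

```python
import random

random.seed(1)

COUNTS = ["insns", "loops", "branches"]

# Grammar: Expr -> count | (op, Expr, Expr), with op in {+, *}.
def random_expr(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(COUNTS)
    op = random.choice(["+", "*"])
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, prog):
    if isinstance(expr, str):
        return prog[expr]
    op, a, b = expr
    x, y = evaluate(a, prog), evaluate(b, prog)
    return x + y if op == "+" else x * y

# Synthetic "programs" and a hidden target feature the search should recover.
programs = [{"insns": i, "loops": i % 3, "branches": i % 5} for i in range(1, 9)]
target = [p["insns"] * p["loops"] for p in programs]

def fitness(expr):
    vals = [evaluate(expr, p) for p in programs]
    return -sum((v - t) ** 2 for v, t in zip(vals, target))

# Sample candidate feature expressions and keep the best-fitting one.
best = max((random_expr() for _ in range(200)), key=fitness)
```

In the article, fitness is instead how much a feature improves the learned loop-unrolling heuristic, which is what ties the generated features to actual speedup.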


ACM Transactions on Architecture and Code Optimization | 2013

Dynamic microarchitectural adaptation using machine learning

Christophe Dubach; Timothy M. Jones; Edwin V. Bonilla

Adaptive microarchitectures are a promising solution for designing high-performance, power-efficient microprocessors. They offer the ability to tailor computational resources to the specific requirements of different programs or program phases. They have the potential to adapt the hardware cost-effectively at runtime to any application’s needs. However, one of the key challenges is how to dynamically determine the best architecture configuration at any given time, for any new workload. This article proposes a novel control mechanism based on a predictive model for microarchitectural adaptivity control. This model is able to efficiently control adaptivity by monitoring the behaviour of an application’s different phases at runtime. We show that by using this model on SPEC 2000, we double the energy/performance efficiency of the processor when compared to the best static configuration tuned for the whole benchmark suite. This represents 74% of the improvement available if we know the best microarchitecture for each program phase ahead of time. In addition, we present an extended analysis of the best configurations found and show that the overheads associated with the implementation of our scheme have a negligible impact on performance and power.

Collaboration


Dive into Edwin V. Bonilla's collaborations.

Top Co-Authors

Felix Agakov (University of Edinburgh)
John Thomson (University of Edinburgh)
Hugh Leather (University of Edinburgh)