Lauro Beltrão Costa
University of British Columbia
Publications
Featured research published by Lauro Beltrão Costa.
Journal of Grid Computing | 2006
Walfredo Cirne; Francisco Vilar Brasileiro; Nazareno Andrade; Lauro Beltrão Costa; Alisson Andrade; Reynaldo Novaes; Miranda Mowbray
eScience is rapidly changing the way we do research. As a result, many research labs now need non-trivial computational power. Grid and voluntary computing are well-established solutions for this need. However, not all labs can effectively benefit from these technologies. In particular, small and medium research labs (the majority of labs in the world) have a hard time using these technologies, as they demand high-visibility projects and/or highly qualified computing personnel. This paper describes OurGrid, a system designed to fill this gap. OurGrid is an open, free-to-join, cooperative Grid in which labs donate their idle computational resources in exchange for accessing other labs' idle resources when needed. It relies on an incentive mechanism that makes it in the best interest of participants to collaborate with the system, employs a novel application scheduling technique that demands very little information, and uses virtual machines to isolate applications and thus provide security. The vision is that OurGrid enables labs to combine their resources in a massive worldwide computing platform. OurGrid has been in production since December 2004. Any lab can join it by downloading its software from http://www.ourgrid.org.
international conference on parallel processing | 2003
Walfredo Cirne; Daniel Paranhos; Lauro Beltrão Costa; Elizeu Santos-Neto; Francisco Vilar Brasileiro; Jacques Philippe Sauvé; F.A.B. Silva; C.O. Barros; C. Silveira
We discuss here how to run Bag-of-Tasks applications on computational grids. Bag-of-Tasks applications (parallel applications whose tasks are independent) are both relevant and amenable to execution on grids. However, few users currently execute their Bag-of-Tasks applications on grids. We investigate the reasons for this state of affairs and introduce MyGrid, a system designed to overcome the identified difficulties. MyGrid provides a simple, complete, and secure way for a user to run Bag-of-Tasks applications on all resources she has access to. Besides putting together a complete solution useful for real users, MyGrid embeds two important research contributions to grid computing. First, we introduce simple working-environment abstractions that hide machine-configuration heterogeneity from the user. Second, we introduce work queue with replication (WQR), a scheduling heuristic that attains good performance without relying on information about the grid or the application, at the cost of consuming a few extra cycles. Note that not depending on information makes WQR much easier to deploy in practice.
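The WQR idea in the abstract can be illustrated with a small simulation. The sketch below is not the MyGrid implementation; the function names and event-driven simplifications are mine. It captures the core rule: hosts pull tasks from a shared queue, and once the queue is empty, idle hosts replicate still-running tasks (up to a replica cap); the first copy to finish wins and the rest are cancelled.

```python
import heapq

def wqr_makespan(task_sizes, host_speeds, max_replicas=2):
    """Simulate work queue with replication (WQR) and return the makespan.

    task_sizes: work units per task; host_speeds: work units per time unit.
    No information about task sizes is used for scheduling decisions.
    """
    n = len(task_sizes)
    pending = list(range(n))                 # task queue (FIFO)
    done = [False] * n
    runners = {t: set() for t in range(n)}   # hosts currently executing each task
    events = []                              # min-heap of (finish_time, task, host)
    free = list(range(len(host_speeds)))
    now = makespan = 0.0

    def dispatch():
        while free:
            if pending:
                t = pending.pop(0)
            else:
                # queue empty: replicate the running task with the fewest copies
                live = [t for t in range(n)
                        if not done[t] and 0 < len(runners[t]) < max_replicas]
                if not live:
                    return
                t = min(live, key=lambda x: len(runners[x]))
            h = free.pop()
            runners[t].add(h)
            heapq.heappush(events, (now + task_sizes[t] / host_speeds[h], t, h))

    dispatch()
    while events:
        now, t, h = heapq.heappop(events)
        if done[t]:
            continue                         # stale event for a cancelled replica
        done[t] = True
        makespan = now
        free.extend(runners[t])              # winner and cancelled replicas free up
        runners[t].clear()
        dispatch()
    return makespan
```

For example, a single 8-unit task first grabbed by a slow host is rescued by a replica on a fast host: `wqr_makespan([8], [2, 1])` yields a makespan of 4 rather than 8, without the scheduler knowing either host speed in advance.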
cluster computing and the grid | 2012
Emalayan Vairavanathan; Samer Al-Kiswany; Lauro Beltrão Costa; Zhao Zhang; Daniel S. Katz; Michael Wilde; Matei Ripeanu
This paper evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such a storage system is crucial to efficiently supporting workflow-based applications: First, workflows generate irregular and application-dependent data access patterns. These patterns render existing storage systems unable to harness all optimization opportunities, as this often requires conflicting optimization options or even conflicting design decisions at the level of the storage system. Second, when scheduling, workflow runtime engines make suboptimal decisions as they lack detailed data location information. This paper discusses the feasibility of building a workflow-aware storage system that supports per-file access optimizations and exposes data location, and evaluates experimentally the performance gains of such an approach. Our evaluation using synthetic benchmarks shows that a workflow-aware storage system can bring significant performance gains: up to 7× performance gain compared to the distributed storage system MosaStore, and up to 16× compared to a central, well-provisioned NFS server.
Future Generation Computer Systems | 2008
César A. F. De Rose; Tiago C. Ferreto; Rodrigo N. Calheiros; Walfredo Cirne; Lauro Beltrão Costa; Daniel Fireman
As the adoption of grid computing in organizations expands, the need for wise utilization of different types of resource also increases. A volatile resource, such as a desktop computer, is a common type of resource found in grids. However, efficiently using other types of resource, such as space-shared resources, represented by parallel supercomputers and clusters of workstations, is extremely important, since they can provide a great amount of computational power. Using space-shared resources in grids is not straightforward, since they require jobs to specify certain parameters a priori, such as allocation time and number of processors. Current solutions (e.g., Grid Resource Allocation and Management (GRAM)) are based on the explicit definition of these parameters by the user. On the other hand, good progress has been made in supporting Bag-of-Tasks (BoT) applications on grids. This is a restricted model of parallelism in which tasks do not communicate among themselves, making recovery from failures a simple matter of re-executing tasks. As such, there is no need to specify a maximum number of resources, or a period of time during which resources must be executing the application, as required by space-shared resources. This mismatch, however, makes it hard for BoT applications running on grids to leverage space-shared resources. This paper presents an Explicit Allocation Strategy, in which an adaptor automatically fits grid requests to the resource in order to decrease the turnaround time of the application. We compare it with another strategy described in our previous work, called the Transparent Allocation Strategy, in which idle nodes of the space-shared resource are donated to the grid. As we shall see, both strategies provide good results. Moreover, they are complementary in the sense that they fulfill different usage roles.
The Transparent Allocation Strategy enables a resource owner to raise its utilization by offering cycles that would otherwise go wasted, while protecting the local workload from increased contention. The Explicit Allocation Strategy, conversely, allows a user to benefit from the access she has to space-shared resources in the grid, enabling her to submit tasks natively, without having to craft (time, processors) requests.
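The core job of such an adaptor, translating a bag of independent tasks into the (processors, wallclock) request a space-shared resource expects, can be sketched in a few lines. This is an illustrative simplification under assumptions of my own (uniform task runtimes, tasks executed in waves), not the paper's actual algorithm.

```python
import math

def craft_request(n_tasks, task_minutes, max_procs):
    """Hypothetical adaptor sketch: turn n_tasks independent BoT tasks into
    a (processors, wallclock_minutes) pair for a space-shared scheduler.

    Assumes every task takes task_minutes and tasks run in back-to-back
    waves across the requested processors.
    """
    procs = min(n_tasks, max_procs)          # never ask for idle processors
    waves = math.ceil(n_tasks / procs)       # tasks execute in waves
    wallclock = waves * task_minutes
    return procs, wallclock
```

For instance, 10 half-hour tasks on a machine that grants at most 4 processors become a request for 4 processors and 90 minutes (three waves of tasks).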
grid computing | 2015
Lauro Beltrão Costa; Hao Yang; Emalayan Vairavanathan; Abmar Barros; Ketan Maheshwari; Gilles Fedak; Daniel S. Katz; Michael Wilde; Matei Ripeanu; Samer Al-Kiswany
This article evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such a storage system is crucial to efficiently supporting workflow-based applications: First, workflows generate irregular and application-dependent data access patterns. These patterns render existing generic storage systems unable to harness all optimization opportunities, as this often requires enabling conflicting optimizations or even conflicting design decisions at the storage system level. Second, most workflow runtime engines make suboptimal scheduling decisions as they lack the detailed data location information that is generally hidden by the storage system. This paper presents a limit study that evaluates the potential gains from building a workflow-aware storage system that supports per-file access optimizations and exposes data location. Our evaluation using synthetic benchmarks and real applications shows that a workflow-aware storage system can bring significant performance gains: up to 3× performance gains compared to a vanilla distributed storage system deployed on the same resources yet unaware of the possible file-level optimizations.
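The "per-file access optimizations" mentioned above boil down to letting the workflow engine tag each file with its access pattern so the storage layer can choose a matching placement. The sketch below is a hypothetical interface of my own devising, only meant to make the idea concrete: a pipeline-pattern file stays local to its producer, while a broadcast-pattern file is eagerly replicated.

```python
from dataclasses import dataclass

@dataclass
class FileHint:
    """Hypothetical per-file hint supplied by the workflow runtime engine."""
    pattern: str  # "pipeline" (read by next stage), "broadcast" (read by all)

def placement_for(hint, writer_node, all_nodes):
    """Pick the storage nodes for a file based on its access-pattern hint."""
    if hint.pattern == "pipeline":
        return [writer_node]        # keep node-local: consumer is co-scheduled
    if hint.pattern == "broadcast":
        return list(all_nodes)      # replicate everywhere: every node reads it
    return [writer_node]            # unknown pattern: no special optimization
```

Exposing the returned node list back to the scheduler is the second half of the idea: the runtime engine can then place a consuming task where its input already sits.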
grid computing | 2010
Lauro Beltrão Costa; Matei Ripeanu
Versatile storage systems aim to maximize storage resource utilization by supporting the ability to ‘morph’ the storage system to best match the applications’ demands. To this end, versatile storage systems significantly extend the deployment- or run-time configurability of the storage system. This flexibility, however, introduces a new problem: a much larger, and potentially dynamic, configuration space makes manually configuring the storage system an undesirable, if not infeasible, task. This paper presents our initial progress towards answering the question: “How can we configure a distributed storage system (i.e., enable/disable its various optimizations and configure their parameters) with minimal human intervention?” We discuss why manually configuring the storage system is undesirable; present the success criteria for an automated configuration solution; propose a generic architecture that supports automated configuration; and, finally, instantiate this architecture into a first prototype, which controls the configuration of similarity detection optimizations in the MosaStore distributed storage system. Our evaluation results demonstrate that the prototype can provide performance close to that of the optimal configuration at the cost of minimal overhead.
irregular applications: architectures and algorithms | 2013
Abdullah Gharaibeh; Elizeu Santos-Neto; Lauro Beltrão Costa; Matei Ripeanu
This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph-layout to processing units, such that the algorithmic tasks exercise each of the units where they perform best. However, the GPUs have much higher Thermal Design Power (TDP), thus their impact on the overall energy consumption is unclear. Our evaluation using large real-world graphs and synthetic graphs as large as 1 billion vertices and 16 billion edges shows that a hybrid system is efficient in terms of both time-to-solution and energy.
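The "mapping the graph layout to processing units" step can be made concrete with a toy partitioner. The sketch below is my own illustration, not the paper's partitioning algorithm: it follows the common hybrid-processing intuition that the many low-degree vertices form regular, GPU-friendly work (one vertex per thread), while the few high-degree vertices are better served by the CPU, subject to a GPU memory budget.

```python
def split_for_hybrid(degrees, gpu_budget):
    """Illustrative graph split for a hybrid CPU+GPU system.

    degrees: degree of each vertex, indexed by vertex id.
    gpu_budget: max number of vertices that fit in GPU memory.
    Returns (gpu_vertices, cpu_vertices) as sets of vertex ids.
    """
    order = sorted(range(len(degrees)), key=lambda v: degrees[v])
    gpu = set(order[:gpu_budget])   # low-degree vertices: regular parallelism
    cpu = set(order[gpu_budget:])   # high-degree vertices: irregular, cache-friendly on CPU
    return gpu, cpu
```

In a real system the budget would be derived from GPU memory size and the per-vertex state of the algorithm; here it is just a parameter.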
2011 International Green Computing Conference and Workshops | 2011
Lauro Beltrão Costa; Samer Al-Kiswany; Raquel Vigolvino Lopes; Matei Ripeanu
The energy costs of running computer systems are a growing concern: for large data centers, recent estimates put these costs higher than the cost of the hardware itself. As a consequence, energy efficiency has become a pervasive theme in designing, deploying, and operating computer systems. This paper evaluates the energy trade-offs brought by data deduplication in distributed storage systems. Depending on the workload, deduplication can enable a lower storage footprint, reduce the I/O pressure on the storage system, and reduce network traffic, at the cost of increased computational overhead. From an energy perspective, data deduplication enables a trade-off between the energy consumed for additional computation and the energy saved by lower storage and network load. The main point our experiments and model bring home is the following: while for non-energy-proportional machines performance- and energy-centric optimizations have break-even points that are relatively close, for the newer generation of energy-proportional machines the break-even points are significantly different. An important consequence of this difference is that, with newer systems, there are higher energy inefficiencies when the system is optimized for performance.
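The break-even trade-off described above reduces to simple arithmetic: deduplication saves energy when the hashing cost paid on every chunk is below the I/O and network energy avoided for the duplicate fraction. The toy model below is mine, with illustrative parameter names and no claim to the paper's actual calibrated model; all per-MB energy figures are assumed inputs.

```python
def dedup_saves_energy(data_mb, dup_ratio, cpu_j_per_mb, io_j_per_mb, net_j_per_mb):
    """Toy break-even model for deduplication energy (illustrative only).

    data_mb:      volume written
    dup_ratio:    fraction of that volume that is duplicate (0..1)
    *_j_per_mb:   assumed energy costs in joules per MB
    """
    cost = data_mb * cpu_j_per_mb                                # hash every chunk
    saved = data_mb * dup_ratio * (io_j_per_mb + net_j_per_mb)   # skip duplicate I/O + transfer
    return saved > cost
```

With, say, 1 J/MB of hashing against 3 J/MB of combined I/O and network energy, the break-even duplicate ratio is 1/3: workloads above it save energy, workloads below it waste it. Energy proportionality shifts these per-MB figures, which is exactly why the paper finds the break-even points move on newer hardware.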
international performance computing and communications conference | 2009
Lauro Beltrão Costa; Samer Al-Kiswany; Matei Ripeanu
This paper explores the ability to use Graphics Processing Units (GPUs) as co-processors to harness the inherent parallelism of batch operations in systems that require high performance. To this end, we have chosen Bloom filters (space-efficient data structures that support the probabilistic representation of set membership), as the queries these data structures support are often performed in batches. Bloom filters exhibit low computational cost per amount of data, providing a baseline for more complex batch operations. We implemented BloomGPU, a library that supports offloading Bloom filter operations to the GPU, and evaluated this library under realistic usage scenarios. By completely offloading Bloom filter operations to the GPU, BloomGPU outperforms an optimized CPU implementation of the Bloom filter as the workload becomes larger.
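To make the batched-query pattern concrete, here is a minimal CPU sketch of a Bloom filter with a batch interface; it is not BloomGPU's code, and the class and parameter names are mine. The point of the batch method is the shape of the work: each item's membership check is independent, which is what maps naturally onto one GPU thread per item.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item, no deletions.

    May return false positives, never false negatives.
    """
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)      # one byte per bit, for simplicity

    def _positions(self, item):
        # derive n_hashes positions by salting a single hash function
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def query_batch(self, items):
        # independent per-item checks: on a GPU, one thread per item
        return [all(self.bits[p] for p in self._positions(it)) for it in items]
```

Offloading pays only when the batch is large enough to amortize the host-to-device transfer, which matches the paper's observation that the GPU wins as the workload grows.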
international conference on supercomputing | 2014
Lauro Beltrão Costa; Samer Al-Kiswany; Hao Yang; Matei Ripeanu
System provisioning, resource allocation, and system configuration decisions for I/O-intensive workflow applications are complex even for expert users. Users face choices at multiple levels: allocating resources to individual sub-systems (e.g., the application layer, the storage layer) and configuring each of these optimally (e.g., replication level, chunk size, and caching policies in the case of storage), each having a large impact on overall application performance. This paper presents our progress on addressing the problem of supporting these provisioning, allocation, and configuration decisions for workflow applications. To enable selecting a good choice in a reasonable time, we propose an approach that accelerates the exploration of the configuration space based on a low-cost performance predictor that estimates the total execution time of a workflow application in a given setup. Our evaluation shows that: (i) the predictor is effective in identifying the desired system configuration, (ii) it can scale to model a workflow application run on an entire cluster, while (iii) using over 2000× fewer resources (machines × time) than running the actual application.
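The exploration loop the abstract describes, scoring every candidate configuration with a cheap predictor instead of running the real application, is simple to sketch. The helper below is my own illustration; the predictor is treated as an opaque callable, standing in for the paper's performance model.

```python
from itertools import product

def best_config(predictor, space):
    """Exhaustively search a configuration space with a cheap predictor.

    predictor: callable mapping a config dict to predicted runtime.
    space:     dict of parameter name -> list of candidate values.
    Returns the config with the lowest predicted runtime.
    """
    keys = sorted(space)
    candidates = [dict(zip(keys, vals))
                  for vals in product(*(space[k] for k in keys))]
    return min(candidates, key=predictor)   # one prediction per candidate
```

Because each prediction costs milliseconds rather than a full cluster run, even exhaustive enumeration of a few thousand configurations stays cheap, which is how a 2000× resource saving over running the real application becomes possible.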