Elizeu Santos-Neto
University of British Columbia
Publications
Featured research published by Elizeu Santos-Neto.
international conference on parallel processing | 2003
Walfredo Cirne; Daniel Paranhos; Lauro Beltrão Costa; Elizeu Santos-Neto; Francisco Vilar Brasileiro; Jacques Philippe Sauvé; F.A.B. Silva; C.O. Barros; C. Silveira
We discuss how to run Bag-of-Tasks applications on computational grids. Bag-of-Tasks applications (parallel applications whose tasks are independent) are both relevant and amenable to execution on grids. However, few users currently execute their Bag-of-Tasks applications on grids. We investigate the reasons for this state of affairs and introduce MyGrid, a system designed to overcome the identified difficulties. MyGrid provides a simple, complete and secure way for a user to run Bag-of-Tasks applications on all resources she has access to. Besides putting together a complete solution useful for real users, MyGrid embeds two important research contributions to grid computing. First, we introduce simple working environment abstractions that hide machine configuration heterogeneity from the user. Second, we introduce work queue with replication (WQR), a scheduling heuristic that attains good performance without relying on information about the grid or the application, at the cost of consuming a few more cycles. Note that not depending on such information makes WQR much easier to deploy in practice.
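The WQR idea can be sketched as a toy event-driven simulation. This is not the MyGrid implementation; the FIFO hand-out, the replica cap, and the host model are illustrative assumptions:

```python
import heapq

def wqr_makespan(task_sizes, host_speeds, max_replicas=2):
    """Sketch of Work Queue with Replication (WQR): tasks are handed out
    with no information about the grid or the application; once the queue
    empties, idle hosts run replicas of still-running tasks, and the first
    copy to finish wins.  Returns the simulated makespan."""
    n = len(task_sizes)
    queue = list(range(n))            # unstarted tasks, FIFO
    copies = [0] * n                  # replicas handed out per task
    done = [False] * n
    events = []                       # min-heap of (finish_time, task, host)

    def assign(host, now):
        if queue:
            t = queue.pop(0)          # prefer an unstarted task
        else:
            running = [t for t in range(n)
                       if not done[t] and copies[t] < max_replicas]
            if not running:
                return                # nothing left to (re)run
            t = min(running, key=lambda x: copies[x])  # least-replicated first
        copies[t] += 1
        heapq.heappush(events, (now + task_sizes[t] / host_speeds[host],
                                t, host))

    for h in range(len(host_speeds)):
        assign(h, 0.0)
    makespan = 0.0
    while events:
        now, t, host = heapq.heappop(events)
        if not done[t]:               # first finishing copy wins
            done[t] = True
            makespan = now
        assign(host, now)             # host immediately picks new work
    return makespan
```

With two equal hosts and four equal tasks replication never triggers before the last task starts, but with one fast and one slow host the fast host's replica finishes first, cutting the makespan, which is exactly the extra-cycles-for-robustness trade the abstract describes.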
job scheduling strategies for parallel processing | 2004
Elizeu Santos-Neto; Walfredo Cirne; Francisco Vilar Brasileiro; Aliandro Lima
Data-intensive applications executing on a computational grid demand large data transfers, which are costly operations. Taking them into account is therefore mandatory to achieve efficient scheduling of data-intensive applications on grids. Further, within a heterogeneous and ever-changing environment such as a grid, better schedules are typically attained by heuristics that use dynamic information about the grid and the applications. However, this information is often difficult to obtain accurately. On the other hand, although there are schedulers that attain good performance without requiring dynamic information, they were not designed to take data transfers into account. This paper presents Storage Affinity, a novel scheduling heuristic for bag-of-tasks data-intensive applications running in grid environments. Storage Affinity exploits a data reuse pattern, common to many data-intensive applications, that allows it to take data transfer delays into account and reduce the makespan of the application. Further, it uses a replication strategy that yields efficient schedules without relying upon dynamic information that is difficult to obtain. Our results show that Storage Affinity may attain better performance than state-of-the-art knowledge-dependent schedulers. This is achieved at the expense of consuming more CPU cycles and network bandwidth.
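A minimal sketch of the data-reuse idea behind Storage Affinity: send each task to the site that already holds the most bytes of its input, so that transfers are avoided. The data model (`task_inputs`, `site_storage`) and the greedy tie-breaking are illustrative assumptions, not the paper's exact algorithm:

```python
def storage_affinity_schedule(task_inputs, site_storage):
    """Greedy placement by storage affinity.

    task_inputs:  task -> {file_name: size_in_bytes}
    site_storage: site -> set of file names already stored there
    Returns a task -> site mapping."""
    schedule = {}
    for task, inputs in task_inputs.items():
        def affinity(site):
            # Bytes of this task's input already present at the site.
            return sum(size for f, size in inputs.items()
                       if f in site_storage[site])
        best = max(site_storage, key=affinity)
        schedule[task] = best
        # The task's inputs are now cached at the chosen site, so later
        # tasks reusing the same files gain affinity there.
        site_storage[best].update(inputs)
    return schedule
```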
high performance distributed computing | 2008
Samer Al-Kiswany; Abdullah Gharaibeh; Elizeu Santos-Neto; George L. Yuan; Matei Ripeanu
Today Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly-parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support. We take a first step towards validating this hypothesis, focusing on distributed storage systems. As a proof of concept, we design StoreGPU, a library that accelerates a number of hashing-based primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to eight-fold performance gains on synthetic benchmarks as well as on a high-level application: the online similarity detection between large data files.
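The kind of hashing-based primitive StoreGPU accelerates can be illustrated on the CPU in a few lines; this is only a sketch of content addressing and chunk-level similarity detection, not the StoreGPU API:

```python
import hashlib

def chunk_hashes(data, chunk_size=4096):
    """Content addressing: split a blob into fixed-size chunks and hash
    each one.  These per-chunk digests are the bulk, data-parallel work
    that maps naturally onto a GPU."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def similarity(a, b, chunk_size=4096):
    """Fraction of chunk hashes two blobs share (Jaccard index) -- a
    cheap proxy for the online similarity detection evaluated above."""
    ha = set(chunk_hashes(a, chunk_size))
    hb = set(chunk_hashes(b, chunk_size))
    return len(ha & hb) / max(len(ha | hb), 1)
```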
Computer Networks | 2009
Nazareno Andrade; Elizeu Santos-Neto; Francisco Vilar Brasileiro; Matei Ripeanu
BitTorrent is a widely popular peer-to-peer content distribution protocol. Unveiling patterns of resource demand and supply in its usage is paramount to inform operators and designers of BitTorrent and of future content distribution systems. This study examines three BitTorrent content-sharing communities with respect to resource demand and supply. The resulting characterization is significantly broader and deeper than previous BitTorrent investigations: it compares multiple BitTorrent communities and investigates aspects that have not been characterized before, such as aggregate user behavior and resource contention. The main findings are three-fold: (i) resource demand: a more accurate model for the peer arrival rate over time is introduced, contributing to workload synthesis and analysis; additionally, torrent popularity distributions are found to be non-heavy-tailed, which has implications for the design of BitTorrent caching mechanisms; (ii) resource supply: a small set of users contributes most of the resources in the communities, but the set of heavy contributors changes over time and is typically not responsible for most of the resources used in the distribution of an individual file; these results imply that some level of robustness can be expected in BitTorrent communities, and they direct resource-allocation efforts; (iii) relation between resource demand and supply: users that provide more resources are also those that demand more; also, the distribution of a file usually experiences resource contention, although the communities achieve high rates of served requests.
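The supply-concentration finding in (ii) can be made concrete with a simple measure: the share of total upload volume contributed by the top fraction of users. The function and the 10% default below are illustrative, not the paper's metric:

```python
def top_contributor_share(upload_volumes, fraction=0.1):
    """Share of total upload volume supplied by the top `fraction` of
    users.  A value near 1.0 means supply is concentrated in a small
    set of heavy contributors."""
    ranked = sorted(upload_volumes, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # at least one user
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```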
international parallel and distributed processing symposium | 2013
Abdullah Gharaibeh; Lauro Beltrão Costa; Elizeu Santos-Neto; Matei Ripeanu
Graph processing has gained renewed attention. The increasingly large scale and wealth of connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable information from large-scale graphs. Hybrid systems that host processing units optimized for both fast sequential processing and bulk processing (e.g., GPU-accelerated systems) have the potential to cope with the heterogeneous structure of real graphs and enable high-performance graph processing. Reaching this point, however, poses multiple challenges. The heterogeneity of the processing elements (e.g., GPUs implement a different parallel processing model than CPUs and have much less memory) and the inherent irregularity of graph workloads require careful graph partitioning and load assignment. In particular, the workload generated by a partitioning scheme should match the strength of the processing element the partition is allocated to. This work explores the feasibility and quantifies the performance gains of such low-cost partitioning schemes. We propose to partition the workload between the two types of processing elements based on vertex connectivity. We show that such partitioning schemes offer a simple, yet efficient way to boost the overall performance of the hybrid system. Our evaluation illustrates that processing a 4-billion-edge graph on a system with one CPU socket and one GPU, while offloading as little as 25% of the edges to the GPU, achieves a 2x performance improvement over state-of-the-art implementations running on a dual-socket symmetric system. Moreover, for the same graph, a hybrid system with two sockets and two GPUs sustains 1.13 billion breadth-first-search traversed edges per second, a performance rate that is competitive with the latest entries in the Graph500 list, yet at a much lower price point.
acm/ieee international conference on mobile computing and networking | 2011
Dinan Gunawardena; Thomas Karagiannis; Alexandre Proutiere; Elizeu Santos-Neto; Milan Vojnovic
We consider the problem of delivering information streams to interested mobile users, leveraging both access to the infrastructure and device-to-device data transfers. The goal is to design practical relaying algorithms that aim at optimizing a global system objective that accounts for two important aspects: first, the user interest in content with respect to its type and delivery time; and, second, resource constraints such as storage and transmission costs. We first examine a set of real-world datasets reporting contacts between users moving in relatively restricted geographic areas (e.g. a city). These datasets provide evidence that significant performance gains can be achieved by extending the information dissemination from one to two hops, and that using longer paths brings only marginal benefits. We also show that the correlation of delays through different paths is typically significant, thus calling for a system design that allows for general user mobility. We then propose a class of relaying strategies (referred to as SCOOP) that aim at optimizing a global system objective, are fully decentralized, require only locally observable state at individual devices, and allow for general user mobility. These properties characterize a practical scheme whose efficiency is evaluated using real-world mobility traces.
symposium on computer architecture and high performance computing | 2004
Walfredo Cirne; Francisco Vilar Brasileiro; Lydie Da Costa; Daniel Paranhos; Elizeu Santos-Neto; Nazareno Andrade; C.A.F. De Rose; Tiago C. Ferreto; Miranda Mowbray; R. Scheer; J. Jornada
In this paper we discuss the difficulties involved in the scheduling of applications on computational grids. We highlight two main sources of difficulties: 1) the size of the grid rules out the possibility of using a centralized scheduler; 2) since resources are managed by different parties, the scheduler must consider several different policies. Thus, we argue that scheduling applications on a grid requires the orchestration of several schedulers, with possibly conflicting goals. We discuss how we have addressed this issue in the context of PAUA, a grid for Bag-of-Tasks applications (i.e. parallel applications whose tasks are independent) that we are currently deploying throughout Brazil.
irregular applications: architectures and algorithms | 2013
Abdullah Gharaibeh; Elizeu Santos-Neto; Lauro Beltrão Costa; Matei Ripeanu
This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph-layout to processing units, such that the algorithmic tasks exercise each of the units where they perform best. However, the GPUs have much higher Thermal Design Power (TDP), thus their impact on the overall energy consumption is unclear. Our evaluation using large real-world graphs and synthetic graphs as large as 1 billion vertices and 16 billion edges shows that a hybrid system is efficient in terms of both time-to-solution and energy.
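The paper's core tension, whether the GPU's higher Thermal Design Power is offset by a shorter time-to-solution, reduces to comparing power integrated over time. A toy calculation with hypothetical numbers (not the paper's measurements):

```python
def energy_joules(avg_power_w, time_s):
    # Energy = average power x time-to-solution.
    return avg_power_w * time_s

# Hypothetical figures: a CPU-only node drawing 200 W that needs 25 s,
# versus a hybrid CPU+GPU node drawing 400 W that finishes in 10 s.
cpu_only = energy_joules(200, 25)   # 5000 J
hybrid = energy_joules(400, 10)     # 4000 J
```

Even at twice the power draw, the hybrid run consumes less total energy because it finishes 2.5x faster, which is the effect behind the abstract's time-to-solution and energy result.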
mobile wireless middleware operating systems and applications | 2012
Xiao Chen; Elizeu Santos-Neto; Matei Ripeanu
An increasing number of mobile applications aim to enable “smart cities” by harnessing contributions from citizens armed with sensing-capable mobile devices. However, there are few generally recognized guidelines for developing and deploying crowdsourcing-based solutions in mobile environments. This paper considers the design of a crowdsourcing-based smart parking system as a specific case study in an attempt to explore the basic design principles applicable to an array of similar applications. Through simulations, we show that the strategies behind crowdsourcing can heavily influence the utility of such applications. Equally importantly, we show that tolerating a certain level of free-riding increases the social benefits while maintaining the quality of service offered. Our findings provide designers with a better understanding of mobile crowdsourcing features and help guide successful designs.
Cluster Computing | 2009
Samer Al-Kiswany; Abdullah Gharaibeh; Elizeu Santos-Neto; Matei Ripeanu
Today Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly-parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support. We take a first step towards validating this hypothesis and design StoreGPU, a library that accelerates a number of hashing-based middleware primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to twenty-five-fold performance gains on synthetic benchmarks as well as on a high-level application: the online similarity detection between large data files.