
Publications


Featured research published by Rodrigo Bruno.


European Conference on Computer Systems | 2014

freeCycles: efficient data distribution for volunteer computing

Rodrigo Bruno; Paulo Ferreira

Volunteer Computing (VC) has proven to be a way to access large amounts of computational power, network bandwidth, and storage. With the recent development of new programming paradigms and their adaptation to run at Internet scale, we believe that data distribution techniques need to be rethought to cope with the high volumes of information handled by, for example, MapReduce. Thus, we present a VC solution called freeCycles that supports MapReduce jobs. freeCycles makes two new contributions: i) it improves data distribution (among mappers and reducers) by using the BitTorrent protocol to distribute input, intermediate, and output data, and ii) it improves intermediate data availability by replicating it across volunteers, avoiding the loss of intermediate data and the consequent long delays in MapReduce execution time.
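To see why replicating intermediate data across volunteers matters, consider a toy availability model (an illustration, not taken from the paper): if each volunteer is still online when its data is needed with some probability, keeping several replicas sharply reduces the chance that a piece of intermediate data is lost and the MapReduce job stalls.

```python
# Toy availability model (assumed numbers, not from freeCycles):
# each volunteer holding a replica is still reachable with
# probability p; a data item survives unless ALL k replicas are lost.
def availability(p: float, k: int) -> float:
    """Probability that at least one of k replicas survives."""
    return 1.0 - (1.0 - p) ** k

# With unreliable volunteers (p = 0.6), a single copy is lost 40% of
# the time, while three replicas already exceed 93% availability.
single = availability(0.6, 1)  # 0.6
triple = availability(0.6, 3)  # 1 - 0.4**3 = 0.936
```

The sketch shows the design intuition: modest replication factors buy large availability gains when individual volunteers churn frequently.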


Proceedings of the 2nd International Workshop on CrossCloud Systems | 2014

SCADAMAR: scalable and data-efficient internet MapReduce

Rodrigo Bruno; Paulo Ferreira

Recent developments in popular programming models, namely MapReduce, have raised interest in running MapReduce applications over the large-scale Internet. However, the data distribution techniques currently used in Internet-wide computing platforms to distribute the high volumes of information needed to run MapReduce jobs are naive and need to be rethought. Thus, we present a computing platform called SCADAMAR that runs MapReduce jobs over the Internet and makes two main contributions: i) it improves data distribution by using the BitTorrent protocol to distribute all data, and ii) it improves intermediate data availability by replicating tasks or data across nodes, avoiding the loss of intermediate data and the consequent long delays in overall MapReduce execution time. Along with the design of our solution, we present an extensive set of performance results that confirm the usefulness of these contributions, improved data distribution and availability, making our platform a feasible approach to running MapReduce jobs.


ACM Computing Surveys | 2018

A Study on Garbage Collection Algorithms for Big Data Environments

Rodrigo Bruno; Paulo Ferreira

The need to process and store massive amounts of data—Big Data—is a reality. In areas such as scientific experiments, social network management, credit card fraud detection, targeted advertisement, and financial analysis, massive amounts of information are generated and processed daily to extract valuable, summarized information. Due to their fast development cycle (i.e., lower development cost), mainly a result of automatic memory management, and their rich community resources, managed object-oriented programming languages (e.g., Java) are the first choice for developing the Big Data platforms (e.g., Cassandra, Spark) on which such Big Data applications are executed. However, automatic memory management comes at a cost. This cost is introduced by the garbage collector, which is responsible for collecting objects that are no longer being used. Although current (classic) garbage collection algorithms may be applicable to small-scale applications, they are not appropriate for large-scale Big Data environments, as they do not scale in terms of throughput and pause times. In this work, current Big Data platforms and their memory profiles are studied to understand why classic algorithms (which are still the most commonly used) are not appropriate, and recently proposed, relevant memory management algorithms targeted at Big Data environments are analyzed. The scalability of recent memory management algorithms is characterized in terms of throughput (improving the throughput of the application) and pause time (reducing the latency of the application) compared to classic algorithms. The study concludes with a taxonomy of the described works and some open problems in Big Data memory management that could be addressed in future work.
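The two scalability axes the survey uses, throughput and pause time, can be made concrete with a small sketch (assumed numbers, purely illustrative): throughput is the fraction of wall-clock time spent running the application rather than the collector, while latency-sensitive applications care about the longest stop-the-world pause.

```python
# Toy illustration of the two metrics used to compare collectors.
# app_ms: time spent running application code; pauses_ms: the
# stop-the-world GC pauses observed during the run.
def gc_metrics(app_ms: float, pauses_ms: list[float]) -> tuple[float, float]:
    total = app_ms + sum(pauses_ms)
    throughput = app_ms / total          # fraction of time in the app
    worst_pause = max(pauses_ms)         # what SLA-bound apps feel
    return throughput, worst_pause

# Two hypothetical collectors with identical total GC work: the first
# takes a few long pauses, the second many short ones. Throughput is
# the same (0.9), but the worst-case pause differs by 10x.
tp_long, p_long = gc_metrics(9000, [500, 500])
tp_short, p_short = gc_metrics(9000, [50] * 20)
```

This is why the survey treats throughput and pause time as independent dimensions: a collector can look fine on one and fail badly on the other.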


Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference | 2017

POLM2: automatic profiling for object lifetime-aware memory management for HotSpot big data applications

Rodrigo Bruno; Paulo Ferreira

Big Data applications suffer from unpredictable and unacceptably high pause times due to bad memory management (Garbage Collection, GC) decisions. This is a problem for all applications, but it is even more important for applications with low pause time requirements, such as credit-card fraud detection or targeted website advertisement systems, which can easily fail to comply with Service Level Agreements due to long GC cycles (during which the application is stopped). This problem has been previously identified and is related to Big Data applications keeping massive amounts of data objects in memory for a long period of time (from the GC's perspective). Memory management approaches have been proposed to reduce GC pause times by allocating objects with similar lifetimes close to each other. However, they either do not provide a general solution for all types of Big Data applications (thus solving the problem only for a specific set of applications), and/or require programmer effort and knowledge to change or annotate the application code. This paper proposes POLM2, a profiler that automatically: i) estimates application allocation profiles based on execution records, and ii) instruments application bytecode to help the GC take advantage of the profiling information. Thus, no programmer effort is required to change the source code so that objects are allocated according to their lifetimes. POLM2 is implemented for the OpenJDK HotSpot Java Virtual Machine 8 and uses NG2C, a recently proposed GC that supports multi-generational pretenuring. Results show that POLM2 is able to: i) achieve pauses as low as those of NG2C (which requires manual source code modification), and ii) significantly reduce application pauses, by up to 80%, compared to G1 (the default collector in OpenJDK). POLM2 negatively impacts neither application throughput nor memory utilization.
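The profiling idea behind this line of work can be sketched with a toy model (hypothetical names and thresholds, not POLM2's actual implementation): record, per allocation site, how many minor collections its objects survive, and flag sites whose objects consistently outlive the young generation as candidates for pretenuring, so the collector stops repeatedly copying them.

```python
# Toy object-lifetime profiler (illustrative only). A "site" is an
# allocation site identifier; gcs_survived is how many minor GCs an
# object allocated there lived through.
from collections import defaultdict

YOUNG_GEN_THRESHOLD = 2  # surviving more than this => "long-lived"

class LifetimeProfiler:
    def __init__(self):
        self.samples = defaultdict(list)  # site -> observed lifetimes

    def record(self, site: str, gcs_survived: int) -> None:
        self.samples[site].append(gcs_survived)

    def pretenured_sites(self) -> set[str]:
        """Sites whose median observed lifetime exceeds the nursery."""
        out = set()
        for site, lifetimes in self.samples.items():
            ordered = sorted(lifetimes)
            median = ordered[len(ordered) // 2]
            if median > YOUNG_GEN_THRESHOLD:
                out.add(site)
        return out

prof = LifetimeProfiler()
for _ in range(10):
    prof.record("Cache.put", 8)      # long-lived cache entries
    prof.record("Request.parse", 0)  # short-lived request scratch data
```

In this sketch only `Cache.put` would be pretenured; the real system derives such decisions from execution records and applies them via bytecode instrumentation, with no source changes.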


Grid Computing | 2017

freeCycles - Efficient Multi-Cloud Computing Platform

Rodrigo Bruno; Fernando Pestana da Costa; Paulo Ferreira

The growing adoption of the MapReduce programming model increases the appeal of using Internet-wide computing platforms to run MapReduce applications. However, the data distribution techniques currently used in such platforms to distribute the high volumes of information needed to run MapReduce jobs are naive, and therefore fail to offer an efficient approach for running MapReduce over the Internet. Thus, we propose a computing platform called freeCycles that runs MapReduce jobs over the Internet and makes two main contributions: i) it improves data distribution, and ii) it increases intermediate data availability by replicating tasks or data across nodes, avoiding the loss of intermediate data and the consequent significant delays in overall MapReduce execution time. We present the design and implementation of freeCycles, in which we use the BitTorrent protocol to distribute all data, along with an extensive set of performance results that confirm the usefulness of the above contributions. Our system's improved data distribution and availability make it an ideal platform for large-scale MapReduce jobs.


OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" | 2017

Locality-Aware GC Optimisations for Big Data Workloads

Duarte Patrício; Rodrigo Bruno; José Simão; Paulo Ferreira; Luís Veiga

Many Big Data analytics and IoT scenarios rely on fast, non-relational storage (NoSQL) to help process massive amounts of data. In addition, managed runtimes (e.g., the JVM) are now widely used to support the execution of these NoSQL storage solutions, particularly when dealing with Big Data key-value store-driven applications. The benefits of such runtimes can, however, be limited by automatic memory management, i.e., Garbage Collection (GC), which does not consider object locality, resulting in objects that point to each other being dispersed in memory. In the long run this may break the service level of applications due to extra page faults and degraded locality in system-level memory caches. We propose LAG1 (short for Locality-Aware G1), an extension of modern heap layouts that promotes locality between groups of related objects. This is done with no prior application profiling and in a way that is transparent to the programmer, requiring no changes to existing code. The heap layout and algorithmic extensions are implemented on top of the Garbage First (G1) garbage collector (the new default collector) of the HotSpot JVM. Using the YCSB benchmarking tool to benchmark HBase, a well-known and widely used Big Data application, we show negligible overhead in frequent operations such as the allocation of new objects, and significant improvements when accessing data, supported by higher hit rates in system-level memory structures.
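The locality problem the abstract describes can be sketched with a toy model (not LAG1's actual algorithm): a breadth-first, Cheney-style evacuation scatters each key away from the value it points to, whereas copying live objects in depth-first order places an object and its referents contiguously, so related key/value pairs land on the same cache lines and pages.

```python
# Toy sketch of locality-aware copying order (illustrative only).
# heap maps each object to the objects it references; the returned
# list is the order objects would be laid out in the destination
# region under depth-first evacuation.
def dfs_copy_order(heap: dict[str, list[str]], roots: list[str]) -> list[str]:
    """Destination layout of live objects under DFS evacuation."""
    order, seen = [], set()
    stack = list(reversed(roots))
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        # push references so the first referent is copied next,
        # keeping each object adjacent to what it points to
        stack.extend(reversed(heap.get(obj, [])))
    return order

# Two key->value pairs: DFS keeps each key next to its value.
heap = {"k1": ["v1"], "k2": ["v2"]}
print(dfs_copy_order(heap, ["k1", "k2"]))  # ['k1', 'v1', 'k2', 'v2']
```

A breadth-first scan of the same heap would instead produce `['k1', 'k2', 'v1', 'v2']`, separating each key from its value, which is the dispersion the paper targets.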


International Symposium on Memory Management | 2017

NG2C: pretenuring garbage collection with dynamic generations for HotSpot big data applications

Rodrigo Bruno; Luís Picciochi Oliveira; Paulo Ferreira


International Middleware Conference | 2016

ALMA: GC-assisted JVM Live Migration for Java Server Applications

Rodrigo Bruno; Paulo Ferreira


International Symposium on Memory Management | 2018

Dynamic vertical memory scalability for OpenJDK cloud applications

Rodrigo Bruno; Paulo Ferreira; Ruslan Synytsky; Tetiana Fydorenchyk; Jia Rao; Hang Huang; Song Wu


arXiv: Distributed, Parallel, and Cluster Computing | 2018

ROLP: Runtime Object Lifetime Profiling for Big Data Memory Management

Rodrigo Bruno; Duarte Patrício; José Simão; Luís Veiga; Paulo Ferreira

Collaboration


Dive into Rodrigo Bruno's collaborations.

Top Co-Authors

Paulo Ferreira

Instituto Superior Técnico


Nuno Santos

Instituto Superior Técnico


Jia Rao

University of Colorado Colorado Springs


Hang Huang

Huazhong University of Science and Technology


Song Wu

Huazhong University of Science and Technology
