Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Renato Ferreira is active.

Publication


Featured research published by Renato Ferreira.


parallel computing | 2002

Processing large-scale multi-dimensional data in parallel and distributed environments

Michael D. Beynon; Chialin Chang; Tahsin M. Kurç; Alan Sussman; Henrique Andrade; Renato Ferreira; Joel H. Saltz

Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly hindered by dataset sizes. The vast amount of data in scientific datasets makes it difficult to efficiently access the data of interest and to manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.
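
As a rough illustration of the two operations the abstract names (a minimal sketch of ours, not the authors' frameworks), subsetting selects only the region of interest from a multi-dimensional dataset, and aggregation maps the selected items onto a coarser output grid:

```java
// Hypothetical illustration, not the authors' API: subset a 2-D dataset to a
// query region, then aggregate the selected points into a coarser output grid.
public class SubsetAggregate {
    public static void main(String[] args) {
        int n = 100;                          // the raw dataset: an n x n grid
        double[][] data = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                data[i][j] = i + j;           // synthetic sample values

        // Subsetting: only the query region [20,60) x [20,60) is touched.
        int lo = 20, hi = 60, bin = 10;
        double[][] sum = new double[(hi - lo) / bin][(hi - lo) / bin];
        int[][] cnt = new int[(hi - lo) / bin][(hi - lo) / bin];

        // Aggregation: map each input point to an output cell and average.
        for (int i = lo; i < hi; i++)
            for (int j = lo; j < hi; j++) {
                int oi = (i - lo) / bin, oj = (j - lo) / bin;
                sum[oi][oj] += data[i][j];
                cnt[oi][oj]++;
            }
        System.out.println("output cell (0,0) mean = " + sum[0][0] / cnt[0][0]);
    }
}
```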


conference on high performance computing (supercomputing) | 2002

Executing Multiple Pipelined Data Analysis Operations in the Grid

Matthew Spencer; Renato Ferreira; Michael D. Beynon; Tahsin M. Kurç; Alan Sussman; Joel H. Saltz

Processing of data in many data analysis applications can be represented as an acyclic, coarse-grain data flow from data sources to the client. This paper is concerned with the scheduling of multiple data analysis operations, each of which is represented as a pipelined chain of processing on data. We define the scheduling problem of effectively placing components onto Grid resources and propose two scheduling algorithms. Experimental results are presented using a visualization application.
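
The placement question can be made concrete with a toy heuristic (ours, not one of the paper's two algorithms): given per-item costs for each stage of the chain and relative speeds for the candidate hosts, greedily assign each stage to the host with the smallest estimated load:

```java
// Hypothetical sketch of the placement problem, not the paper's algorithms:
// greedily map each pipeline stage onto the host that currently yields the
// smallest estimated per-item processing time, accounting for prior load.
public class PipelinePlacement {
    public static void main(String[] args) {
        double[] stageCost = {4.0, 1.0, 2.5};   // work per data item, per stage
        double[] hostSpeed = {1.0, 2.0};        // relative speed of each host
        double[] hostLoad = new double[hostSpeed.length];

        for (int s = 0; s < stageCost.length; s++) {
            int best = 0;
            double bestTime = Double.MAX_VALUE;
            for (int h = 0; h < hostSpeed.length; h++) {
                // Estimated time if this stage shares the host's existing load.
                double t = (hostLoad[h] + stageCost[s]) / hostSpeed[h];
                if (t < bestTime) { bestTime = t; best = h; }
            }
            hostLoad[best] += stageCost[s];
            System.out.println("stage " + s + " -> host " + best);
        }
    }
}
```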


international parallel processing symposium | 1999

Infrastructure for building parallel database systems for multi-dimensional data

Chialin Chang; Renato Ferreira; Alan Sussman; Joel H. Saltz

Our study of a large set of scientific applications over the past three years indicates that the processing for multidimensional datasets is often highly stylized. The basic processing step usually consists of mapping the individual input items to the output grid and computing output items by aggregating, in some way, all the input items mapped to the corresponding grid point. In this paper we discuss the design and performance of T2, an infrastructure for building parallel database systems that integrates storage, retrieval and processing of multi-dimensional datasets. It achieves its primary advantage from the ability to integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. We present preliminary performance results comparing the implementation of two applications using the T2 services with custom-built integrated implementations.
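
The "stylized" processing step can be sketched generically (the names here are ours, not T2's): each input item is mapped to an output grid point, and a user-supplied operator folds together all items that land on the same point:

```java
import java.util.function.DoubleBinaryOperator;

// Illustrative sketch of the stylized processing step the abstract describes
// (identifiers are ours, not T2's): map each input item to an output grid
// point and fold all items landing on that point with a supplied operator.
public class GridAggregate {
    static void process(double[] items, int[] mapping, double[] grid,
                        DoubleBinaryOperator combine) {
        for (int k = 0; k < items.length; k++)
            grid[mapping[k]] = combine.applyAsDouble(grid[mapping[k]], items[k]);
    }

    public static void main(String[] args) {
        double[] items = {3.0, 7.0, 2.0, 9.0};
        int[] mapping  = {0, 1, 0, 1};            // input item -> grid point
        double[] grid  = {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY};
        process(items, mapping, grid, Math::max); // aggregate by maximum
        System.out.println(grid[0] + " " + grid[1]); // 3.0 9.0
    }
}
```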


international parallel and distributed processing symposium | 2002

Improving performance of multiple sequence alignment analysis in multi-client environments

Eric Stahlberg; Renato Ferreira; Tahsin M. Kurç; Joel H. Saltz

This paper is concerned with the efficient execution of multiple sequence alignment methods in a multiple-client environment. Multiple sequence alignment (MSA) is a computationally expensive method, which is commonly used in computational and molecular biology. Large databases of protein and gene sequences are available to the scientific community. Oftentimes, these databases are accessed by multiple users to execute MSA queries. The data server has to handle multiple concurrent queries in such situations. We look at the effect of data caching on the performance of the data server. We describe an approach for caching intermediate results for reuse in subsequent or concurrent queries. We focus on progressive alignment-based strategies, in particular the CLUSTAL W algorithm. Our results for 350 sets of sequences show that an average speedup of up to 2.5 is obtained by caching intermediate results. Our results also show that the cache-enabled CLUSTAL W program scales well on an SMP machine.
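
A minimal sketch of the caching idea, assuming a shared in-memory map and a stand-in distance function (neither is the paper's actual design): progressive alignment repeatedly needs pairwise distances, which overlap heavily across queries, so a concurrent cache lets later queries reuse them:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: the key scheme and the distance function are ours, not the
// paper's. Pairwise distances recur across overlapping MSA queries, so a
// shared cache serves repeats instead of recomputing the alignment.
public class PairwiseDistanceCache {
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    double distance(String a, String b) {
        // Canonical key so (a, b) and (b, a) hit the same entry.
        String key = a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
        return cache.computeIfAbsent(key, k -> align(a, b));
    }

    // Stand-in for the expensive pairwise alignment step.
    private double align(String a, String b) {
        int mismatches = 0, n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) if (a.charAt(i) != b.charAt(i)) mismatches++;
        return (double) mismatches / n;
    }

    public static void main(String[] args) {
        PairwiseDistanceCache c = new PairwiseDistanceCache();
        System.out.println(c.distance("ACGT", "AGGT")); // computed
        System.out.println(c.distance("AGGT", "ACGT")); // served from cache
    }
}
```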


international conference on supercomputing | 2000

Compiling object-oriented data intensive applications

Renato Ferreira; Gagan Agrawal; Joel H. Saltz

Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. High-level language and compiler support for developing applications that analyze and process such datasets has, however, been lacking so far. In this paper, we present a set of language extensions and a prototype compiler for supporting high-level object-oriented programming of data-intensive reduction operations over multidimensional data. We have chosen a dialect of Java with data-parallel extensions for specifying collections of objects, a parallel for loop, and reduction variables as our source high-level language. Our compiler analyzes parallel loops and optimizes the processing of datasets through the use of an existing run-time system, called the Active Data Repository (ADR). We show how loop fission followed by interprocedural static program slicing can be used by the compiler to extract the information required by the run-time system. We present the design of a compiler/run-time interface which allows the compiler to effectively utilize the existing run-time system. A prototype compiler incorporating these techniques has been developed using the Titanium front-end from Berkeley. We have evaluated this compiler by comparing the performance of compiler-generated code with hand-customized ADR code for three templates from the areas of digital microscopy and scientific simulations. Our experimental results show that the performance of the compiler-generated versions is, on average, 21% lower than that of the hand-coded versions, and in all cases within a factor of two of it.
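
The source dialect itself is not standard Java; as a rough standard-Java analogue (our approximation, not the dialect's syntax), a data-parallel reduction loop over a collection of elements corresponds to an associative parallel fold:

```java
import java.util.stream.IntStream;

// Our standard-Java approximation of the dialect's reduction loops: the
// reduction variable becomes the fold state of an associative parallel stream.
public class ReductionLoop {
    public static void main(String[] args) {
        double[] dataset = new double[1_000_000];
        for (int i = 0; i < dataset.length; i++) dataset[i] = i % 17;

        // "foreach (element in collection) sum += f(element)" in a
        // data-parallel dialect maps to an associative reduction like this.
        double sum = IntStream.range(0, dataset.length)
                              .parallel()
                              .mapToDouble(i -> dataset[i] * dataset[i])
                              .sum();
        System.out.println("sum of squares = " + sum);
    }
}
```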


cluster computing and the grid | 2007

An Efficient and Reliable Scientific Workflow System

T. Tavares; George Teodoro; Tahsin M. Kurç; Renato Ferreira; Dorgival O. Guedes; Wagner Meira

This paper presents a fault tolerance framework for applications that process data using a distributed network of user-defined operations in a pipelined fashion. The framework saves intermediate results and messages exchanged among application components in a distributed data management system to facilitate quick recovery from failures. The experimental results show that the framework scales well and our approach introduces very little overhead to application execution.
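
A hedged sketch of the recovery scheme, with the operator and storage models simplified to a single in-process log (the framework itself uses a distributed data management system): messages are persisted before they are applied, so a restarted component replays the log rather than forcing upstream stages to recompute:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the framework's approach (our model, not its API):
// every message is appended to a durable log before it is applied, so a
// restarted operator replays the log from its last reflected position.
public class MessageLogRecovery {
    int applied = 0; // position in the log this operator's state reflects

    void apply(String msg) { System.out.println("applied " + msg); }

    void replay(List<String> log) {
        for (int i = applied; i < log.size(); i++) apply(log.get(i));
        applied = log.size();
    }

    public static void main(String[] args) {
        // Stand-in for the distributed store holding intermediate messages.
        List<String> durableLog = new ArrayList<>(List.of("chunk-1", "chunk-2"));

        MessageLogRecovery op = new MessageLogRecovery();
        op.replay(durableLog);            // normal processing

        MessageLogRecovery restarted = new MessageLogRecovery();
        restarted.replay(durableLog);     // after a crash: recover from the log
    }
}
```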


international conference on supercomputing | 2002

Compiler supported high-level abstractions for sparse disk-resident datasets

Renato Ferreira; Gagan Agrawal; Joel H. Saltz

Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. The complexity and irregularity of datasets in many domains make the task of developing such processing applications tedious and error-prone. We propose the use of high-level abstractions for hiding the irregularities in these datasets and enabling rapid development of correct data processing applications. We present two execution strategies and a set of compiler analysis techniques for obtaining high performance from applications written using our proposed high-level abstractions. Our execution strategies achieve high locality in disk accesses. Once a disk block is read from the disk, all iterations that access any of the elements from this disk block are performed. To support our execution strategies and improve the performance, we have developed static analysis techniques for: 1) computing the set of iterations that access a particular right-hand-side element, 2) generating a function that can be applied to the meta-data associated with each disk block to determine if that disk block needs to be read, and 3) performing code hoisting of conditionals. We present experimental results from a prototype compiler implementing our techniques to demonstrate the effectiveness of our approach.
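
The locality-oriented execution order can be illustrated with a small sketch (the data layout and iteration model are our assumptions): invert the access pattern so iterations are grouped by the disk block they touch, then read each block once and run all of its iterations back to back:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of block-centric execution (layout and iteration model are
// our assumptions): group iterations by the block holding their element, so
// each block is read exactly once and all its iterations run before eviction.
public class BlockCentricExecution {
    public static void main(String[] args) {
        int blockSize = 4;
        int[] accessed = {9, 1, 5, 2, 8, 10, 0, 6}; // element index per iteration

        // Invert the access pattern: block id -> iterations needing that block.
        Map<Integer, List<Integer>> byBlock = new TreeMap<>();
        for (int iter = 0; iter < accessed.length; iter++)
            byBlock.computeIfAbsent(accessed[iter] / blockSize,
                                    b -> new ArrayList<>()).add(iter);

        // Each block is "read" once; its iterations execute back to back.
        byBlock.forEach((block, iters) ->
            System.out.println("read block " + block + ", run iterations " + iters));
    }
}
```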


international symposium on biomedical imaging | 2009

A caGrid-enabled, learning based image segmentation method for histopathology specimens

David J. Foran; Lin Yang; Oncel Tuzel; Wenjin Chen; Jun Hu; Tahsin M. Kurç; Renato Ferreira; Joel H. Saltz

Accurate segmentation of tissue microarrays is a challenging problem because of the similarities exhibited by normal tissue and tumor regions. Processing speed is another consideration when dealing with imaged tissue microarrays, as each microscopic slide may contain hundreds of digitized tissue discs. In this paper, a fast and accurate image segmentation algorithm is presented. Both a whole disc delineation algorithm and a learning-based tumor region segmentation approach which utilizes multiple scale texton histograms are introduced. The algorithm is completely automatic and computationally efficient. The mean pixel-wise segmentation accuracy is about 90%. It requires about 1 second for whole disc (1024×1024 pixels) segmentation and less than 5 seconds for segmenting tumor regions. In order to enable remote access to the algorithm and collaborative studies, an analytical service is implemented using the caGrid infrastructure. This service wraps the algorithm and provides interfaces for remote clients to submit images for analysis and retrieve analysis results.


conference on high performance computing (supercomputing) | 2003

Optimizing Reduction Computations In a Distributed Environment

Tahsin M. Kurç; Feng Lee; Gagan Agrawal; Renato Ferreira; Joel H. Saltz

We investigate runtime strategies for data-intensive applications that involve generalized reductions on large, distributed datasets. Our set of strategies includes replicated filter state, partitioned filter state, and hybrid options between these two extremes. We evaluate these strategies using emulators of three real applications, different query and output sizes, and a number of configurations. We consider execution in a homogeneous cluster and in a distributed environment where only a subset of nodes host the data. Our results show that replicating the filter state scales well and outperforms the other schemes if sufficient memory is available and sufficient computation is involved to offset the cost of the global merge step. In other cases, the hybrid strategy is usually the best. Moreover, in almost all cases, the performance of the hybrid strategy is quite close to that of the best strategy. Thus, we believe that the hybrid strategy is an attractive approach when the relative performance of different schemes cannot be predicted.
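
A toy histogram makes the replicated extreme concrete (our example, not one of the paper's applications): every node reduces into a private full copy of the filter state, and a global merge step, whose cost replication must amortize, combines the copies at the end:

```java
import java.util.stream.IntStream;

// Illustrative toy for replicated filter state (our example): each node holds
// a full private copy of the reduction object; a global merge combines them.
// With partitioned state, each node would own a slice and no merge is needed.
public class ReplicatedReduction {
    public static void main(String[] args) {
        int nodes = 4, bins = 8;
        int[][] localHist = new int[nodes][bins];  // one full copy per node

        // Each node reduces its own share of the input into its local copy.
        IntStream.range(0, nodes).parallel().forEach(node -> {
            for (int v = node; v < 10_000; v += nodes)
                localHist[node][v % bins]++;
        });

        // Global merge step: the cost that replication must amortize.
        int[] merged = new int[bins];
        for (int[] local : localHist)
            for (int b = 0; b < bins; b++) merged[b] += local[b];
        System.out.println("bin 0 count = " + merged[0]);
    }
}
```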


international conference on parallel architectures and compilation techniques | 2001

Compiler and Runtime Analysis for Efficient Communication in Data Intensive Applications

Renato Ferreira; Gagan Agrawal; Joel H. Saltz

Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We are developing a compiler that processes data intensive applications written in a dialect of Java and compiles them for efficient execution on distributed memory parallel machines. In this paper, we focus on the problem of generating correct and efficient communication for data intensive applications. We present static analysis techniques for 1) extracting a global reduction function from a data parallel loop, and 2) determining if a subscript function is monotonic. We also present a runtime technique for reducing the volume of communication during the global reduction phase. We have experimented with two data intensive applications to evaluate the efficacy of our techniques. Our results show that 1) our techniques for extracting global reduction functions and establishing monotonicity of subscript functions can successfully handle these applications, 2) significant reduction in communication volume and execution times is achieved through our runtime analysis technique, and 3) runtime communication analysis is critical for achieving speedups on parallel configurations.
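
The monotonicity property is easy to state; as a toy stand-in for the static analysis (the paper proves it at compile time, while this checks it at run time), a subscript function supports ordered, aggregated communication when it never decreases over the loop range:

```java
import java.util.function.IntUnaryOperator;

// Toy runtime check of the property the compiler tries to prove statically
// (our stand-in, not the paper's analysis): a non-decreasing subscript means
// elements are accessed in order, so their communication can be aggregated.
public class SubscriptMonotonicity {
    static boolean isMonotonic(IntUnaryOperator subscript, int n) {
        for (int i = 1; i < n; i++)
            if (subscript.applyAsInt(i) < subscript.applyAsInt(i - 1))
                return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMonotonic(i -> 2 * i + 3, 100));    // true
        System.out.println(isMonotonic(i -> (i * 7) % 10, 100)); // false
    }
}
```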

Collaboration


Dive into Renato Ferreira's collaboration.

Top Co-Authors

Bongki Moon, Seoul National University
André A. R. F. E. Cardoso, Universidade Federal de Minas Gerais
Daniel Fireman, Federal University of Campina Grande
Dorgival O. Guedes, Universidade Federal de Minas Gerais
Rafael Sachetto, Universidade Federal de São João del-Rei