Raphael Y. de Camargo
Universidade Federal do ABC
Publications
Featured research published by Raphael Y. de Camargo.
Middleware for Grid Computing | 2004
Raphael Y. de Camargo; Andrei Goldchleger; Fabio Kon; Alfredo Goldman
InteGrade is a grid middleware infrastructure that enables the use of idle computing power from user workstations. One of its goals is to support the execution of long-running parallel applications that present a considerable amount of communication among application nodes. However, in an environment composed of shared user workstations spread across many different LANs, machines may fail, become inaccessible, or switch from idle to busy very rapidly, compromising the execution of the parallel application on some of its nodes. Thus, providing a mechanism for fault tolerance becomes a major requirement for such a system. In this paper, we describe the support for checkpoint-based rollback recovery of parallel BSP applications running over the InteGrade middleware. This mechanism consists of periodically saving the application state so that execution can restart from an intermediate point in case of failure. A precompiler automatically instruments the source code of a C/C++ application, adding code for saving and recovering application state. A failure detector monitors the application execution. In case of failure, the application is restarted from the last saved global checkpoint.
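The abstract describes the instrumentation only at a high level. As a rough illustration, the sketch below shows what precompiler-inserted checkpointing at BSP superstep boundaries could look like in plain C/C++; the function names, file format, and state layout are hypothetical stand-ins, not InteGrade's actual API.

```cpp
// Hypothetical sketch of precompiler-inserted checkpointing for a
// BSP-style loop. save_checkpoint/load_checkpoint and the file format
// are illustrative, not the actual InteGrade mechanism.
#include <cstdio>
#include <vector>

// Application state that the (hypothetical) precompiler decided to save.
struct AppState {
    int superstep;                 // next superstep to execute
    std::vector<double> data;      // application data
};

static bool load_checkpoint(AppState &s, const char *path) {
    FILE *f = std::fopen(path, "rb");
    if (!f) return false;                          // no checkpoint: fresh start
    size_t n = 0;
    std::fread(&s.superstep, sizeof s.superstep, 1, f);
    std::fread(&n, sizeof n, 1, f);
    s.data.resize(n);
    std::fread(s.data.data(), sizeof(double), n, f);
    std::fclose(f);
    return true;
}

static void save_checkpoint(const AppState &s, const char *path) {
    FILE *f = std::fopen(path, "wb");
    size_t n = s.data.size();
    std::fwrite(&s.superstep, sizeof s.superstep, 1, f);
    std::fwrite(&n, sizeof n, 1, f);
    std::fwrite(s.data.data(), sizeof(double), n, f);
    std::fclose(f);                                // atomic rename omitted for brevity
}

int main() {
    AppState s{0, std::vector<double>(1024, 1.0)};
    if (load_checkpoint(s, "ckpt.bin"))
        std::printf("restarted from superstep %d\n", s.superstep);

    for (; s.superstep < 100; ++s.superstep) {
        for (double &x : s.data) x *= 1.0001;      // local computation phase
        // A bsp_sync() would go here: exchange messages, then barrier.
        save_checkpoint(s, "ckpt.bin");            // inserted at superstep boundary
    }
    return 0;
}
```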
Journal of Parallel and Distributed Computing | 2010
Francisco José da Silva e Silva; Fabio Kon; Alfredo Goldman; Marcelo Finger; Raphael Y. de Camargo; Fernando Castor Filho; Fábio M. Costa
The InteGrade project is a multi-university effort to build a novel grid computing middleware based on the opportunistic use of resources belonging to user workstations. The InteGrade middleware currently enables the execution of sequential, bag-of-tasks, and parallel applications that follow the BSP or the MPI programming models. This article presents the lessons learned over the last five years of InteGrade's development and describes the solutions achieved concerning support for robust application execution. The contributions cover the related fields of application scheduling, execution management, and fault tolerance. We present our solutions, describing their implementation principles and their evaluation through the analysis of several experimental results.
BMC Bioinformatics | 2013
Fabrizio F. Borelli; Raphael Y. de Camargo; David Correa Martins; Luiz C. S. Rozante
Background: Gene regulatory network (GRN) inference is an important bioinformatics problem in which gene interactions must be deduced from gene expression data, such as microarray data. Feature selection methods can be applied to this problem. A feature selection technique is composed of two parts: a search algorithm and a criterion function. Among the search algorithms already proposed is exhaustive search, which returns the best feature subset, although its computational cost is unfeasible in almost all situations. The objective of this work is the development of a low-cost parallel solution, based on GPU architectures, for exhaustive search with a viable cost-benefit ratio. We use CUDA™, a general-purpose parallel programming platform that allows the use of NVIDIA® GPUs to solve complex problems efficiently. Results: We developed a parallel algorithm for GRN inference based on multiple GPU cards and obtained encouraging speedups (on the order of hundreds) when assuming that each target gene has two multivariate predictors. Experiments using single and multiple GPUs were also performed, indicating that the speedup grows almost linearly with the number of GPUs. Conclusion: In this work, we present a proof of principle, showing that it is possible to parallelize the exhaustive search algorithm on GPUs with encouraging results. Although our focus in this paper is on the GRN inference problem, the GPU-based exhaustive search technique developed here can be applied (with minor adaptations) to other combinatorial problems.
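For illustration, a minimal single-GPU sketch of the exhaustive pair search follows, with each CUDA thread scoring one candidate predictor pair. The error-counting criterion and data layout here are simplified assumptions, not the criterion function used in the paper.

```cuda
// Sketch: exhaustive search over all predictor pairs (i, j) for one
// target gene. Each CUDA thread scores one pair. The criterion used
// here -- classification error on binarized data -- is a simplified
// stand-in for the paper's criterion function.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void score_pairs(const unsigned char *expr,   // genes x samples, 0/1
                            const unsigned char *target, // samples, 0/1
                            int n_genes, int n_samples,
                            float *score)                // n_genes * n_genes
{
    long long idx = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    if (idx >= (long long)n_genes * n_genes) return;
    int i = (int)(idx / n_genes), j = (int)(idx % n_genes);
    if (j <= i) { score[idx] = 1e30f; return; }          // keep pairs with i < j

    int counts[4][2] = {};   // joint state of (xi, xj) vs. target value
    for (int s = 0; s < n_samples; ++s) {
        int key = expr[i * n_samples + s] * 2 + expr[j * n_samples + s];
        ++counts[key][target[s]];
    }
    int errors = 0;          // the best Boolean rule predicts the majority class
    for (int k = 0; k < 4; ++k)
        errors += min(counts[k][0], counts[k][1]);
    score[idx] = (float)errors / n_samples;
}

int main() {
    const int G = 128, S = 64;                           // toy problem size
    unsigned char *expr, *target; float *score;
    cudaMallocManaged(&expr, (size_t)G * S);
    cudaMallocManaged(&target, S);
    cudaMallocManaged(&score, sizeof(float) * G * G);
    for (int k = 0; k < G * S; ++k) expr[k] = k % 2;     // toy data
    for (int s = 0; s < S; ++s) target[s] = s % 2;

    int threads = 256, blocks = (G * G + threads - 1) / threads;
    score_pairs<<<blocks, threads>>>(expr, target, G, S, score);
    cudaDeviceSynchronize();

    int best = 0;                                        // host-side argmin, for brevity
    for (int k = 1; k < G * G; ++k) if (score[k] < score[best]) best = k;
    std::printf("best pair: (%d, %d), error rate %.3f\n",
                best / G, best % G, score[best]);
    return 0;
}
```

In a multi-GPU setting like the paper's, the candidate pairs would be partitioned across cards and the argmin reduced on each GPU before a final host-side merge.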
Concurrency and Computation: Practice and Experience | 2006
Raphael Y. de Camargo; Andrei Goldchleger; Fabio Kon; Alfredo Goldman
InteGrade is a Grid middleware infrastructure that enables the use of idle computing power from user workstations. One of its goals is to support the execution of long‐running parallel applications that present a considerable amount of communication among application nodes. However, in an environment composed of shared user workstations spread across many different LANs, machines may fail, become inaccessible, or may switch from idle to busy very rapidly, compromising the execution of the parallel application in some of its nodes. Thus, to provide some mechanism for fault tolerance becomes a major requirement for such a system. In this paper, we describe the support for checkpoint‐based rollback recovery of Bulk Synchronous Parallel applications running over the InteGrade middleware. This mechanism consists of periodically saving application state to permit the application to restart its execution from an intermediate execution point in case of failure. A precompiler automatically instruments the source code of a C/C++ application, adding code for saving and recovering application state. A failure detector monitors the application execution. In case of failure, the application is restarted from the last saved global checkpoint. Copyright
Concurrency and Computation: Practice and Experience | 2011
Raphael Y. de Camargo; Luiz C. S. Rozante; Siang W. Song
Large-scale simulations of parts of the brain using detailed neuronal models, aimed at improving our understanding of brain functions, are becoming a reality with the use of supercomputers and large clusters. However, the high acquisition and maintenance costs of these computers, including physical space, air conditioning, and electrical power, limit the number of simulations of this kind that scientists can perform. Modern commodity graphics cards, based on the CUDA platform, contain graphical processing units (GPUs) composed of hundreds of processors that can simultaneously execute thousands of threads and thus constitute a low-cost solution for many high-performance computing applications. In this work, we present a CUDA algorithm that enables the execution, on multiple GPUs, of simulations of large-scale networks composed of biologically realistic Hodgkin–Huxley neurons. The algorithm represents each neuron as a CUDA thread, which solves the set of coupled differential equations that model each neuron. Communication among neurons located on different GPUs is coordinated by the CPU. We obtained speedups of 40 for the simulation of 200k neurons that received random external input, and speedups of 9 for a network with 200k neurons and 20M neuronal connections, on a single computer with two graphics boards with two GPUs each, when compared with a modern quad-core CPU.
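As a rough sketch of the one-thread-per-neuron scheme, the kernel below advances each Hodgkin–Huxley neuron by one explicit Euler step using the classic HH parameters; the multi-GPU partitioning and CPU-coordinated communication described in the paper are omitted, and all names are illustrative.

```cuda
// Sketch of the one-thread-per-neuron scheme: each CUDA thread advances
// one Hodgkin-Huxley neuron by a single explicit Euler step. Parameters
// are the classic HH values; multi-GPU spike exchange is omitted.
#include <cuda_runtime.h>

struct HHState { float v, m, h, n; };     // membrane potential + gating variables

__global__ void hh_step(HHState *s, const float *I_ext, int n_neurons, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_neurons) return;
    float v = s[i].v, m = s[i].m, h = s[i].h, n = s[i].n;

    // Channel rate functions (standard HH fit, voltages in mV).
    float am = 0.1f * (v + 40.f) / (1.f - expf(-(v + 40.f) / 10.f));
    float bm = 4.f * expf(-(v + 65.f) / 18.f);
    float ah = 0.07f * expf(-(v + 65.f) / 20.f);
    float bh = 1.f / (1.f + expf(-(v + 35.f) / 10.f));
    float an = 0.01f * (v + 55.f) / (1.f - expf(-(v + 55.f) / 10.f));
    float bn = 0.125f * expf(-(v + 65.f) / 80.f);

    // Ionic currents (uA/cm^2); conductances in mS/cm^2.
    float INa = 120.f * m * m * m * h * (v - 50.f);
    float IK  = 36.f  * n * n * n * n * (v + 77.f);
    float IL  = 0.3f  * (v + 54.4f);

    // Explicit Euler update (membrane capacitance C = 1 uF/cm^2).
    s[i].v = v + dt * (I_ext[i] - INa - IK - IL);
    s[i].m = m + dt * (am * (1.f - m) - bm * m);
    s[i].h = h + dt * (ah * (1.f - h) - bh * h);
    s[i].n = n + dt * (an * (1.f - n) - bn * n);
}

int main() {
    const int N = 1 << 16; const float dt = 0.01f;   // 65k neurons, 0.01 ms step
    HHState *s; float *I;
    cudaMallocManaged(&s, N * sizeof(HHState));
    cudaMallocManaged(&I, N * sizeof(float));
    for (int i = 0; i < N; ++i) { s[i] = {-65.f, 0.05f, 0.6f, 0.32f}; I[i] = 10.f; }
    for (int t = 0; t < 1000; ++t)                   // 10 ms of simulated time
        hh_step<<<(N + 255) / 256, 256>>>(s, I, N, dt);
    cudaDeviceSynchronize();
    return 0;
}
```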
Communication System Software and Middleware | 2009
Jo Ueyama; Vítor P. V. Pinto; Edmundo Roberto Mauro Madeira; Paul Grace; Thienne M. M. Jonhson; Raphael Y. de Camargo
We are witnessing increasing demand for applications that can run on a wide range of mobile devices (e.g. wireless laptops, mobile phones, sensors). In addition, the emergence of new software technologies (e.g. component approaches, publish-subscribe bindings, web services, service discovery protocols) means that such applications must also cope with heterogeneous software platforms. However, existing approaches for building mobile device applications are often targeted at a particular platform (e.g. mobile phones, PDAs, sensors) and software technology (Web Services, Microsoft COM, Java components). This paper discusses the use of a generic component approach for the construction of adaptive applications that can integrate and re-use technologies (e.g. middleware and legacy components) and deploy them across heterogeneous devices. We have implemented a Java prototype for J2ME virtual machines and evaluated the potential benefits using development case studies and performance measurements. We show that we can address a wide range of heterogeneity with minimal resource overheads.
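The paper's prototype targets Java/J2ME; purely as a language-neutral illustration of the component idea, the C++ sketch below shows a component whose required interface is bound to a pluggable implementation at runtime. All names here are hypothetical, not the paper's component model.

```cpp
// Illustrative-only sketch of a minimal component model with explicit
// bindings between required and provided interfaces. The paper's
// prototype is Java/J2ME; these C++ names are hypothetical.
#include <cstdio>

struct ITransport {                         // a provided interface
    virtual void send(const char *msg) = 0;
    virtual ~ITransport() = default;
};

struct HttpTransport : ITransport {         // one pluggable implementation
    void send(const char *msg) override { std::printf("HTTP: %s\n", msg); }
};

struct Messenger {                          // component with a required interface
    ITransport *transport = nullptr;        // bound at runtime, not compile time
    void bind(ITransport *t) { transport = t; }   // reconfigurable binding
    void notify(const char *msg) { if (transport) transport->send(msg); }
};

int main() {
    HttpTransport http;
    Messenger m;
    m.bind(&http);                          // a framework would do this dynamically
    m.notify("hello");
    return 0;
}
```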
International Conference on Computational Advances in Bio and Medical Sciences | 2012
Fabrizio F. Borelli; Raphael Y. de Camargo; David Correa Martins; Beatriz Stransky; Luiz C. S. Rozante
Gene regulatory network (GRN) inference is an important bioinformatics problem in which gene interactions must be deduced from gene expression data, such as microarray data. Feature selection methods can be applied to this problem. A feature selection technique is composed of two parts: a search algorithm and a criterion function. Among the search algorithms already proposed is exhaustive search, which returns the best feature subset, although its computational cost is unfeasible in almost all situations. The objective of this work is the development of a low-cost parallel solution, based on GPU architectures, for exhaustive search with a viable cost-benefit ratio. CUDA™ is a general-purpose parallel architecture and programming model that allows NVIDIA® GPUs to solve complex problems efficiently. We developed a parallel algorithm for GRN inference based on GPU/CUDA, and encouraging speedups (60x) were achieved when assuming that each target gene has two predictors. The idea behind the proposed method can also be applied when considering three or more predictors for each target gene.
Proceedings of the 3rd International Middleware Doctoral Symposium | 2006
Raphael Y. de Camargo; Fabio Kon
Grid applications typically need to deal with large amounts of data. The traditional approach for data storage is to employ high-performance dedicated servers with data replication. However, a class of computational grids, called opportunistic grids, focuses on the use of idle resources from shared machines. These machines normally have large quantities of unused storage space that could be used when the machines are idle, allowing opportunistic grids to share not only computational cycles but also storage space. In this work, we present the initial design of OppStore, a middleware that provides reliable storage using the free storage space of shared grid machines. The storage can be transparently accessed from any grid machine, allowing easy data sharing among grid users and applications. The system uses a two-level peer-to-peer organization to connect grid machines in a scalable and fault-tolerant way. To deal with resource heterogeneity, we developed the concept of virtual ids, which allows the creation of virtual spaces on top of the peer-to-peer routing substrate. These virtual spaces enable the middleware to perform heterogeneity-aware, load-balanced selection of storage sites using multiple simultaneous metrics.
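To give a feel for the virtual-id idea, the sketch below places weighted virtual ids on a consistent-hashing ring, so that clusters with more free space receive proportionally more fragments. The hashing, weighting, and single capacity metric are simplifications assumed for illustration, not OppStore's actual design.

```cpp
// Sketch of virtual ids on a consistent-hashing ring: a cluster with
// more free storage registers more virtual ids and is therefore
// selected proportionally more often. Hashes and weights are illustrative.
#include <cstdio>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

struct Ring {
    std::map<uint64_t, std::string> vids;   // virtual id -> cluster name

    // More capacity => more virtual ids => larger share of the id space.
    void add_cluster(const std::string &name, int weight) {
        for (int i = 0; i < weight; ++i)
            vids[std::hash<std::string>{}(name + "#" + std::to_string(i))] = name;
    }

    // Route a fragment id to the first virtual id at or after it (wrapping).
    const std::string &lookup(uint64_t key) const {
        auto it = vids.lower_bound(key);
        return (it == vids.end() ? vids.begin() : it)->second;
    }
};

int main() {
    Ring ring;
    ring.add_cluster("lab-A", 8);           // plenty of free space
    ring.add_cluster("lab-B", 2);           // little free space
    uint64_t frag = std::hash<std::string>{}("file.dat/fragment-0");
    std::printf("fragment stored at: %s\n", ring.lookup(frag).c_str());
    return 0;
}
```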
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2015
Danilo Carastan-Santos; Raphael Y. de Camargo; David Correa Martins; Siang W. Song; Luiz C. S. Rozante; Fabrizio F. Borelli
Gene regulatory network (GRN) inference is one of the crucial problems of the Systems Biology field. It is still an open problem, mainly because of its high dimensionality (thousands of genes) combined with a limited number of samples (dozens), which makes it difficult to estimate dependencies among genes. Besides the estimation problem, another important hindrance is the inherent computational complexity of GRN inference methods. In this work, we focus on circumventing the performance issues of a technique based on signal perturbations to infer gene dependencies. One of its main steps consists in solving the Hitting Set problem (HSP), which is NP-hard. There are many proposals for obtaining approximate or exact solutions to this problem. One of them is a Graphical Processing Unit (GPU) based algorithm that obtains exact solutions to the HSP. However, such a method does not scale to real-size GRNs. We propose an extension of the HSP algorithm that handles input sets containing thousands of variables by introducing innovations in the data structures and a sorting scheme that allows efficient discarding of Hitting Set non-solution candidates. We provide implementations for multi-core CPUs and GPU clusters. Our experimental results show that the sorting scheme brings speedups of up to 3.5 in the CPU implementation. Moreover, using a single GPU, we obtained an additional speedup of up to 4.7 in comparison with the multithreaded CPU implementation. Finally, using eight GPUs from a GPU cluster brought an additional speedup of up to 6.6. Combining all techniques, speedups above 60 were obtained for the parallel part of the algorithm.
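The core GPU test can be pictured as follows: each thread checks whether one candidate subset, encoded as a bitmask, intersects every input set. This sketch assumes at most 64 variables and omits the paper's data structures and the sorting scheme for early discarding.

```cuda
// Sketch of the core GPU test in exact Hitting Set search: each thread
// takes one candidate subset (a bitmask over <= 64 variables) and checks
// whether it intersects every input set. The paper's data structures and
// sorting-based candidate discarding are not shown.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

__global__ void hits_all(const uint64_t *candidates, int n_cand,
                         const uint64_t *sets, int n_sets,
                         unsigned char *is_solution)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= n_cand) return;
    uint64_t cand = candidates[c];
    for (int s = 0; s < n_sets; ++s)
        if ((cand & sets[s]) == 0) {       // some set is not hit: discard
            is_solution[c] = 0;
            return;
        }
    is_solution[c] = 1;                    // candidate hits every set
}

int main() {
    const int n_cand = 3, n_sets = 2;
    uint64_t *cand, *sets; unsigned char *ok;
    cudaMallocManaged(&cand, n_cand * sizeof(uint64_t));
    cudaMallocManaged(&sets, n_sets * sizeof(uint64_t));
    cudaMallocManaged(&ok, n_cand);
    sets[0] = 0x3; sets[1] = 0x6;          // sets {v0,v1} and {v1,v2}
    cand[0] = 0x2;                         // {v1} hits both
    cand[1] = 0x1;                         // {v0} misses the second set
    cand[2] = 0x5;                         // {v0,v2} hits both
    hits_all<<<1, 32>>>(cand, n_cand, sets, n_sets, ok);
    cudaDeviceSynchronize();
    for (int i = 0; i < n_cand; ++i)
        std::printf("candidate %d: %s\n", i, ok[i] ? "solution" : "no");
    return 0;
}
```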
IEEE International Conference on High Performance Computing, Data, and Analytics | 2011
Raphael Y. de Camargo
Graphical Processing Units (GPUs) are frequently used for simulations of physical and biological systems. The simulated systems are often composed of simple elements that communicate only with their neighbors. But in some systems, such as large-scale neuronal networks, each element can communicate with any other element in the simulation. In this work, we present an efficient CUDA algorithm that enables this type of communication, even when using multiple GPUs. We show that it can benefit from the large memory bandwidth and number of cores in the GPU, despite the small number of required floating point operations. We implemented and evaluated this algorithm in a GPU simulator for large-scale neuronal networks. We obtained speedups of over 10 for the communication steps in simulations with 50k neurons and 50M connections, using a single computer with two graphics boards with two GPUs each, when compared with a modern quad-core CPU. When we consider the complete neuronal network simulation, its execution was nearly 40 times faster on the GPU than on the CPU.
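A minimal sketch of such a communication step is given below: each CUDA thread gathers the weighted inputs of one target neuron from a CSR-style connection list. The multi-GPU partitioning and host-side exchange from the paper are omitted, and all names are illustrative.

```cuda
// Sketch of a dense communication step: each thread gathers the inputs
// of one target neuron from a CSR-style connection list. This is a
// simplified stand-in for the paper's communication algorithm.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

__global__ void gather_inputs(const int *offsets,          // CSR row offsets, size n+1
                              const int *sources,          // presynaptic neuron ids
                              const float *weights,        // synaptic weights
                              const unsigned char *spiked, // 1 if source fired
                              float *input, int n_neurons)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_neurons) return;
    float sum = 0.f;
    for (int k = offsets[i]; k < offsets[i + 1]; ++k)
        if (spiked[sources[k]])             // only firing sources contribute
            sum += weights[k];
    input[i] = sum;                         // fed into the next integration step
}

int main() {
    const int N = 4;
    int h_off[] = {0, 2, 3, 3, 4};          // neuron 0 has 2 inputs, etc.
    int h_src[] = {1, 2, 0, 1};
    float h_w[] = {0.5f, 0.25f, 1.0f, 0.75f};
    unsigned char h_sp[] = {1, 0, 1, 0};    // neurons 0 and 2 fired

    int *off, *src; float *w, *in; unsigned char *sp;
    cudaMallocManaged(&off, sizeof h_off); std::memcpy(off, h_off, sizeof h_off);
    cudaMallocManaged(&src, sizeof h_src); std::memcpy(src, h_src, sizeof h_src);
    cudaMallocManaged(&w, sizeof h_w);     std::memcpy(w, h_w, sizeof h_w);
    cudaMallocManaged(&sp, sizeof h_sp);   std::memcpy(sp, h_sp, sizeof h_sp);
    cudaMallocManaged(&in, N * sizeof(float));
    gather_inputs<<<1, 32>>>(off, src, w, sp, in, N);
    cudaDeviceSynchronize();
    for (int i = 0; i < N; ++i) std::printf("input[%d] = %.2f\n", i, in[i]);
    return 0;
}
```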