Eduardo S. Ogasawara
Centro Federal de Educação Tecnológica de Minas Gerais
Publications
Featured research published by Eduardo S. Ogasawara.
International Conference on Cloud Computing | 2010
Daniel de Oliveira; Eduardo S. Ogasawara; Fernanda Araujo Baião; Marta Mattoso
Most large-scale scientific experiments modeled as scientific workflows produce large amounts of data and require workflow parallelism to reduce execution time. Some existing Scientific Workflow Management Systems (SWfMS) explore parallelism techniques such as parameter sweep and data fragmentation. In those systems, several computing resources are used to accomplish many computational tasks in homogeneous environments, such as multiprocessor machines or cluster systems. Cloud computing has become a popular high-performance computing model in which (virtualized) resources are provided as services over the Web. Some scientists are starting to adopt the cloud model in scientific domains and are moving their scientific workflows (programs and data) from local environments to the cloud. Nevertheless, it is still difficult for scientists to express a parallel computing paradigm for the workflow on the cloud, and capturing distributed provenance data in the cloud is also an issue. Existing approaches for executing scientific workflows with parallel processing focus mainly on homogeneous environments, whereas in the cloud the scientist has to manage new aspects such as the initialization of virtualized instances, scheduling over different cloud environments, the impact of data transfer, and the management of instance images. In this paper we propose SciCumulus, a cloud middleware that explores parameter sweep and data fragmentation parallelism in scientific workflow activities (with provenance support). It works between the SWfMS and the cloud and is designed with cloud specificities in mind. We evaluated our approach by executing simulated experiments to analyze the overhead imposed by clouds on workflow execution time.
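The two parallelism techniques named above can be sketched in a few lines. The function and parameter names below are illustrative, not part of SciCumulus: a parameter sweep expands every combination of parameter values into an independent task, and data fragmentation splits an input into roughly equal pieces.

```python
from itertools import product

def sweep_tasks(parameters):
    """Expand a parameter sweep into independent task specifications.

    Each task is one combination of parameter values, ready to be
    dispatched to a separate (virtualized) instance.
    """
    names = sorted(parameters)
    return [dict(zip(names, values))
            for values in product(*(parameters[n] for n in names))]

def fragment(dataset, n_fragments):
    """Split an input dataset into roughly equal fragments (data parallelism)."""
    size, rest = divmod(len(dataset), n_fragments)
    fragments, start = [], 0
    for i in range(n_fragments):
        end = start + size + (1 if i < rest else 0)
        fragments.append(dataset[start:end])
        start = end
    return fragments

# Hypothetical sweep over two solver parameters: 2 x 2 = 4 tasks.
tasks = sweep_tasks({"alpha": [0.1, 0.5], "tolerance": [1e-3, 1e-6]})
```

Each element of `tasks` is one activity execution; each element of `fragment(...)` feeds one parallel instance of the same activity.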
International Journal of Business Process Integration and Management | 2010
Marta Mattoso; Cláudia Maria Lima Werner; Guilherme Horta Travassos; Vanessa Braganholo; Eduardo S. Ogasawara; Daniel de Oliveira; Sérgio Manuel Serra da Cruz; Wallace Martinho; Leonardo Murta
One of the main challenges of scientific experiments is to allow scientists to manage and exchange their scientific computational resources (data, programs, models, etc.). The effective management of such experiments requires a specific set of cardinal facilities, such as experiment specification techniques, workflow derivation heuristics, and provenance mechanisms. These facilities characterise the experiment life cycle in three phases: composition, execution, and analysis. Existing work on scientific workflows is mainly concerned with the execution and analysis phases and therefore fails to support the scientific experiment throughout its life cycle as a set of integrated experimentation technologies. In large-scale experiments this represents a research challenge. We propose an approach for managing large-scale experiments based on provenance gathering during all phases of the life cycle. We foresee that such an approach may give scientists more control over the trials of the scientific experiment.
EDBT/ICDT Workshops | 2013
Flavio Costa; Vítor Silva; Daniel de Oliveira; Kary A. C. S. Ocaña; Eduardo S. Ogasawara; Jonas Dias; Marta Mattoso
Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are enacted and managed by Scientific Workflow Management Systems (SWfMS). Each SWfMS has its own approach to executing workflows and to capturing and managing their provenance data. Due to the large scale of experiments, it may be infeasible to analyze provenance data only after the execution ends: a single experiment may take weeks to run, even in high performance computing environments. Thus, scientists need to monitor the experiment during its execution, and this can be done through provenance data. Runtime provenance analysis allows scientists to monitor workflow execution and to take actions before it ends (i.e., workflow steering). This provenance data can also be used to fine-tune the parallel execution of the workflow dynamically. We use the PROV data model as a basic framework for modeling and providing runtime provenance as a database that can be queried even during the execution. This database is agnostic to the SWfMS and workflow engine. We show the benefits of representing and sharing runtime provenance data for improving experiment management as well as the analysis of the scientific data.
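A provenance database queryable mid-execution might look like the following minimal sketch. The `activation` table and its columns are hypothetical stand-ins, not the actual PROV or SciCumulus schema; the point is only that monitoring is an ordinary SQL query against a live store.

```python
import sqlite3

# In-memory stand-in for the runtime provenance database.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activation (
    activity  TEXT,   -- workflow activity name
    status    TEXT,   -- e.g. RUNNING / FINISHED
    elapsed_s REAL)""")
conn.executemany(
    "INSERT INTO activation VALUES (?, ?, ?)",
    [("align", "FINISHED", 12.4),
     ("align", "RUNNING",  None),
     ("tree",  "FINISHED", 3.1)])

def monitor(conn):
    """Query provenance while the workflow is still running (steering)."""
    return dict(conn.execute(
        "SELECT status, COUNT(*) FROM activation GROUP BY status"))
```

A scientist (or an adaptive scheduler) can poll `monitor(conn)` repeatedly and act on partial results before the workflow finishes.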
Many Task Computing on Grids and Supercomputers | 2009
Eduardo S. Ogasawara; Daniel de Oliveira; Fernando Chirigati; Carlos Eduardo Barbosa; Renato N. Elias; Vanessa Braganholo; Alvaro L. G. A. Coutinho; Marta Mattoso
One of the main advantages of using a scientific workflow management system (SWfMS) to orchestrate data flows among scientific activities is the ability to control and register the whole workflow execution. Executing activities within a workflow with high performance computing (HPC) presents challenges for SWfMS execution control. Current solutions leave the scheduling to the HPC queue system. Since the workflow execution engine does not run on remote clusters, SWfMS are not aware of the parallel strategy of the workflow execution. Consequently, remote execution control and provenance registration of the parallel activities are very limited from the SWfMS side. This work presents a set of components that can be included in the workflow specification of any SWfMS to control the parallelization of activities as many tasks (MTC). In addition, these components gather provenance data during remote workflow execution. Through these MTC components, the parallelization strategy can be registered and reused, and provenance data can be uniformly queried. We evaluated our approach by performing parameter sweep parallelization in solving the incompressible 3D Navier-Stokes equations. Experimental results show performance gains with the additional benefit of distributed provenance support.
Concurrency and Computation: Practice and Experience | 2013
Eduardo S. Ogasawara; Jonas Dias; Vítor Silva; Fernando Chirigati; Daniel de Oliveira; Fábio Porto; Patrick Valduriez; Marta Mattoso
Large-scale scientific experiments based on computer simulations are typically modeled as scientific workflows, which ease the chaining of different programs. These scientific workflows are defined, executed, and monitored by scientific workflow management systems (SWfMS). As these experiments manage large amounts of data, it becomes critical to execute them in high-performance computing environments, such as clusters, grids, and clouds. However, few SWfMS provide parallel support, and the ones that do are usually labor-intensive for workflow developers and have limited primitives to optimize workflow execution. To address these issues, we developed a workflow algebra to specify and enable the optimization of parallel execution of scientific workflows. In this paper, we show how the workflow algebra is efficiently implemented in Chiron, an algebra-based parallel scientific workflow engine. Chiron has a unique native distributed provenance mechanism that enables runtime queries in a relational database. We developed two studies to evaluate the performance of our algebraic approach implemented in Chiron: the first compares Chiron with different approaches, whereas the second evaluates its scalability. By analyzing the results, we conclude that Chiron is efficient in executing scientific workflows, with the benefits of declarative specification and runtime provenance support.
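A workflow algebra over relations (sets of tuples) can be sketched with map, filter, and reduce operators. The operator names and signatures below are illustrative and do not reproduce Chiron's actual algebra; they only show the idea that activities consume and produce relations, which makes the plan amenable to database-style optimization.

```python
def alg_map(activity, relation):
    """Map: apply the activity to each input tuple, one output tuple each."""
    return [activity(t) for t in relation]

def alg_filter(predicate, relation):
    """Filter: keep only the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def alg_reduce(aggregate, key, relation):
    """Reduce: group tuples by a key attribute and aggregate each group."""
    groups = {}
    for t in relation:
        groups.setdefault(t[key], []).append(t)
    return [aggregate(k, g) for k, g in sorted(groups.items())]

# Toy relation and pipeline (attribute names are hypothetical).
relation = [{"id": 1, "x": 2}, {"id": 2, "x": 5}, {"id": 1, "x": 4}]
squared = alg_map(lambda t: {**t, "x2": t["x"] ** 2}, relation)
big = alg_filter(lambda t: t["x"] > 2, relation)
totals = alg_reduce(lambda k, g: {"id": k, "sum": sum(t["x"] for t in g)},
                    "id", relation)
```

Because every operator maps relations to relations, an engine can fragment the input relation and evaluate each operator's tuples in parallel, and can reorder operators (e.g., push a filter before a costly map) without changing the result.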
Concurrency and Computation: Practice and Experience | 2012
Daniel de Oliveira; Eduardo S. Ogasawara; Kary A. C. S. Ocaña; Fernanda Araujo Baião; Marta Mattoso
Many of the existing large-scale scientific experiments modeled as scientific workflows are compute-intensive. Some scientific workflow management systems already explore parallel techniques, such as parameter sweep and data fragmentation, to improve performance. In those systems, computing resources are used to accomplish many computational tasks in high performance environments, such as multiprocessor machines or clusters. Meanwhile, cloud computing provides scalable and elastic resources that can be instantiated on demand during the course of a scientific experiment, without requiring its users to acquire expensive infrastructure or to configure many pieces of software. In fact, because of these advantages some scientists have already adopted the cloud model in their scientific experiments. However, this model also raises many challenges. When scientists execute scientific workflows that require parallelism, it is hard to decide a priori the amount of resources to use and how long they will be needed, because the allocation of these resources is elastic and based on demand. In addition, scientists have to manage new aspects such as the initialization of virtual machines and the impact of data staging. SciCumulus is a middleware that manages the parallel execution of scientific workflows in cloud environments. In this paper, we introduce an adaptive approach for executing parallel scientific workflows in the cloud. This approach adapts itself according to the availability of resources during workflow execution: it checks the available computational power and dynamically tunes the workflow activity size to achieve better performance. Experimental evaluation showed the benefits of parallelizing scientific workflows using the adaptive approach of SciCumulus, which yielded performance improvements of up to 47.1%.
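One minimal way to sketch "dynamically tunes the workflow activity size" is a feedback rule that rescales the number of tasks grouped into one activation toward a target duration. The heuristic and its parameters below are assumptions for illustration, not the actual SciCumulus adaptive policy.

```python
def adapt_chunk(chunk, t_actual, t_target, n_min=1, n_max=512):
    """Rescale the activity (chunk) size so that each activation's
    observed runtime t_actual moves toward the target t_target.

    chunk     -- current number of tasks grouped per activation
    t_actual  -- measured runtime of the last activation, in seconds
    t_target  -- desired activation runtime, in seconds
    n_min/max -- clamp to keep the chunk size within sane bounds
    """
    ratio = t_target / max(t_actual, 1e-9)  # >1 means chunks finish too fast
    return max(n_min, min(n_max, int(round(chunk * ratio))))
```

Called after every activation, this grows chunks when the cloud is fast (less scheduling overhead) and shrinks them when resources degrade (finer-grained load balancing).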
Concurrency and Computation: Practice and Experience | 2012
Anderson Marinho; Leonardo Murta; Cláudia Maria Lima Werner; Vanessa Braganholo; Sérgio Manuel Serra da Cruz; Eduardo S. Ogasawara; Marta Mattoso
Running scientific workflows in distributed and heterogeneous environments has been a motivating approach for provenance management that is loosely coupled to the workflow execution engine. This kind of approach is interesting because it allows both storage and access to provenance data in a homogeneous way, even in an environment where different workflow management systems work together. However, current approaches overload scientists with many ad hoc tasks, such as script adaptations and implementations of extra functionality to provide provenance independence. This paper proposes ProvManager, a provenance management approach that eases the gathering, storage, and analysis of provenance information in distributed and heterogeneous environments without putting the burden of adaptations on the scientist. ProvManager raises provenance management to the experiment level by integrating workflow executions from multiple workflow management systems.
IEEE International Conference on eScience | 2011
Kary A. C. S. Ocaña; Daniel de Oliveira; Jonas Dias; Eduardo S. Ogasawara; Marta Mattoso
Phylogenetic analysis and multiple sequence alignment (MSA) are closely related bioinformatics fields. Phylogenetic analysis makes extensive use of MSA in the construction of phylogenetic trees, which are used to infer the evolutionary relationships between homologous genes. These bioinformatics experiments are usually modeled as scientific workflows. There are many alternative workflows that use different MSA methods to conduct phylogenetic analysis, and each one can produce an MSA of different quality. Scientists have to explore which MSA method is the most suitable for their experiments. However, workflows for phylogenetic analysis are both compute- and data-intensive and may run sequentially for weeks. Although there are many approaches that parallelize these workflows, exploring all MSA methods may become a burdensome and expensive task. If scientists knew the most adequate MSA method a priori, it would save time and money. To optimize the phylogenetic analysis workflow, we propose in this paper SciHmm, a bioinformatics scientific workflow based on profile hidden Markov models (pHMMs) that aims at determining the most suitable MSA method for a phylogenetic analysis prior to executing the phylogenetic workflow. SciHmm is executed in parallel in a cloud environment using the SciCumulus middleware. The results demonstrate that optimizing a phylogenetic analysis with SciHmm considerably reduces its total execution time (up to 80%) and also improves the quality of the biological results. In addition, the parallel execution of SciHmm demonstrates that this kind of bioinformatics workflow is well suited to execution in the cloud.
Future Generation Computer Systems | 2013
Daniel de Oliveira; Kary A. C. S. Ocaña; Eduardo S. Ogasawara; Jonas Dias; João Carlos de A. R. Gonçalves; Fernanda Araujo Baião; Marta Mattoso
Data analysis is an exploratory process that demands high performance computing (HPC). SciPhylomics, for example, is a data-intensive workflow that aims at producing phylogenomic trees based on an input set of protein sequences of genomes to infer evolutionary relationships among living organisms. SciPhylomics can benefit from parallel processing techniques provided by existing approaches such as the SciCumulus cloud workflow engine and MapReduce implementations such as Hadoop. Despite some performance fluctuations, computing clouds provide a new dimension for HPC due to their elasticity and availability. In this paper, we present a performance evaluation of SciPhylomics executions in a real cloud environment. The workflow was executed using two parallel execution approaches (SciCumulus and Hadoop) on the Amazon EC2 cloud. Our results reinforce the benefits of parallelizing data for the phylogenomic inference workflow using MapReduce-like parallel approaches in the cloud. The performance results demonstrate that this class of bioinformatics experiment is suitable for execution in the cloud despite its need for high performance capabilities. The evaluated workflow shares many features with other data-intensive workflows, suggesting that these cloud execution results can be extrapolated to other classes of experiments.
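The MapReduce-like execution model mentioned above can be shown with a minimal local skeleton mirroring the Hadoop programming model: a mapper emits key-value pairs, the framework groups values by key, and a reducer aggregates each group. The residue-counting mapper is a toy stand-in, not one of SciPhylomics' real activities.

```python
from collections import defaultdict

def mapreduce(records, mapper, reducer):
    """Minimal single-process MapReduce skeleton.

    mapper(record)       -> iterable of (key, value) pairs
    reducer(key, values) -> aggregated result for that key
    """
    intermediate = defaultdict(list)
    for rec in records:                 # map phase
        for key, value in mapper(rec):
            intermediate[key].append(value)
    return {key: reducer(key, values)   # reduce phase (after shuffle/group)
            for key, values in sorted(intermediate.items())}

# Toy example: count amino-acid residues across protein sequences.
seqs = ["MKV", "MKM"]
counts = mapreduce(seqs,
                   lambda s: [(aa, 1) for aa in s],
                   lambda k, vs: sum(vs))
```

In Hadoop the map and reduce phases run on many nodes with the shuffle over the network; the data-parallel structure is the same.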
High Performance Distributed Computing | 2010
Fábio Coutinho; Eduardo S. Ogasawara; Daniel de Oliveira; Vanessa Braganholo; Alexandre A. B. Lima; Alberto M. R. Dávila; Marta Mattoso
Large-scale bioinformatics experiments are usually composed of a set of data flows generated by a chain of activities (programs or services) that may be modeled as scientific workflows. Current Scientific Workflow Management Systems (SWfMS) are used to orchestrate these workflows and to control and monitor the whole execution. It is very common for bioinformatics experiments to process very large datasets, so data parallelism is a common approach to increase performance and reduce overall execution time. However, most current SWfMS still lack support for parallel execution in high performance computing (HPC) environments. Additionally, keeping track of provenance data in distributed environments remains an open and important problem. Recently, the Hydra middleware was proposed to bridge the gap between the SWfMS and the HPC environment by providing a transparent way for scientists to parallelize workflow executions while capturing distributed provenance. This paper analyzes data parallelism scenarios in the bioinformatics domain and presents an extension to the Hydra middleware through a specific cartridge that promotes data parallelism in bioinformatics workflows. Experimental results using workflows with BLAST show performance gains with the additional benefit of distributed provenance support.
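For BLAST-style workloads, data parallelism amounts to fragmenting the input (e.g., a multi-FASTA file) so each fragment feeds a separate BLAST run. The splitter below is a sketch of that idea, assuming round-robin assignment of whole records; it is not the Hydra cartridge itself.

```python
def split_fasta(fasta_text, n_fragments):
    """Split a multi-FASTA string into up to n_fragments chunks of whole
    records, so each chunk can be processed by an independent BLAST run."""
    records, current = [], []
    for line in fasta_text.strip().splitlines():
        if line.startswith(">") and current:   # header starts a new record
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    fragments = [[] for _ in range(n_fragments)]
    for i, rec in enumerate(records):
        fragments[i % n_fragments].append(rec)  # round-robin assignment
    return ["\n".join(f) for f in fragments if f]

parts = split_fasta(">a\nMK\n>b\nVL\n>c\nAA\n>d\nGG", 2)
```

Splitting on record boundaries matters: a sequence cut mid-record would be invalid input for BLAST, so the unit of fragmentation is the FASTA record, not the byte.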