Vítor Silva
Federal University of Rio de Janeiro
Publication
Featured research published by Vítor Silva.
edbt icdt workshops | 2013
Flavio Costa; Vítor Silva; Daniel de Oliveira; Kary A. C. S. Ocaña; Eduardo S. Ogasawara; Jonas Dias; Marta Mattoso
Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are enacted and managed by Scientific Workflow Management Systems (SWfMS). Each SWfMS has its particular approach to execute workflows and to capture and manage their provenance data. Due to the large scale of experiments, it may be unviable to analyze provenance data only after the execution ends. A single experiment may demand weeks to run, even in high-performance computing environments. Thus, scientists need to monitor the experiment during its execution, and this can be done through provenance data. Runtime provenance analysis allows scientists to monitor workflow execution and to take actions before it ends (i.e., workflow steering). This provenance data can also be used to fine-tune the parallel execution of the workflow dynamically. We use the PROV data model as a basic framework for modeling and providing runtime provenance as a database that can be queried even during the execution. This database is agnostic to the SWfMS and workflow engine. We show the benefits of representing and sharing runtime provenance data for improving experiment management as well as the analysis of the scientific data.
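A runtime provenance database that is queryable during execution can be sketched with a tiny relational example. The schema and activity names below are hypothetical, simplified far beyond the PROV-based model the abstract describes; the point is only that monitoring reduces to ordinary SQL over task records.

```python
import sqlite3

# Hypothetical, heavily simplified runtime provenance schema;
# the actual PROV-based schema used by the authors differs.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task (activity TEXT, status TEXT, elapsed_s REAL)")
db.executemany("INSERT INTO task VALUES (?, ?, ?)", [
    ("align",  "FINISHED", 12.0),
    ("align",  "RUNNING",  None),
    ("refine", "FINISHED", 30.5),
    ("refine", "FINISHED", 28.1),
])

# A monitoring query a scientist might run *during* execution:
# per-activity progress, usable as input for workflow steering.
rows = db.execute("""
    SELECT activity,
           SUM(status = 'FINISHED') AS done,
           COUNT(*)                 AS total
    FROM task GROUP BY activity ORDER BY activity""").fetchall()
for activity, done, total in rows:
    print(f"{activity}: {done}/{total} tasks finished")
```

Because the provenance store is just a relational database, any SQL client can issue such queries mid-run, independently of the workflow engine.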
Concurrency and Computation: Practice and Experience | 2013
Eduardo S. Ogasawara; Jonas Dias; Vítor Silva; Fernando Chirigati; Daniel de Oliveira; Fábio Porto; Patrick Valduriez; Marta Mattoso
Large-scale scientific experiments based on computer simulations are typically modeled as scientific workflows, which eases the chaining of different programs. These scientific workflows are defined, executed, and monitored by Scientific Workflow Management Systems (SWfMS). As these experiments manage large amounts of data, it becomes critical to execute them in high-performance computing environments, such as clusters, grids, and clouds. However, few SWfMS provide parallel support, and the ones that do are usually labor-intensive for workflow developers and offer limited primitives to optimize workflow execution. To address these issues, we developed a workflow algebra to specify and enable the optimization of parallel execution of scientific workflows. In this paper, we show how the workflow algebra is efficiently implemented in Chiron, an algebra-based parallel scientific workflow engine. Chiron has a unique native distributed provenance mechanism that enables runtime queries over a relational database. We developed two studies to evaluate the performance of our algebraic approach implemented in Chiron; the first compares Chiron with different approaches, whereas the second evaluates its scalability. By analyzing the results, we conclude that Chiron is efficient in executing scientific workflows, with the benefits of declarative specification and runtime provenance support.
Concurrency and Computation: Practice and Experience | 2016
Vítor Silva; Daniel de Oliveira; Patrick Valduriez; Marta Mattoso
Computer simulations may ingest and generate large numbers of raw data files. Most of these files follow a de facto standard format established by the application domain, for example, Flexible Image Transport System (FITS) for astronomy. Although these formats are supported by a variety of programming languages, libraries, and programs, analyzing thousands or millions of files requires developing specific programs. Database management systems (DBMS) are not suited for this because they require loading and structuring the raw data, which becomes costly at large scale. Systems like NoDB, RAW, and FastBit have been proposed to index and query raw data files without the overhead of using a DBMS. However, these solutions focus on analyzing one single large file instead of several related files. When related files are produced and required for analysis, the relationships among elements within file contents must be managed manually, with specific programs to access the raw data; this data management is time-consuming and error-prone. When computer simulations are managed by a scientific workflow management system (SWfMS), they can take advantage of provenance data to relate and analyze raw data files produced during workflow execution. However, SWfMS register provenance at a coarse grain, with limited analysis of elements from raw data files. When the SWfMS is dataflow-aware, it can register provenance data and the relationships among elements of raw data files together in a database, which is useful to access the contents of a large number of files. In this paper, we propose a dataflow approach for analyzing element data from several related raw data files. Our approach is complementary to existing single-file raw data analysis approaches. We use the Montage workflow from astronomy and a workflow from the Oil and Gas domain as data-intensive case studies.
Our experimental results for the Montage workflow explore different types of raw data flows, such as tracing all linear transformations involved in the projection simulation programs for specific mosaic elements from the input repositories. The cost of raw data extraction is approximately 3.7% of the total application execution time.
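The core idea of element-level access across related files can be sketched in a few lines. Everything here is illustrative (file names, the index layout, and the fact that the index is a plain dict are all assumptions): an extraction step records where each element of interest lives, so later queries seek directly to those bytes instead of parsing whole files.

```python
import os
import struct
import tempfile

# Simulate two related raw data files of little-endian doubles.
files = {}
for name, values in [("a.raw", [1.5, 2.5]), ("b.raw", [3.5])]:
    path = os.path.join(tempfile.mkdtemp(), name)
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))
    files[name] = path

# Provenance-style index built during the (simulated) workflow run:
# (file, element id) -> byte offset of that element.
index = {("a.raw", 1): 8, ("b.raw", 0): 0}

def fetch(name, elem):
    """Read one element directly, without loading the whole file."""
    with open(files[name], "rb") as f:
        f.seek(index[(name, elem)])
        return struct.unpack("<d", f.read(8))[0]

print(fetch("a.raw", 1), fetch("b.raw", 0))
```

In the paper's setting the index lives in the provenance database, so a single query can relate elements that reside in separate files.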
international conference on management of data | 2012
Fernando Chirigati; Vítor Silva; Eduardo S. Ogasawara; Daniel de Oliveira; Jonas Dias; Fábio Porto; Patrick Valduriez; Marta Mattoso
Scientific experiments based on computer simulations can be defined, executed, and monitored using Scientific Workflow Management Systems (SWfMS). Several SWfMS are available, each with a different goal and a different engine. Due to the exploratory nature of their analyses, scientists need to run parameter sweep (PS) workflows, which are workflows that are invoked repeatedly using different input data. These workflows generate a large number of tasks that are submitted to High Performance Computing (HPC) environments. Different execution models for a workflow may differ significantly in performance in HPC. However, selecting the best execution model for a given workflow is difficult, because many characteristics of the workflow may affect its parallel execution. We developed a study to show the performance impacts of using different execution models when running PS workflows in HPC. Our study contributes a characterization of PS workflow patterns (the basis for many existing scientific workflows) and their behavior under different execution models in HPC. We evaluated four execution models to run workflows in parallel. Our study measures the performance behavior of small, large, and complex workflows under the evaluated execution models. The results can be used as a guideline to select the best model for a given scientific workflow execution in HPC. Our evaluation may also serve as a basis for workflow designers to analyze the expected behavior of an HPC workflow engine based on the characteristics of PS workflows.
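A parameter sweep workflow invokes the same chain of programs once per combination of input parameters, which is why task counts grow multiplicatively. The parameter names and values below are made up; the sketch only shows where the task explosion comes from.

```python
from itertools import product

# Hypothetical sweep: 2 temperatures x 3 pressures -> 6 tasks,
# each task being one invocation of the workflow's program chain.
params = {"temperature": [280, 300], "pressure": [1.0, 2.0, 5.0]}

tasks = [dict(zip(params, combo)) for combo in product(*params.values())]
print(len(tasks), "tasks, e.g.", tasks[0])
```

Each generated task is independent of the others, which is exactly what gives the execution model (how tasks are grouped, dispatched, and placed) such a large influence on HPC performance.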
international provenance and annotation workshop | 2010
Daniel de Oliveira; Eduardo S. Ogasawara; Fernando Seabra; Vítor Silva; Leonardo Murta; Marta Mattoso
Scientific experiments present several advantages when modeled at high abstraction levels, independent from Scientific Workflow Management System (SWfMS) specification languages. For example, the scientist can define the scientific hypothesis in terms of algorithms and methods. This high-level experiment can then be mapped into different scientific workflow instances. These instances can be executed by a SWfMS and take advantage of its provenance records. However, each workflow execution is often treated by the SWfMS as an independent instance. There are no tools that allow modeling the conceptual experiment and linking it to the diverse workflow execution instances. This work presents GExpLine, a tool for supporting experiment composition through provenance. In an analogy to software development, GExpLine can be seen as a CASE tool, while a SWfMS can be seen as an IDE. It provides a conceptual representation of the scientific experiment and automatically associates workflow executions with the concept of experiment. By using prospective provenance from the experiment, GExpLine generates corresponding workflows that can be executed by a SWfMS. This paper also presents a real experiment use case that reinforces the importance of GExpLine and its prospective provenance support.
european conference on parallel processing | 2014
Ji Liu; Vítor Silva; Esther Pacitti; Patrick Valduriez; Marta Mattoso
Scientific workflows allow scientists to conduct experiments that manipulate data with multiple computational activities using Scientific Workflow Management Systems (SWfMSs). As the scale of the data increases, SWfMSs need to support workflow execution in High Performance Computing (HPC) environments. Because of its many benefits, the cloud has emerged as an appropriate infrastructure for workflow execution. However, it is difficult to execute some scientific workflows in one cloud site because of the geographical distribution of scientists, data, and computing resources; therefore, a scientific workflow often needs to be partitioned and executed in a multisite environment. Also, SWfMSs generally execute a scientific workflow in parallel within one site. This paper proposes a non-intrusive approach to execute scientific workflows in a multisite cloud with three workflow partitioning techniques. We describe an experimental validation using an adaptation of the Chiron SWfMS for the Microsoft Azure multisite cloud. The experimental results reveal the efficiency of our partitioning techniques and their relative advantages in different environments.
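One intuition behind multisite partitioning can be sketched with a data-gravity heuristic: place each activity on the site that already holds the largest volume of its inputs, so less data crosses site boundaries. This heuristic, and all the site, dataset, and activity names below, are illustrative assumptions, not necessarily one of the paper's three techniques.

```python
# Illustrative data-gravity partitioning sketch (not the authors'
# algorithm): dataset -> (hosting site, size in GB), and
# activity -> list of input datasets.
data = {"raw_images": ("site_A", 900), "catalog": ("site_B", 50)}
inputs = {"project": ["raw_images"],
          "match":   ["catalog"],
          "mosaic":  ["raw_images", "catalog"]}

def assign(activity):
    """Pick the site holding the largest volume of this activity's inputs."""
    per_site = {}
    for d in inputs[activity]:
        site, size = data[d]
        per_site[site] = per_site.get(site, 0) + size
    return max(per_site, key=per_site.get)

partition = {a: assign(a) for a in inputs}
print(partition)
```

Real partitioners must also weigh inter-site bandwidth, compute capacity, and constraints such as data that may not leave a site, which is why the paper evaluates several techniques rather than a single rule.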
Future Generation Computer Systems | 2017
Vítor Silva; José Leite; José J. Camata; Daniel de Oliveira; Alvaro L. G. A. Coutinho; Patrick Valduriez; Marta Mattoso
Computer simulations consume and produce huge amounts of raw data files in different formats, e.g., HDF5 in computational fluid dynamics simulations. Users often need to analyze domain-specific data based on related data elements from multiple files during the execution of computer simulations. In a raw data analysis, one should identify regions of interest in the data space and retrieve the content of specific related raw data files. Existing solutions, such as FastBit and RAW, are limited to single-file raw data analysis and can only be used after the execution of computer simulations. Scientific Workflow Management Systems (SWfMS) can manage the dataflow of computer simulations and register related raw data files in a provenance database. This paper aims to combine the advantages of a dataflow-aware SWfMS and raw data file analysis techniques to allow for queries on raw data elements that are related but reside in separate files. We propose a component-based architecture named ARMFUL (Analysis of Raw data from Multiple Files), with raw data extraction and indexing techniques that allow direct access to specific elements or regions of the raw data space. ARMFUL innovates by using a SWfMS provenance database to add a dataflow access path to raw data files. ARMFUL facilitates the invocation of ad hoc programs and third-party tools (e.g., the FastBit tool) for raw data analyses. In our experiments, a real parallel computational fluid dynamics simulation is executed, exploring different alternatives of raw data extraction, indexing, and analysis.
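The indexing side can be illustrated with a minimal bitmap index in the spirit of FastBit (this is a toy sketch, not FastBit's compressed WAH implementation): one bitmap per distinct value lets equality queries be answered with bitwise operations instead of scanning the raw data.

```python
# Toy bitmap index: values[i] is the i-th element of a raw data column.
values = [3, 1, 3, 2, 1, 3]

# One integer bitmap per distinct value; bit i is set when values[i] == v.
bitmaps = {}
for i, v in enumerate(values):
    bitmaps[v] = bitmaps.get(v, 0) | (1 << i)

def query(v):
    """Row ids where value == v, answered from the bitmap alone."""
    bm = bitmaps.get(v, 0)
    return [i for i in range(len(values)) if bm >> i & 1]

print(query(3))
```

Range and compound predicates then become ORs and ANDs of bitmaps, which is what makes this family of indexes attractive for read-only scientific data.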
symposium on computer architecture and high performance computing | 2014
Vítor Silva; Daniel de Oliveira; Marta Mattoso
Scientific applications generate raw data files at very large scale. Most of these files follow a standard format established by the application domain, like HDF5, NetCDF, and FITS. These formats are supported by a variety of programming languages, libraries, and programs. Because of their scale, analyzing these files requires writing specific programs. Generic data analysis systems like database management systems (DBMS) are not suited because data loading and transformation become prohibitive at large scale. Recently there have been several proposals for indexing and querying raw data files without the overhead of using a DBMS, such as NoDB, RAW, and FastBit. Their goal is to offer query support over the raw data file after a scientific program has generated it. However, these solutions are focused on the analysis of one single large file. When a large number of files are all related and required for the evaluation of one scientific hypothesis, the relationships must be managed manually or by writing specific programs. The proposed approach takes advantage of existing provenance data support from Scientific Workflow Management Systems (SWfMS). When scientific applications are managed by a SWfMS, the data is registered in the provenance database at runtime. Therefore, this provenance data may act as a description of these files. When the SWfMS is dataflow-aware, it registers domain data all in the same database. This resulting database becomes an important access method to the large number of files that are generated by the scientific workflow execution. This is a complementary approach to single-file raw data analysis support. In this work, we present our dataflow approach for analyzing data from several raw data files and evaluate it with the Montage application from the astronomy domain.
2014 Brazilian Symposium on Computer Networks and Distributed Systems | 2014
Vítor Silva; Tatiana Sciammarella; Miguel Elias M. Campista; Luís Henrique Maciel Kosmalski Costa
One major challenge today is to deal with traffic jams caused by the excessive use of private vehicles. It is believed that more people would adopt public transportation if it were more reliable. Thus, this paper proposes WiBus, a system to estimate bus arrival times based on information from opportunistic contacts in IEEE 802.11 networks. Estimates are provided to users through graphical interfaces on mobile devices. WiBus builds and maintains bus trajectories dynamically with a dedicated algorithm. The system was implemented and its performance was analyzed via emulation of a real scenario. Experimental results show that WiBus can meet the demands of large cities, with an accumulated error on the order of a few minutes in the worst case.
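The estimation idea can be sketched very roughly: when a bus is last seen at a Wi-Fi contact point, its arrival at a downstream stop is predicted from historical travel times between the two points. The access point and stop identifiers, the clock model, and the use of a plain mean are all invented for illustration; WiBus's actual trajectory and estimation algorithms are more elaborate.

```python
from statistics import mean

# Hypothetical history: travel times in seconds from access point
# "ap1" to stop "stop7", collected from earlier opportunistic contacts.
history = {("ap1", "stop7"): [240, 260, 250]}

def eta(last_seen_s, ap, stop):
    """Estimated arrival time at `stop`, on the same clock as `last_seen_s`."""
    return last_seen_s + mean(history[(ap, stop)])

print(eta(1000, "ap1", "stop7"))
```

Errors in such estimates accumulate along the route, which matches the paper's observation that the worst-case accumulated error is on the order of a few minutes.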
edbt icdt workshops | 2013
Felipe Horta; Vítor Silva; Flavio Costa; Daniel de Oliveira; Kary A. C. S. Ocaña; Eduardo S. Ogasawara; Jonas Dias; Marta Mattoso
Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are managed by Scientific Workflow Management Systems (SWfMS). The different languages used by SWfMS may impact the way the workflow engine executes the workflow, sometimes limiting optimization opportunities. To tackle this issue, we recently proposed a scientific workflow algebra [1]. This algebra is inspired by database relational algebra, and it enables automatic optimization of scientific workflows to be executed in parallel in high performance computing (HPC) environments. The experiments presented in this paper were executed in Chiron, a parallel scientific workflow engine implemented to support the scientific workflow algebra. Before executing the workflow, Chiron stores the prospective provenance [2] of the workflow in its provenance database. Each workflow is composed of several activities, and each activity consumes relations. Similarly to relational databases, a relation contains a set of attributes and is composed of a set of tuples. Each tuple in a relation contains a series of values, each one associated with a specific attribute. The tuples of a relation are distributed to be consumed in parallel over the computing resources according to the workflow activity. During and after the execution, the retrospective provenance [2] is also stored.
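The distribution of a relation's tuples over computing resources can be sketched with a simple round-robin scheme. This is an assumption for illustration: real engines such as Chiron can use dynamic scheduling rather than a fixed assignment.

```python
# Illustrative round-robin distribution of a relation's tuples
# across n_workers computing resources.

def distribute(relation, n_workers):
    buckets = [[] for _ in range(n_workers)]
    for i, t in enumerate(relation):
        buckets[i % n_workers].append(t)
    return buckets

relation = [{"id": i} for i in range(7)]
buckets = distribute(relation, 3)
print([len(b) for b in buckets])
```

Because each tuple is processed independently by the activity, any such assignment preserves the workflow's semantics; the choice only affects load balance.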
Collaboration
Luís Henrique Maciel Kosmalski Costa
Federal University of Rio de Janeiro