Marta Mattoso
Federal University of Rio de Janeiro
Publications
Featured research published by Marta Mattoso.
International Conference on Cloud Computing | 2010
Daniel de Oliveira; Eduardo S. Ogasawara; Fernanda Araujo Baião; Marta Mattoso
Most large-scale scientific experiments modeled as scientific workflows produce a large amount of data and require workflow parallelism to reduce workflow execution time. Some of the existing Scientific Workflow Management Systems (SWfMS) explore parallelism techniques such as parameter sweep and data fragmentation. In those systems, several computing resources are used to accomplish many computational tasks in homogeneous environments, such as multiprocessor machines or cluster systems. Cloud computing has become a popular high performance computing model in which (virtualized) resources are provided as services over the Web. Some scientists are starting to adopt the cloud model in scientific domains and are moving their scientific workflows (programs and data) from local environments to the cloud. Nevertheless, it is still difficult for scientists to express a parallel computing paradigm for the workflow on the cloud. Capturing distributed provenance data in the cloud is also an issue. Existing approaches for executing scientific workflows using parallel processing are mainly focused on homogeneous environments, whereas in the cloud the scientist has to manage new aspects such as the initialization of virtualized instances, scheduling over different cloud environments, the impact of data transfers, and the management of instance images. In this paper we propose SciCumulus, a cloud middleware that explores parameter sweep and data fragmentation parallelism in scientific workflow activities (with provenance support). It works between the SWfMS and the cloud and is designed with cloud specificities in mind. We evaluated our approach by executing simulated experiments to analyze the overhead imposed by clouds on workflow execution time.
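As a hedged illustration of the two parallelism styles combined above, the sketch below (plain Python, not SciCumulus code; all names are assumptions) expands one workflow activity into independent tasks by crossing a parameter sweep with fragments of the input data and runs them concurrently, the way virtualized cloud instances would.

```python
# Illustrative sketch only: parameter sweep x data fragmentation for one
# workflow activity. Names (run_activity, parallel_sweep) are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_activity(program, params, fragment):
    # Placeholder for dispatching one task to a virtualized instance; a real
    # middleware would also record provenance for this invocation.
    return {"program": program, "params": params, "items": len(fragment)}

def split(data, n_fragments):
    """Data fragmentation: split the activity input into independent chunks."""
    size = max(1, len(data) // n_fragments)
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sweep(program, param_space, data, n_fragments=4, workers=8):
    # Parameter sweep: one task per combination of parameter values.
    combos = [dict(zip(param_space, values))
              for values in product(*param_space.values())]
    tasks = [(combo, frag) for combo in combos for frag in split(data, n_fragments)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda task: run_activity(program, *task), tasks))

results = parallel_sweep("blast", {"evalue": [1e-3, 1e-5], "word_size": [7, 11]},
                         data=list(range(100)))
print(len(results))  # 4 parameter combinations x 4 fragments = 16 tasks
```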
Grid Computing | 2012
Daniel de Oliveira; Kary A. C. S. Ocaña; Fernanda Araujo Baião; Marta Mattoso
In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was on high throughput computing, they are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges, such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a complex task, since many different criteria have to be taken into account and the elasticity of the cloud must be explored to optimize workflow execution. In this paper, we introduce an adaptive scheduling heuristic for the parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability, and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, the approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.
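A minimal sketch of what a 3-objective ranking could look like is given below; the weighted-sum form, the normalization, and all names are assumptions for illustration, not the cost model actually used in the paper.

```python
# Hedged sketch: rank candidate virtual machines for one activity by a weighted
# sum of normalized execution time (makespan share), failure probability
# (reliability) and monetary cost. Weights and field names are assumptions.
def pick_vm(activity, vms, w_time=0.5, w_rel=0.3, w_cost=0.2):
    times = {vm["name"]: activity["work"] / vm["speed"] for vm in vms}
    costs = {vm["name"]: times[vm["name"]] / 3600.0 * vm["price_per_hour"] for vm in vms}
    max_time, max_cost = max(times.values()), max(costs.values())
    max_fail = max(vm["failure_rate"] for vm in vms)

    def score(vm):
        return (w_time * times[vm["name"]] / max_time +
                w_rel * vm["failure_rate"] / max_fail +
                w_cost * costs[vm["name"]] / max_cost)

    return min(vms, key=score)  # lowest combined score wins

vms = [
    {"name": "small", "speed": 1.0, "price_per_hour": 0.10, "failure_rate": 0.02},
    {"name": "large", "speed": 4.0, "price_per_hour": 0.50, "failure_rate": 0.01},
]
print(pick_vm({"work": 7200.0}, vms)["name"])
```

In the approach described above, the same kind of criteria would also drive scaling virtual machines up or down against the restrictions the scientist states before execution; that adaptive part is omitted from this sketch.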
International Journal of Business Process Integration and Management | 2010
Marta Mattoso; Cláudia Maria Lima Werner; Guilherme Horta Travassos; Vanessa Braganholo; Eduardo S. Ogasawara; Daniel de Oliveira; Sérgio Manuel Serra da Cruz; Wallace Martinho; Leonardo Murta
One of the main challenges of scientific experiments is to allow scientists to manage and exchange their scientific computational resources (data, programs, models, etc.). The effective management of such experiments requires a specific set of cardinal facilities, such as experiment specification techniques, workflow derivation heuristics, and provenance mechanisms. These facilities characterise the experiment life cycle in three phases: composition, execution, and analysis. Existing work on supporting scientific workflows focuses mainly on the execution and analysis phases and therefore fails to support the scientific experiment throughout its life cycle as a set of integrated experimentation technologies. In large-scale experiments this represents a research challenge. We propose an approach for managing large-scale experiments based on provenance gathering during all phases of the life cycle. We foresee that such an approach may help scientists gain more control over the trials of the scientific experiment.
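For illustration only, the sketch below shows one way provenance could be tagged with the three life-cycle phases so that a trial remains traceable from composition through analysis; the classes and field names are assumptions, not the authors' schema.

```python
# Illustrative sketch: provenance records tagged by life-cycle phase
# (composition, execution, analysis). All names here are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    phase: str          # "composition", "execution" or "analysis"
    experiment_id: str
    detail: dict
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

class ExperimentTrail:
    """Accumulates provenance for one trial of a scientific experiment."""
    def __init__(self, experiment_id):
        self.experiment_id = experiment_id
        self.records = []

    def record(self, phase, **detail):
        self.records.append(ProvenanceRecord(phase, self.experiment_id, detail))

    def by_phase(self, phase):
        return [r for r in self.records if r.phase == phase]

trail = ExperimentTrail("trial-042")
trail.record("composition", workflow="phylogeny-v3", derived_from="abstract-spec")
trail.record("execution", activity="align", status="finished")
trail.record("analysis", query="sequences with score > 0.9")
print(len(trail.by_phase("execution")))
```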
Grid Computing | 2015
Ji Liu; Esther Pacitti; Patrick Valduriez; Marta Mattoso
Nowadays, more and more computer-based scientific experiments need to handle massive amounts of data. Their data processing consists of multiple computational steps with dependencies among them. A data-intensive scientific workflow is useful for modeling such a process. Since the sequential execution of data-intensive scientific workflows may take a long time, Scientific Workflow Management Systems (SWfMSs) should enable their parallel execution and exploit resources distributed across different infrastructures such as grids and clouds. This paper provides a survey of data-intensive scientific workflow management in SWfMSs and their parallelization techniques. Based on an SWfMS functional architecture, we give a comparative analysis of the existing solutions. Finally, we identify research issues for improving the execution of data-intensive scientific workflows in a multisite cloud.
Journal of Grid Computing | 2007
Esther Pacitti; Patrick Valduriez; Marta Mattoso
Initially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise information systems. This makes data management critical, since the techniques must scale up while addressing the autonomy, dynamicity, and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. Then we specify the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies revisiting distributed database techniques in major ways, in particular by using P2P techniques.
Symposium on Software Reusability | 2001
Regina M. M. Braga; Marta Mattoso; Cláudia Maria Lima Werner
Component-Based Development (CBD) aims at constructing software through the inter-relationship of pre-existing components. However, these components must be bound to a specific application domain in order to be effectively reused. Reusable domain components and their related documentation are usually stored in a great variety of data sources. Thus, a possible solution for accessing this information is to use a software layer that integrates the different component information sources. We present a component information integration data layer based on mediators. Through mediators, a domain ontology acts as a formalism for specifying ontological commitments or agreements between component users and providers, enabling more accurate searches for software component information.
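The fragment below is a rough sketch, with invented names, of the mediator idea described above: a domain ontology maps a concept shared by component users and providers to source-specific terms, and the mediator fans the query out over heterogeneous component repositories.

```python
# Hedged sketch (not the paper's system): ontology-driven mediation over
# heterogeneous component information sources. All names are hypothetical.
class Mediator:
    def __init__(self, ontology, sources):
        self.ontology = ontology  # concept -> terms understood by the sources
        self.sources = sources    # source name -> callable(term) -> components

    def search(self, concept):
        terms = self.ontology.get(concept, [concept])
        hits = []
        for name, query in self.sources.items():
            for term in terms:
                for component in query(term):
                    hits.append({"source": name, "term": term, "component": component})
        return hits

ontology = {"persistence": ["persistence", "object-relational mapping", "DAO"]}
sources = {
    "repo_a": lambda term: [f"{term}-framework"] if term == "persistence" else [],
    "repo_b": lambda term: ["GenericDAO"] if term == "DAO" else [],
}
for hit in Mediator(ontology, sources).search("persistence"):
    print(hit["source"], hit["component"])
```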
EDBT/ICDT Workshops | 2013
Flavio Costa; Vítor Silva; Daniel de Oliveira; Kary A. C. S. Ocaña; Eduardo S. Ogasawara; Jonas Dias; Marta Mattoso
Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are enacted and managed by Scientific Workflow Management Systems (SWfMS). Each SWfMS has its own approach to executing workflows and to capturing and managing their provenance data. Due to the large scale of experiments, it may be unfeasible to analyze provenance data only after the execution ends. A single experiment may take weeks to run, even in high performance computing environments. Thus, scientists need to monitor the experiment during its execution, and this can be done through provenance data. Runtime provenance analysis allows scientists to monitor workflow execution and take actions before it ends (i.e., workflow steering). This provenance data can also be used to fine-tune the parallel execution of the workflow dynamically. We use the PROV data model as a basic framework for modeling and providing runtime provenance as a database that can be queried even during the execution. This database is agnostic to the SWfMS and workflow engine. We show the benefits of representing and sharing runtime provenance data for improving experiment management as well as the analysis of the scientific data.
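The snippet below sketches the kind of runtime monitoring query the abstract refers to, using an in-memory SQLite database; the table and column names are assumptions for illustration, not the authors' PROV mapping.

```python
# Hedged sketch: querying execution provenance while the workflow is still
# running, to support steering. The schema below is assumed, not the PROV mapping.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activity_execution (
                    activity TEXT, status TEXT, elapsed_seconds REAL)""")
conn.executemany("INSERT INTO activity_execution VALUES (?, ?, ?)",
                 [("align", "FINISHED", 120.0),
                  ("model", "RUNNING", 3500.0),
                  ("model", "FAILED", 40.0)])

# Steering decision at runtime: spot failed or unusually long activities that
# may need re-scheduling or a parameter change before the experiment ends.
rows = conn.execute("""SELECT activity, status, elapsed_seconds
                         FROM activity_execution
                        WHERE status = 'FAILED' OR elapsed_seconds > 3000""")
for activity, status, elapsed in rows:
    print(activity, status, elapsed)
```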
Distributed and Parallel Databases | 2009
Alexandre A. B. Lima; Camille Furtado; Patrick Valduriez; Marta Mattoso
We consider the problem of improving the performance of OLAP applications in a database cluster (DBC), which is a low-cost and effective parallel solution for query processing. Current DBC solutions for OLAP query processing provide intra-query parallelism only, at the cost of full replication of the database. In this paper, we propose more efficient distributed database design alternatives that combine physical/virtual partitioning with partial replication. We also propose a new load balancing strategy that takes advantage of adaptive virtual partitioning to redistribute the load to the replicas. Our experimental validation is based on the implementation of our solution on the SmaQSS DBC middleware prototype. Our experimental results using the TPC-H benchmark and a 32-node cluster show very good speedup.
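As a hedged illustration (not the SmaQSS implementation), the sketch below rewrites one OLAP query into sub-queries over disjoint ranges of a partitioning attribute (virtual partitioning) and spreads them across replica nodes, with a naive placeholder standing in for the load-balancing strategy.

```python
# Illustrative sketch: virtual partitioning of an OLAP query plus a naive
# placeholder for load balancing across replica nodes. Not the SmaQSS code;
# the adaptive resizing of partitions described in the paper is omitted.
def virtual_partitions(template, attribute, lo, hi, n_parts):
    """Yield one sub-query per contiguous range of the partitioning attribute."""
    step = (hi - lo) // n_parts
    for i in range(n_parts):
        start = lo + i * step
        end = hi if i == n_parts - 1 else start + step
        yield template.format(pred=f"{attribute} >= {start} AND {attribute} < {end}")

def assign(subqueries, nodes):
    """Round-robin placeholder for the load-balancing strategy."""
    return [(nodes[i % len(nodes)], q) for i, q in enumerate(subqueries)]

template = ("SELECT l_returnflag, SUM(l_quantity) FROM lineitem "
            "WHERE {pred} GROUP BY l_returnflag")
plan = assign(list(virtual_partitions(template, "l_orderkey", 1, 6_000_000, 8)),
              nodes=["node1", "node2", "node3", "node4"])
for node, subquery in plan[:2]:
    print(node, "->", subquery)
```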
Database and Expert Systems Applications | 2000
Regina M. M. Braga; Cláudia Maria Lima Werner; Marta Mattoso
The main objective of domain engineering is to provide domain information that helps the specification of domain applications. Some applications need to reuse information from multiple domains. Currently, there are several domain engineering methods that provide domain information using different representations stored in various formats. Due to the costs involved in a domain engineering initiative, it is important to be able to access all available domain information. This paper describes a retrieval agent system that provides access to information from multiple domains, regardless of its heterogeneity or distribution. Domain ontologies and an evolutionary model of the users' interests are some of the basic concepts used by the system to help users identify and retrieve relevant domain information.
Proceedings of the 1999 IEEE Symposium on Application-Specific Systems and Software Engineering and Technology (ASSET'99) | 1999
Regina M. M. Braga; Cláudia Maria Lima Werner; Marta Mattoso
This paper presents Odyssey, a reuse-based software development environment that supports component-based software development (CBD) within specific domains. Object-oriented frameworks, software architectures, artificial intelligence techniques, domain engineering, and mediators are some of the technologies used by Odyssey.