Andréa M. Matsunaga | Researchain

Publication

Featured research published by Andréa M. Matsunaga.

IEEE International Conference on e-Science | 2008

CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications

Andréa M. Matsunaga; Maurício O. Tsugawa; José A. B. Fortes

This paper proposes and evaluates an approach to the parallelization, deployment and management of bioinformatics applications that integrates several emerging technologies for distributed computing. The proposed approach uses the MapReduce paradigm to parallelize tools and manage their execution, machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly deployable virtual machines, and network virtualization to connect resources behind firewalls/NATs while preserving the necessary performance and the communication environment. An implementation of this approach is described and used to demonstrate and evaluate it. The implementation integrates Hadoop, Virtual Workspaces, and ViNe as the MapReduce, virtual machine and virtual network technologies, respectively, to deploy the commonly used bioinformatics tool NCBI BLAST on a WAN-based test bed consisting of clusters at two distinct locations, the University of Florida and the University of Chicago. This WAN-based implementation, called CloudBLAST, was evaluated against both non-virtualized and LAN-based implementations in order to assess the overheads of machine and network virtualization, which were shown to be insignificant. To compare the proposed approach against an MPI-based solution, CloudBLAST performance was experimentally contrasted against the publicly available mpiBLAST on the same WAN-based test bed. Both versions demonstrated performance gains as the number of available processors increased, with CloudBLAST delivering a speedup of 57 versus 52.4 for the MPI version when 64 processors on 2 sites were used. The results encourage the use of the proposed approach for the execution of large-scale bioinformatics applications on emerging distributed environments that provide access to computing resources as a service.
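The abstract does not show CloudBLAST's Hadoop code, so the following is only a minimal, single-process sketch of the MapReduce decomposition it describes: FASTA query records are the unit of parallel work, a map step searches each query independently, and a reduce step groups hits by query. The function `fake_blast` is a hypothetical stand-in for running NCBI BLAST, and all function names are assumptions. The last helper computes parallel efficiency (speedup divided by processor count) for the speedups reported above.

```python
from collections import defaultdict

def fake_blast(query_id, sequence):
    # Hypothetical stand-in for a BLAST search: reports one made-up hit per
    # query, scored by sequence length. A real job would run NCBI BLAST here.
    return [(query_id, "db_hit_" + query_id, len(sequence))]

def parse_fasta(text):
    """Split FASTA text into (id, sequence) records, the unit of parallel work."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:].split()[0], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records

def map_phase(records):
    # Each query is independent, so map tasks can run on any node holding the database.
    for query_id, sequence in records:
        for hit in fake_blast(query_id, sequence):
            yield query_id, hit

def reduce_phase(pairs):
    # Group hits by query ID (what the MapReduce shuffle provides) and collect them.
    grouped = defaultdict(list)
    for key, hit in pairs:
        grouped[key].append(hit)
    return dict(sorted(grouped.items()))

def efficiency(speedup, processors):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / processors

fasta = ">q1 first query\nACGT\nAC\n>q2\nTTGA\n"
print(reduce_phase(map_phase(parse_fasta(fasta))))
# {'q1': [('q1', 'db_hit_q1', 6)], 'q2': [('q2', 'db_hit_q2', 4)]}
print(f"{efficiency(57, 64):.3f} {efficiency(52.4, 64):.3f}")  # 0.891 0.819
```

With the reported speedups, 64 processors correspond to a parallel efficiency of roughly 0.89 for CloudBLAST and 0.82 for the MPI version.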

IEEE Internet Computing | 2009

Sky Computing

Katarzyna Keahey; Maurício O. Tsugawa; Andréa M. Matsunaga; José A. B. Fortes

Infrastructure-as-a-service (IaaS) cloud computing is revolutionizing how we approach computing. Compute resource consumers can eliminate the expense inherent in acquiring, managing, and operating IT infrastructure and instead lease resources on a pay-as-you-go basis. IT infrastructure providers can exploit economies of scale to mitigate the cost of buying and operating resources and avoid the complexity required to manage multiple customer-specific environments and applications. The authors describe the context in which cloud computing arose, discuss its current strengths and shortcomings, and point to an emerging computing pattern it enables that they call sky computing.

Future Generation Computer Systems | 2005

From virtualized resources to virtual computing grids: the In-VIGO system

Sumalatha Adabala; Vineet Chadha; Puneet Chawla; Renato J. O. Figueiredo; José A. B. Fortes; Ivan Krsul; Andréa M. Matsunaga; Maurício O. Tsugawa; Jian Zhang; Ming Zhao; Liping Zhu; Xiaomin Zhu

This paper describes the architecture of the first implementation of the In-VIGO grid-computing system. The architecture is designed to support computational tools for engineering and science research In Virtual Information Grid Organizations (as opposed to in vivo or in vitro experimental research). A novel aspect of In-VIGO is the extensive use of virtualization technology, emerging standards for grid computing and other Internet middleware. In the context of In-VIGO, virtualization denotes the ability of resources to support multiplexing, manifolding and polymorphism (i.e. to simultaneously appear as multiple resources with possibly different functionalities). Virtualization technologies are available or emerging for all the resources needed to construct virtual grids, which would ideally inherit the above-mentioned properties. In particular, these technologies enable the creation of dynamic pools of virtual resources that can be aggregated on demand for application-specific, user-specific grid computing. This change in paradigm from building grids out of physical resources to constructing virtual grids has many advantages but also requires new thinking on how to architect, manage and optimize the necessary middleware. This paper reviews the motivation for the In-VIGO approach, discusses the technologies used, describes an early architecture for In-VIGO that represents a first step towards the end goal of building virtual information grids, and reports on first experiences with the In-VIGO software under development.
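In-VIGO's middleware is not shown in the abstract, so this is only a minimal sketch of the resource-pool idea: several VMs are multiplexed onto one physical host, VMs in the same pool can run different images (polymorphism), and VMs are aggregated on demand into an application-specific virtual grid. The class names, the first-fit placement policy, and the image names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualMachine:
    vm_id: str
    image: str   # each VM can present a different software environment (polymorphism)
    host: str    # physical host it is multiplexed onto

@dataclass
class PhysicalHost:
    name: str
    slots: int                                  # VMs this host can multiplex
    vms: list = field(default_factory=list)

class VirtualResourcePool:
    """Hypothetical pool that aggregates VMs on demand into per-application virtual grids."""

    def __init__(self, hosts):
        self.hosts = hosts
        self._next_id = 0

    def free_slots(self):
        return sum(h.slots - len(h.vms) for h in self.hosts)

    def create_virtual_grid(self, image, size):
        # Check capacity up front so a failed request leaves the pool unchanged.
        if size > self.free_slots():
            raise RuntimeError("not enough physical capacity for this virtual grid")
        grid = []
        for _ in range(size):
            # First-fit placement: the first host with a free slot gets the next VM.
            host = next(h for h in self.hosts if len(h.vms) < h.slots)
            vm = VirtualMachine(f"vm{self._next_id}", image, host.name)
            self._next_id += 1
            host.vms.append(vm)
            grid.append(vm)
        return grid

pool = VirtualResourcePool([PhysicalHost("hostA", 2), PhysicalHost("hostB", 2)])
blast_grid = pool.create_virtual_grid("blast-env", 3)
other_grid = pool.create_virtual_grid("cfd-env", 1)
print([(vm.vm_id, vm.host) for vm in blast_grid])  # [('vm0', 'hostA'), ('vm1', 'hostA'), ('vm2', 'hostB')]
print([(vm.vm_id, vm.image, vm.host) for vm in other_grid])  # [('vm3', 'cfd-env', 'hostB')]
```

Here two virtual grids with different software environments share hostB, which is the multiplexing the abstract refers to.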

Grid Computing | 2010

On the Use of Machine Learning to Predict the Time and Resources Consumed by Applications

Andréa M. Matsunaga; José A. B. Fortes

Most data centers, clouds and grids consist of multiple generations of computing systems, each with a different performance profile, which makes it challenging for job schedulers to make the best use of the infrastructure. A useful piece of information for scheduling jobs, typically not available, is the extent to which applications will use available resources once they are executed. This paper comparatively assesses the suitability of several machine learning techniques for predicting the spatio-temporal utilization of resources by applications. Modern machine learning techniques able to handle a large number of attributes are used, taking into account application- and system-specific attributes (e.g., CPU microarchitecture, size and speed of memory and storage, input data characteristics and input parameters). The work also extends an existing classification tree algorithm, called Predicting Query Runtime (PQR), to the regression problem by allowing each leaf of the tree to select the best regression method for the data collected at that leaf. The new method (PQR2) yields the lowest average percentage error when predicting execution time, memory and disk consumption for two bioinformatics applications, BLAST and RAxML, deployed in scenarios that differ in system and usage. In specific scenarios where usage is a non-linear function of system and application attributes, certain configurations of two other machine learning algorithms, Support Vector Machine and k-nearest neighbors, also yield competitive results. In addition, experiments show that including system performance and application-specific attributes also improves the performance of the machine learning algorithms investigated.
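PQR2's internals are not given in the abstract. As a hedged illustration of its central idea only, that each tree leaf chooses its own regression method, this minimal sketch assumes a depth-1 tree (a stump), two candidate model families per leaf (constant mean and single-attribute least squares), and selection by mean absolute percentage error on the leaf's training data. All function names and the toy data are hypothetical; a faithful version would grow a deeper tree, include more regressors, and select leaf models on held-out data.

```python
from statistics import mean

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))

def fit_constant(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys, j):
    """Ordinary least squares on the single attribute j."""
    u = [x[j] for x in xs]
    mu, my = mean(u), mean(ys)
    var = sum((a - mu) ** 2 for a in u)
    if var == 0:                      # attribute is constant in this leaf
        return lambda x: my
    slope = sum((a - mu) * (b - my) for a, b in zip(u, ys)) / var
    return lambda x: my + slope * (x[j] - mu)

def fit_leaf(xs, ys):
    """Each leaf picks the candidate regressor with the lowest MAPE on its data."""
    candidates = [fit_constant(xs, ys)]
    candidates += [fit_linear(xs, ys, j) for j in range(len(xs[0]))]
    return min(candidates, key=lambda f: mape(ys, [f(x) for x in xs]))

def fit_stump(xs, ys, min_leaf=2):
    """Depth-1 tree: try each attribute/threshold split and keep the one whose
    two leaf models give the lowest total absolute percentage error."""
    best = None
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            left = [(x, y) for x, y in zip(xs, ys) if x[j] <= t]
            right = [(x, y) for x, y in zip(xs, ys) if x[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            models, err = [], 0.0
            for part in (left, right):
                f = fit_leaf([x for x, _ in part], [y for _, y in part])
                models.append(f)
                err += sum(abs((y - f(x)) / y) for x, y in part)
            if best is None or err < best[0]:
                best = (err, j, t, models[0], models[1])
    if best is None:                  # too little data to split: use a single leaf
        return fit_leaf(xs, ys)
    _, feat, thr, left_model, right_model = best
    return lambda x: left_model(x) if x[feat] <= thr else right_model(x)

# Toy data (hypothetical): attribute 0 = input size, attribute 1 = machine class.
xs = [(s, 0) for s in range(1, 5)] + [(s, 1) for s in range(1, 5)]
ys = [10 * s for s in range(1, 5)] + [5] * 4
model = fit_stump(xs, ys)
print(model((5, 0)), model((5, 1)))
```

On this toy data the stump splits on machine class, fits a line in input size for class 0 and a constant for class 1, and so predicts 50 for `(5, 0)` and 5 for `(5, 1)`.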

PLOS ONE | 2014

Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies

Ramona L. Walls; John Deck; Robert P. Guralnick; Steve Baskauf; Reed S. Beaman; Stanley Blum; Shawn Bowers; Pier Luigi Buttigieg; Neil Davies; Dag Terje Filip Endresen; Maria A. Gandolfo; Robert Hanner; Alyssa Janning; Leonard Krishtalka; Andréa M. Matsunaga; Peter E. Midford; Norman Morrison; Éamonn Ó Tuama; Mark Schildhauer; Barry Smith; Brian J. Stucky; Andrea K. Thomer; John Wieczorek; Jamie Whitacre; John Wooley

The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
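The ontologies themselves are not reproduced in the abstract. As a loose illustration of why typed links between samples, sampling processes, and environments support integration, this minimal sketch stores a few statements as subject-predicate-object triples and answers a query by following links from a specimen to the process that produced it and on to that process's location. Every identifier and relation label is a hypothetical placeholder, not an actual BCO, ENVO, or PCO term; a real deployment would use the ontologies' IRIs and an RDF store.

```python
# Placeholder labels only; real BCO, ENVO, and PCO terms are identified by OBO
# IRIs that are not reproduced here.
TRIPLES = {
    ("specimen:001", "type", "material sample"),
    ("specimen:001", "derives from", "organism:A"),
    ("specimen:001", "output of", "collecting:77"),
    ("collecting:77", "type", "specimen collection process"),
    ("collecting:77", "located in", "site:river-bank"),
    ("site:river-bank", "type", "environmental feature"),
    # An observation that involves no physical sample.
    ("observing:9", "type", "observing process"),
    ("observing:9", "has participant", "organism:B"),
}

def objects(subject, predicate):
    """All o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}

def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is asserted."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}

def collection_sites(sample):
    # Follow sample -> the process that produced it -> where that process happened.
    return {site for proc in objects(sample, "output of")
            for site in objects(proc, "located in")}

print(collection_sites("specimen:001"))       # {'site:river-bank'}
print(subjects("type", "observing process"))  # {'observing:9'}
```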

Electronic Government | 2005

Transnational Information Sharing, Event Notification, Rule Enforcement and Process Coordination

Stanley Y. W. Su; José A. B. Fortes; T. R. Kasad; M. Patil; Andréa M. Matsunaga; Maurício O. Tsugawa; Violetta Cavalli-Sforza; Jaime G. Carbonell; Peter J. Jansen; Wayne H. Ward; Ronald A. Cole; Donald F. Towsley; Weifeng Chen; Qingfeng He; C. McSweeney; L. de Brens; J. Ventura; P. Taveras; R. Connolly; C. Ortega; B. Piñeres; O. Brooks; G. A. Murillo; M. Herrera

Solutions to global problems such as disease detection and control, terrorism, immigration and border control, and illicit drug trafficking require information sharing, coordination, and collaboration among government agencies within a country and across national boundaries. This paper presents an approach to achieve information sharing, event notification, enforcement of policies, constraints, regulations, security and privacy rules, and process coordination. The proposed system, designed in collaboration with stakeholders and end users in two Latin American countries, achieves the desired capabilities by integrating a distributed query processor (DQP) that provides form-based and conversational user interfaces, a language translation system, an event server for event filtering and notification, and an event-trigger-rule server. The Web-services infrastructure is used to achieve the interoperation of these heterogeneous component systems. A prototype of the integrated transnational information system is described.
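The abstract names an event server and an event-trigger-rule (ETR) server without specifying them. As a hedged illustration of the general event-trigger-rule pattern only, this minimal sketch lets a trigger associate an event type with rules whose conditions filter events and whose actions produce notifications. The class names, the border-crossing event, and the watch-list rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]   # filter: does this event matter?
    action: Callable[[dict], str]       # what to do (here: produce a notification)

class EventTriggerRuleServer:
    """Hypothetical ETR sketch: a trigger links an event type to rules, and a rule
    fires its action only when its condition holds for the posted event."""

    def __init__(self):
        self.triggers = {}   # event type -> list of rules
        self.log = []        # notifications produced so far

    def add_trigger(self, event_type, *rules):
        self.triggers.setdefault(event_type, []).extend(rules)

    def post(self, event_type, data):
        for rule in self.triggers.get(event_type, []):
            if rule.condition(data):
                self.log.append(rule.action(data))

watch_list = {"P-123"}
etr = EventTriggerRuleServer()
etr.add_trigger(
    "border_crossing",
    Rule("watch_list_alert",
         condition=lambda e: e["passport"] in watch_list,
         action=lambda e: f"notify partner agency: {e['passport']} at {e['port']}"),
)
etr.post("border_crossing", {"passport": "P-999", "port": "Port A"})
etr.post("border_crossing", {"passport": "P-123", "port": "Port B"})
print(etr.log)  # ['notify partner agency: P-123 at Port B']
```

Keeping conditions and actions as separate parts of a rule lets the same event feed several agencies' rules without the event source knowing about any of them.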

International Parallel and Distributed Processing Symposium | 2004

Single sign-on in In-VIGO: role-based access via delegation mechanisms using short-lived user identities

Sumalatha Adabala; Andréa M. Matsunaga; Maurício O. Tsugawa; Renato J. O. Figueiredo; José A. B. Fortes

Single sign-on (SSO) is an essential feature of computational grids. Its implementation is challenging because resources cross administrative domains and are managed by heterogeneous access schemes. We present an approach for single sign-on in a deployed, functioning grid called In-VIGO. The approach relies on decoupling grid user accounts from local user accounts and making use of role-based access control lists. Role-based access via delegation mechanisms using short-lived user identities enables In-VIGO to handle interactive applications and application-specific authentication mechanisms, a capability not present in existing grid architectures. SSO implementations for usage scenarios in In-VIGO are described to highlight the applicability of the proposed approach. In particular, access to interactive applications with their own security mechanisms, such as VNC, and access to remote data can be achieved using proxies that delegate In-VIGO user access via short-lived user identities.
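The abstract does not give In-VIGO's implementation, so this is only a minimal sketch of the pattern it describes: grid identities are decoupled from local accounts through role-based access control lists, and access is delegated via short-lived identities that a proxy can present and that expire on their own. The class, role, and account names, the 300-second default lifetime, and the simple account-selection policy are all assumptions.

```python
import secrets
import time

class SingleSignOn:
    """Hypothetical SSO sketch: grid users hold roles; each role maps to a pool of
    local accounts that can be lent out through short-lived identities."""

    def __init__(self, role_acl, local_pools, lifetime_s=300):
        self.role_acl = role_acl          # grid user -> set of roles
        self.local_pools = local_pools    # role -> list of local account names
        self.lifetime_s = lifetime_s
        self.active = {}                  # token -> (grid user, local account, expiry)

    def delegate(self, grid_user, role, now=None):
        now = time.time() if now is None else now
        if role not in self.role_acl.get(grid_user, set()):
            raise PermissionError(f"{grid_user} lacks role {role}")
        # Simplest possible policy: the first account in the role's pool.
        # A real system would hand out an account not currently in use.
        local = self.local_pools[role][0]
        token = secrets.token_hex(8)      # short-lived identity given to a proxy
        self.active[token] = (grid_user, local, now + self.lifetime_s)
        return token

    def resolve(self, token, now=None):
        """Map a short-lived identity to its local account, or None once expired."""
        now = time.time() if now is None else now
        _, local, expiry = self.active.get(token, (None, None, 0))
        if now >= expiry:
            self.active.pop(token, None)
            return None
        return local

sso = SingleSignOn({"alice": {"vnc_user"}}, {"vnc_user": ["invigo01", "invigo02"]}, lifetime_s=60)
token = sso.delegate("alice", "vnc_user", now=1000)
print(sso.resolve(token, now=1030))  # invigo01
print(sso.resolve(token, now=1060))  # None
```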

Global Communications Conference | 2010

User-level virtual networking mechanisms to support virtual machine migration over multiple clouds

Maurício O. Tsugawa; Pierre Riteau; Andréa M. Matsunaga; José A. B. Fortes

Dynamic allocation of multiple cloud resources adapting to application needs over time can be achieved by taking advantage of wide-area VM live migration technologies. However, migration of VMs across different subnets, potentially in multiple clouds, requires networking support to keep the network state of moving VMs unchanged. Two problems make traditional solutions to machine mobility inefficient in this scenario: (1) administrative overheads due to coordination requirements between moving machines and the network infrastructure; and (2) degraded network performance of machines moved away from their “home” networks. New solutions are needed to efficiently support the migration of virtual machines over multiple cloud providers. The user-level virtual network architecture presented in this paper implements mechanisms that allow VM migration across clouds without requiring support from the physical network infrastructure and that automatically reconfigure virtual networks to maximize the network performance of migrated virtual machines.
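The paper's mechanisms are not detailed in the abstract. As a hedged illustration of the underlying idea, this minimal sketch keeps each VM's virtual IP fixed while a user-level overlay records which cloud and physical endpoint currently host it, so a migration only updates the overlay mapping and traffic is sent directly to the new location rather than relayed through the VM's original network. The class and method names are hypothetical, and the addresses come from documentation-reserved ranges.

```python
class OverlayDirectory:
    """Hypothetical overlay state: each VM keeps its virtual IP, while the overlay
    records which cloud and physical endpoint currently hosts it."""

    def __init__(self):
        self.location = {}   # virtual IP -> (cloud, physical endpoint)

    def register(self, vip, cloud, endpoint):
        self.location[vip] = (cloud, endpoint)

    def migrate(self, vip, new_cloud, new_endpoint):
        # Only the overlay mapping changes; the VM's own network state
        # (its virtual IP) is untouched, and no physical router is reconfigured.
        old = self.location[vip]
        self.location[vip] = (new_cloud, new_endpoint)
        return old

    def route(self, dst_vip):
        # Tunnel straight to the VM's current location instead of relaying
        # through its original "home" network.
        return self.location[dst_vip]

overlay = OverlayDirectory()
overlay.register("10.0.0.5", "cloud-A", "198.51.100.7")
print(overlay.route("10.0.0.5"))   # ('cloud-A', '198.51.100.7')
overlay.migrate("10.0.0.5", "cloud-B", "203.0.113.9")
print(overlay.route("10.0.0.5"))   # ('cloud-B', '203.0.113.9')
```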

International Conference on e-Science | 2009

User-Level Virtual Network Support for Sky Computing

Maurício O. Tsugawa; Andréa M. Matsunaga; José A. B. Fortes

With the emergence of multiple cloud providers of Infrastructure-as-a-Service, it becomes possible to envision a near future in which high-performance computing users combine services from different clouds to access huge numbers of resources. However, as more administrative privileges are exposed to end users, providers are required to deploy network security measures that present challenges to the network virtualization technologies needed to enable inter-cloud communication. This paper studies these challenges and proposes techniques to run unmodified applications on resources across distinct clouds. The techniques are implemented in TinyViNe, an extension to ViNe, a virtual networking technology for distributed resources in different administrative domains. The results of evaluating TinyViNe on a WAN-based testbed across three sites are reported for a bioinformatics application (BLAST) and MPI benchmarks. The results confirm that TinyViNe enables cross-cloud computing with little impact on application performance. TinyViNe also has auto-configuration and “download-and-run” capabilities for easy deployment by users who are not knowledgeable about networking.
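The abstract does not describe how TinyViNe copes with cloud network restrictions or how its auto-configuration works, so this sketch covers only a generic building block of router-based overlay networks: deciding which overlay router should carry traffic for a given virtual address, using longest-prefix matching with Python's `ipaddress` module. The class name, router names, and subnets are hypothetical.

```python
import ipaddress

class OverlayRoutingTable:
    """Hypothetical table mapping virtual subnets to the overlay router that reaches them."""

    def __init__(self):
        self.routes = []   # list of (network, router name)

    def add(self, cidr, router):
        self.routes.append((ipaddress.ip_network(cidr), router))

    def lookup(self, destination):
        addr = ipaddress.ip_address(destination)
        matches = [(net, router) for net, router in self.routes if addr in net]
        if not matches:
            return None   # not a virtual-network address: leave it to normal routing
        # Longest-prefix match: the most specific subnet wins.
        return max(matches, key=lambda m: m[0].prefixlen)[1]

table = OverlayRoutingTable()
table.add("10.10.0.0/16", "router@site-A")
table.add("10.20.0.0/16", "router@cloud-B")
table.add("10.20.5.0/24", "router@cloud-C")
print(table.lookup("10.20.5.7"))   # router@cloud-C
print(table.lookup("10.20.1.1"))   # router@cloud-B
print(table.lookup("192.0.2.1"))   # None
```

Because applications only ever see virtual addresses, they can run unmodified while the overlay decides which site or cloud actually receives their packets.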

Concurrency and Computation: Practice and Experience | 2007

Science gateways made easy: the In‐VIGO approach

Andréa M. Matsunaga; Maurício O. Tsugawa; Sumalatha Adabala; Renato J. O. Figueiredo; Herman Lam; José A. B. Fortes

Science gateways require the easy enabling of legacy scientific applications on computing Grids and the generation of user-friendly interfaces that hide the complexity of the Grid from the user. This paper presents the In-VIGO approach to the creation and management of science gateways. First, we discuss the virtualization of machines, networks and data to facilitate the dynamic creation of secure execution environments that meet application requirements. Then we discuss the virtualization of applications, i.e. the execution on shared resources of multiple isolated application instances with customized behavior, in the context of In-VIGO. A Virtual Application Service (VAS) architecture for automatically generating, customizing, deploying, and using virtual applications as Grid services is then described. Starting with a grammar-based description of the command-line syntax, the automated process generates the VAS description and the VAS implementation (code for application encapsulation and data binding), which is deployed and made available through a Web interface. A VAS can be customized on a per-user basis by restricting the capabilities of the original application or by adding features such as parameter sweeping. This is a scalable approach to integrating scientific applications as services into Grids and can be applied to any tool with an arbitrarily complex command-line syntax.
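The abstract does not give In-VIGO's grammar notation or the code it generates, so this is only a minimal sketch of the ideas it names: turning a machine-readable description of a tool's command-line syntax into validated invocations, then layering per-user restriction and parameter sweeping on top. The `SPEC` dictionary format, the tool name `align_tool`, its flags, and the function names are hypothetical; a dictionary stands in for a real grammar.

```python
from itertools import product

# Hypothetical, simplified description of a tool's command-line syntax
# (a stand-in for a grammar-based description).
SPEC = {
    "program": "align_tool",
    "params": [
        {"name": "mode", "flag": "--mode", "choices": ["fast", "sensitive"]},
        {"name": "threshold", "flag": "--threshold", "type": float},
        {"name": "input", "flag": "--input", "type": str},
    ],
}

def build_argv(spec, values, allowed=None):
    """Validate values against the spec and build an argument vector.
    `allowed` optionally restricts the parameters a per-user customization exposes."""
    known = {p["name"] for p in spec["params"]}
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    argv = [spec["program"]]
    for p in spec["params"]:          # emit options in the order the spec defines
        name = p["name"]
        if name not in values:
            continue
        if allowed is not None and name not in allowed:
            raise ValueError(f"parameter {name!r} is disabled for this user")
        value = values[name]
        if "choices" in p and value not in p["choices"]:
            raise ValueError(f"{name} must be one of {p['choices']}")
        if "type" in p:
            value = p["type"](value)
        argv += [p["flag"], str(value)]
    return argv

def sweep(spec, fixed, varying):
    """Parameter sweep: one argument vector per combination of the varying values."""
    names = list(varying)
    return [build_argv(spec, {**fixed, **dict(zip(names, combo))})
            for combo in product(*(varying[n] for n in names))]

for argv in sweep(SPEC, {"input": "seqs.fa"},
                  {"mode": ["fast", "sensitive"], "threshold": [0.1, 0.01]}):
    print(" ".join(argv))
```

The sweep produces four invocations, starting with `align_tool --mode fast --threshold 0.1 --input seqs.fa`; each could be submitted as an independent job.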
