Aline de P. Nascimento
Federal Fluminense University
Publications
Featured research published by Aline de P. Nascimento.
Concurrency and Computation: Practice and Experience | 2007
Aline de P. Nascimento; Alexandre C. Sena; Cristina Boeres; Vinod E. F. Rebello
The execution of distributed applications on the Grid is already a reality. However, as both the number of applications and the scale of Grids grow, the efficient utilization of the available but shared heterogeneous resources will become increasingly essential to the Grid's successful maturity. Furthermore, it is unclear whether existing Grid management systems are capable of meeting this challenge. The EasyGrid middleware is a hierarchically distributed application management system (AMS) that is embedded into MPI applications to autonomously orchestrate their execution efficiently in computational Grids. Although employing a distinct AMS to make each application system-aware incurs an overhead, it brings at least two benefits. First, the adopted policies can be tailored to the specific needs of each application, leading to improved performance. Second, distributing the management effort among executing applications makes Grid management more scalable. This article focuses on the scheduling policies of an AMS for a particular class of application, describing a low-intrusion implementation of a hybrid scheduling strategy designed to elicit good performance even in dynamic environments such as Grids. Using application-specific scheduling policies, near-optimal runtimes highlight the advantages of self-scheduling when executing one or more system-aware applications on a Grid.
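The abstract does not reproduce the scheduling strategy itself; as a rough illustration of the hybrid (static plus dynamic) idea, the sketch below computes an initial static mapping of tasks to processors and then re-maps pending work when the observed load imbalance crosses a threshold. It is not the EasyGrid AMS algorithm; task_t, the task counts and the imbalance threshold are hypothetical placeholders.

```c
/* Illustrative sketch only: a hybrid static/dynamic scheduling skeleton.
 * NOT the EasyGrid AMS strategy; all names and constants are hypothetical. */
#include <stdio.h>

#define NTASKS 8
#define NPROCS 2
#define IMBALANCE 1.25        /* re-map when max/min pending load exceeds this */

typedef struct { double cost; int proc; int done; } task_t;

/* Static phase: an initial mapping computed before execution starts. */
static void static_map(task_t t[]) {
    for (int i = 0; i < NTASKS; i++) t[i].proc = i % NPROCS;
}

/* Dynamic phase: shift pending work from the most to the least loaded processor. */
static void dynamic_remap(task_t t[]) {
    double load[NPROCS] = {0};
    for (int i = 0; i < NTASKS; i++)
        if (!t[i].done) load[t[i].proc] += t[i].cost;

    int hi = 0, lo = 0;
    for (int p = 1; p < NPROCS; p++) {
        if (load[p] > load[hi]) hi = p;
        if (load[p] < load[lo]) lo = p;
    }
    if (load[lo] > 0 && load[hi] / load[lo] > IMBALANCE)
        for (int i = 0; i < NTASKS; i++)        /* move one pending task */
            if (!t[i].done && t[i].proc == hi) { t[i].proc = lo; break; }
}

int main(void) {
    task_t t[NTASKS];
    for (int i = 0; i < NTASKS; i++) { t[i].cost = 1 + i % 3; t[i].done = 0; }
    static_map(t);
    t[1].done = t[3].done = 1;        /* processor 1 has run ahead */
    dynamic_remap(t);                 /* one periodic self-optimisation step */
    for (int i = 0; i < NTASKS; i++)
        printf("task %d -> proc %d\n", i, t[i].proc);
    return 0;
}
```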
Middleware for Grid Computing | 2008
Aline de P. Nascimento; Cristina Boeres; Vinod E. F. Rebello
As grids are in essence heterogeneous, dynamic, shared and distributed environments, managing these kinds of platforms efficiently is extremely complex. Few transparent grid management systems have been developed to cope with these characteristics simultaneously, and therefore both new and existing applications must be modified to execute efficiently. A promising scalable approach to dealing with these intricacies is the design of self-managing, or autonomic, applications. Autonomic applications adapt their execution by considering knowledge of their own behaviour and of environmental conditions. This paper focuses on the dynamic scheduling that provides the self-optimizing ability of autonomic applications, presenting an intra-site dynamic scheduling heuristic for tightly coupled parallel applications represented by DAGs. Being distributed, collaborative and pro-active, the proposed hierarchical scheduling infrastructure addresses important issues in enabling efficient execution in a computational grid. Unlike other approaches, the cooperative, hybrid and application-specific strategy deals effectively with task dependencies. Several experiments carried out in real grid environments highlight the efficiency and scalability of the proposed infrastructure.
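As a point of reference for how dependencies constrain dynamic scheduling, the sketch below shows generic list scheduling over a DAG: a task becomes ready only once all of its predecessors have been placed, and each ready task is then assigned to the processor giving the earliest finish time. This is a textbook heuristic for illustration only, not the intra-site heuristic proposed in the paper; the example DAG, costs and identifiers are hypothetical.

```c
/* Illustrative only: earliest-finish-time list scheduling over a small DAG. */
#include <stdio.h>

#define NT 5   /* tasks */
#define NP 2   /* processors inside one site */

int main(void) {
    /* dep[i][j] = 1 if task j must finish before task i starts */
    int dep[NT][NT] = {{0}, {1,0}, {1,0}, {0,1,1,0}, {0,0,0,1,0}};
    double cost[NT]   = {2, 3, 1, 2, 4};
    double finish[NT] = {0};
    double proc_free[NP] = {0};
    int placed[NT] = {0};

    for (int done = 0; done < NT; done++) {
        /* pick a ready task: all predecessors already placed */
        int t = -1;
        for (int i = 0; i < NT && t < 0; i++) {
            if (placed[i]) continue;
            int ready = 1;
            for (int j = 0; j < NT; j++)
                if (dep[i][j] && !placed[j]) ready = 0;
            if (ready) t = i;
        }
        /* earliest start = max(processor free time, predecessors' finish) */
        int best_p = 0; double best_f = 1e30;
        for (int p = 0; p < NP; p++) {
            double start = proc_free[p];
            for (int j = 0; j < NT; j++)
                if (dep[t][j] && finish[j] > start) start = finish[j];
            if (start + cost[t] < best_f) { best_f = start + cost[t]; best_p = p; }
        }
        finish[t] = best_f;
        proc_free[best_p] = best_f;
        placed[t] = 1;
        printf("task %d -> proc %d, finishes at %.1f\n", t, best_p, best_f);
    }
    return 0;
}
```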
Cluster Computing and the Grid | 2007
Alexandre C. Sena; Aline de P. Nascimento; J.A. da Silva; Daniela Vianna; Cristina Boeres; Vinod E. F. Rebello
The MPI message passing library is used extensively in the scientific community as a tool for parallel programming. Even though improvements have been made to existing implementations to support execution on computational grids, MPI was initially designed to deal with homogeneous, fault-free, static environments such as computing clusters. The typical programming approach is to execute a single MPI process on each resource. However, this may not be appropriate for heterogeneous, non-dedicated and dynamic environments such as grids. This paper aims to show that programmers can implement parallel MPI solutions to their problems in an architecture-independent style and obtain good performance on a grid by transferring responsibility to an application management system (AMS). A comparison of program implementations under a traditional MPI execution model and a fine-grain model highlights the advantages of using the latter.
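A minimal sketch of the contrast, under my own assumptions rather than code from the paper: in the traditional model each of a handful of MPI processes owns a large chunk of the data, whereas in a fine-grain model the same program is launched with many more processes than physical cores, each handling a small independent block, so that placement and migration can be delegated to the runtime or an AMS. The vector-sum kernel and the size N below are hypothetical.

```c
/* Illustrative only: a fine-grain decomposition where the process count
 * (set with mpirun -np) may exceed the number of cores. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000L   /* global problem size (hypothetical) */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* each process handles only a small, independent block of the data */
    long begin = (long)rank * N / nprocs;
    long end   = (long)(rank + 1) * N / nprocs;
    double local = 0.0;
    for (long i = begin; i < end; i++) local += (double)i;

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %.0f (with %d processes)\n", total, nprocs);
    MPI_Finalize();
    return 0;
}
```

The same binary illustrates both models: running it with one process per node reproduces the coarse-grain style, while something like `mpirun -np 64 ./a.out` on an 8-core node gives the fine-grain, oversubscribed decomposition.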
International Symposium on Parallel and Distributed Processing and Applications | 2008
Alexandre C. Sena; Aline de P. Nascimento; Cristina Boeres; Vinod E. F. Rebello
This paper addresses the challenge of permitting tightly coupled parallel applications, optimised for uniform, stable, static environments, to execute equally efficiently in environments that exhibit the completely opposite characteristics. Using the N-body problem as a case study, both the traditional and the proposed grid-enabled MPI implementations of the popular ring algorithm are analysed. Performance results show the latter approach to be competitive on a cluster and significantly more effective in heterogeneous and dynamic environments.
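For context, the classic ring algorithm referred to above circulates each process's block of bodies around a logical ring, so that every process eventually computes interactions between its local bodies and every remote block. The sketch below is the traditional coarse-grain MPI formulation, written as a generic illustration rather than the grid-enabled version proposed in the paper; BODIES_PER_PROC and the dummy "interaction" are hypothetical.

```c
/* Illustrative sketch of the ring algorithm for N-body force computation. */
#include <mpi.h>
#include <stdio.h>

#define BODIES_PER_PROC 256

int main(int argc, char **argv) {
    int rank, p;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    double local[BODIES_PER_PROC];      /* positions owned by this process   */
    double visit[BODIES_PER_PROC];      /* block currently visiting the ring */
    double force[BODIES_PER_PROC] = {0};
    for (int i = 0; i < BODIES_PER_PROC; i++)
        local[i] = visit[i] = rank + i * 1e-3;   /* dummy initial data */

    int next = (rank + 1) % p, prev = (rank - 1 + p) % p;
    for (int step = 0; step < p; step++) {
        /* accumulate (dummy) pairwise interactions: local x visiting block */
        for (int i = 0; i < BODIES_PER_PROC; i++)
            for (int j = 0; j < BODIES_PER_PROC; j++)
                force[i] += local[i] - visit[j];
        /* pass the visiting block to the next process in the ring */
        MPI_Sendrecv_replace(visit, BODIES_PER_PROC, MPI_DOUBLE,
                             next, 0, prev, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
    }
    if (rank == 0) printf("force[0] = %f\n", force[0]);
    MPI_Finalize();
    return 0;
}
```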
International Journal of Foundations of Computer Science | 1999
Cristina Boeres; Aline de P. Nascimento; Vinod E. F. Rebello
While the task scheduling problem under the delay model has been studied extensively, relatively little research exists for more realistic communication models such as the LogP model, which considers, in addition to latency, the cost of sending and receiving messages and the network or link capacity. The task scheduling problem is known to be NP-complete even under the delay model (a special case of the LogP model). This paper investigates the similarities and differences between task-clustering algorithms for the delay and LogP models, and describes a task-scheduling algorithm for the allocation of arbitrary task graphs to fully connected networks of processors under the LogP model. The strategy exploits the replication and clustering of tasks to minimize the ill effects of communication overhead on the makespan. A number of restrictions are presented which are used to simplify the design of the new algorithm. The quality of the schedules produced by the algorithm compares favorably with that of two well-known delay model-based algorithms and a previously existing LogP strategy.
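For readers unfamiliar with the model, the per-message cost under LogP is usually summarised as follows, with L the latency, o the send/receive overhead and g the gap (reciprocal of the per-processor bandwidth). This is the standard textbook formulation, not a formula reproduced from the paper.

```latex
% Cost of one point-to-point message under LogP:
t_{\mathrm{msg}} = o_{\mathrm{send}} + L + o_{\mathrm{recv}}

% k consecutive messages from the same sender, whose injections must be
% spaced by at least \max(g, o):
t_k = o + (k-1)\,\max(g, o) + L + o

% The delay model is recovered as the special case o = g = 0, where only
% the latency L is charged per message.
```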
Symposium on Computer Architecture and High Performance Computing | 2013
Felipe S. Ribeiro; Aline de P. Nascimento; Cristina Boeres; Vinod E. F. Rebello; Alexandre C. Sena
During their execution, a significant number of applications often underutilize the capacity of the resources to which they are allocated, or require more. Furthermore, with the current scale-up trend in server design, effective utilization can only be achieved by applications sharing such resources. Cluster management systems already support static resource partitioning at job submission time, and given that application utilization more often than not varies during execution, it will become increasingly important to permit applications to harness all available spare capacity. This paper investigates the feasibility of malleable and evolving versions of applications to improve performance and system efficiency. Extending a previous classification, we show that improvements can be achieved for a real astrophysics application.
European Conference on Parallel Processing | 1999
Cristina Boeres; Aline de P. Nascimento; Vinod E. F. Rebello
While the problem of scheduling weighted arbitrary DAGs under the delay model has been studied extensively, comparatively little work exists for this problem under a more realistic model such as LogP. This paper investigates the similarities and differences between task clustering algorithms for the delay and LogP models. The principles behind three new algorithms for tackling the scheduling problem under the LogP model are described. The quality of the schedules produced by the algorithms is compared with that of good delay model-based algorithms and a previously existing LogP strategy.
IEEE International Conference on eScience | 2011
Alexandre C. Sena; Aline de P. Nascimento; Cristina Boeres; Vinod E. F. Rebello; André Bulcão
The new oil fields discovered in the Gulf of Mexico and on Brazil's southeast coast are located in deep water, which imposes new challenges for sub-salt seismic imaging. To produce sufficiently accurate imaging of such fields, the compute-intensive RTM (Reverse Time Migration) method is the currently favoured approach, despite its high computational cost. This work evaluates the RTM code on multicore machines and proposes a new version that is more than 2.4 times faster than the original. Moreover, how the memory hierarchy is utilised has a crucial impact on the performance of the code. This article also presents an innovative approach to calculate efficient block sizes for better memory utilisation on multicore architectures.
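The abstract does not give the blocking scheme itself; as a generic illustration of the kind of memory-hierarchy optimisation involved, the sketch below tiles a 2D stencil sweep (the sort of kernel at the core of RTM) so that each block of the grid is reused while it is resident in cache. The grid size N, tile size BS and the 5-point stencil are hypothetical, not the block values computed by the paper's approach.

```c
/* Illustrative only: loop tiling (cache blocking) for a 2D stencil sweep. */
#include <stdio.h>

#define N  2048
#define BS 64                     /* tile edge, tuned to the cache size */

int main(void) {
    static double in[N][N], out[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) in[i][j] = i + j;   /* dummy wavefield */

    for (int bi = 1; bi < N - 1; bi += BS)
        for (int bj = 1; bj < N - 1; bj += BS)
            /* work on one BS x BS tile while it stays in cache */
            for (int i = bi; i < bi + BS && i < N - 1; i++)
                for (int j = bj; j < bj + BS && j < N - 1; j++)
                    out[i][j] = 0.25 * (in[i-1][j] + in[i+1][j]
                                      + in[i][j-1] + in[i][j+1]);

    printf("%f\n", out[N/2][N/2]);
    return 0;
}
```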
IEEE International Conference on High Performance Computing, Data and Analytics | 2008
Aline de P. Nascimento; Alexandre C. Sena; Jacques Alves da Silva; Daniela Vianna; Cristina Boeres; Vinod E. F. Rebello
Computational grids aim to aggregate significant numbers of resources to provide sufficient, but low-cost, computational power to various applications. Writing applications capable of executing efficiently in grids is, however, extremely difficult. Their geographically distributed resources are typically heterogeneous, non-dedicated, and offer no performance or availability guarantees. This makes the collective management of resources and applications both complex and arduous. This work investigates an alternative approach (based on system awareness) to solving the problem of developing and managing the execution of grid applications efficiently. Results show that these system-aware applications are indeed faster than their conventional implementations and are easily grid-enabled.
Symposium on Computer Architecture and High Performance Computing | 2016
Aline de P. Nascimento; C. N. Vasconcelos; F. S. Jamel; Alexandre C. Sena
The bipartite graph matching problem consists of pairing each point in one set with the point in another set that maximizes a similarity measure, and it is explored in several areas such as Bioinformatics and Computer Vision. The auction algorithm has been widely used to solve this matching problem, and parallel implementations are employed to find matching solutions in a reasonable computational time. For example, image analysis may require a large amount of processing, as dense images can have thousands of points to be considered. Furthermore, to exploit the benefits of multicore architectures, a hybrid implementation can be used to handle communication in both distributed and shared memory. The main goal of this paper is to implement and evaluate the performance of a hybrid parallel auction algorithm for multicore clusters. The experiments carried out analyze the problem size, the number of iterations needed to solve the matching, and the impact of these parameters on communication costs and execution times.
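For context, the sequential auction algorithm that such parallel versions build on works roughly as follows: each unassigned "person" bids for its most valuable "object" by the margin between its best and second-best net values plus a small epsilon, and each object goes to the highest bidder, possibly displacing a previous owner. The sketch below is a minimal sequential version for a dense value matrix; the matrix, its size and EPS are hypothetical, and it omits the epsilon scaling and the hybrid MPI/shared-memory parallelisation evaluated in the paper.

```c
/* Illustrative sequential auction algorithm for dense bipartite matching
 * (maximisation). Not the paper's hybrid parallel implementation. */
#include <stdio.h>

#define N   4
#define EPS 0.01   /* bidding increment; smaller values give better matchings */

int main(void) {
    double value[N][N] = {{4,1,2,0}, {2,3,0,1}, {1,2,4,3}, {0,2,1,4}};
    double price[N] = {0};
    int owner[N], assign[N];                 /* owner[object], assign[person] */
    for (int i = 0; i < N; i++) { owner[i] = -1; assign[i] = -1; }

    int unassigned = N;
    while (unassigned > 0) {
        int person = 0;
        while (assign[person] != -1) person++;   /* pick an unassigned person */

        /* find best and second-best net value (value - price) */
        int best = -1; double best_v = -1e30, second_v = -1e30;
        for (int j = 0; j < N; j++) {
            double v = value[person][j] - price[j];
            if (v > best_v) { second_v = best_v; best_v = v; best = j; }
            else if (v > second_v) second_v = v;
        }

        /* bid raises the price by the margin over the runner-up plus EPS */
        price[best] += best_v - second_v + EPS;

        if (owner[best] != -1) {                 /* displace previous owner */
            assign[owner[best]] = -1;
            unassigned++;
        }
        owner[best] = person;
        assign[person] = best;
        unassigned--;
    }

    for (int i = 0; i < N; i++)
        printf("person %d -> object %d\n", i, assign[i]);
    return 0;
}
```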