Daniel Cordeiro | Researchain

Publication

Featured research published by Daniel Cordeiro.

simulation tools and techniques for communications, networks and system | 2010

Random graph generation for scheduling simulations

Daniel Cordeiro; Grégory Mounié; Swann Perarnau; Denis Trystram; Jean-Marc Vincent; Frédéric Wagner

In parallel and distributed systems, validation of scheduling heuristics is usually done by simulation on randomly generated synthetic workloads, typically represented by task graphs. Since there is no single generation method that models all possible workloads for scheduling problems, researchers often re-implement the classical generation algorithms or even implement ad hoc ones. A bad choice of generation method can mislead the validation of the algorithm due to biases it can induce. Moreover, different implementations of the same randomized generation method may produce slightly different graphs. These problems can harm the experimental comparison of scheduling algorithms. In order to provide a comparison basis we propose GGen -- a unified and standard implementation of classical task graph generation methods used in the scheduling domain. We also provide an in-depth analysis of each generation method, emphasizing important graph properties that may influence scheduling algorithms.
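The abstract does not say which generation methods GGen implements. As an illustration only, the sketch below assumes one simple method: an Erdős–Rényi-style G(n, p) rule restricted to forward edges, so the resulting task graph is always acyclic. The name `random_task_graph` and its parameters are hypothetical and are not GGen's API.

```python
import random


def random_task_graph(n, p, seed=None):
    """Return the edge list of a random DAG on tasks 0..n-1.

    Assumed method (not taken from GGen): every pair (u, v) with u < v
    becomes a precedence edge u -> v independently with probability p.
    Keeping only forward edges guarantees the graph has no cycles.
    """
    rng = random.Random(seed)  # a fixed seed makes the instance reproducible
    return [
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if rng.random() < p
    ]


if __name__ == "__main__":
    for edge in random_task_graph(6, 0.4, seed=1):
        print(edge)
```

Passing an explicit seed is one way to address the reproducibility concern raised in the abstract: the same seed yields the same graph under this one implementation, although a different implementation of the same method may still consume random numbers in a different order.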

international conference on parallel processing | 2012

A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems

Laércio Lima Pilla; Christiane Pousa Ribeiro; Daniel Cordeiro; Chao Mei; Abhinav Bhatele; Philippe Olivier Alexandre Navaux; François Broquedis; Jean-François Méhaut; Laxmikant V. Kalé

Multi-core compute nodes with non-uniform memory access (NUMA) are now a common architecture in the assembly of large-scale parallel machines. On these machines, in addition to the network communication costs, the memory access costs within a compute node are also asymmetric. Ignoring this can lead to an increase in the data movement costs. Therefore, to fully exploit the potential of these nodes and reduce data access costs, it becomes crucial to have a complete view of the machine topology (i.e. the compute node topology and the interconnection network among the nodes). Furthermore, the parallel application behavior has an important role in determining how to utilize the machine efficiently. In this paper, we propose a hierarchical load balancing approach to improve the performance of applications on parallel multi-core systems. We introduce NucoLB, a topology-aware load balancer that focuses on redistributing work while reducing communication costs among and within compute nodes. NucoLB takes the asymmetric memory access costs present on NUMA multi-core compute nodes, the interconnection network overheads, and the application communication patterns into account in its balancing decisions. We have implemented NucoLB using the Charm++ parallel runtime system and evaluated its performance. Results show that our load balancer improves performance by up to 20% when compared to state-of-the-art load balancers on three different NUMA parallel machines.

european conference on parallel processing | 2010

Analysis of multi-organization scheduling algorithms

Johanne Cohen; Daniel Cordeiro; Denis Trystram; Frédéric Wagner

In this paper we consider the problem of scheduling on computing platforms composed of several independent organizations, known as the Multi-Organization Scheduling Problem (MOSP). Each organization provides both resources and tasks and follows its own objectives. We are interested in the best way to minimize the makespan on the entire platform when the organizations behave in a selfish way. We study the complexity of the MOSP problem with two different local objectives - makespan and average completion time - and show that MOSP is NP-hard in both cases. We formally define a selfishness notion, by means of restrictions on the schedules. We prove that selfish behavior imposes a lower bound of 2 on the approximation ratio for the global makespan. We present various approximation algorithms of ratio 2 which validate selfishness restrictions. These algorithms are experimentally evaluated through simulation, exhibiting good average performance.

network computing and applications | 2014

Deploying Large-Scale Service Compositions on the Cloud with the CHOReOS Enactment Engine

Leonardo Leite; Carlos Eduardo Moreira; Daniel Cordeiro; Marco Aurélio Gerosa; Fabio Kon

In recent years, service-oriented systems have become increasingly complex, with growing size and heterogeneity. Developing and deploying such large-scale systems presents several challenges, such as reliability, reproducibility, handling failures on infrastructure, scaling deployment time as composition size grows, coordinating deployment among multiple organizations, dependency management, and supporting requirements of adaptable systems. However, many organizations still rely on manual deployment processes, which makes it difficult to overcome such challenges. In this paper, we propose a flexible and extensible middleware solution that addresses the challenges present in the large-scale deployment of service compositions. The CHOReOS Enactment Engine is a robust middleware infrastructure to automate the deployment of large-scale service compositions. We describe the middleware architecture and implementation and then present experimental results demonstrating the feasibility of our approach.

ieee international conference on high performance computing, data, and analytics | 2011

Coordination mechanisms for selfish multi-organization scheduling

Johanne Cohen; Daniel Cordeiro; Denis Trystram; Frédéric Wagner

We conduct a game-theoretic analysis of the problem of scheduling jobs on computing platforms composed of several independent and selfish organizations, known as the Multi-Organization Scheduling Problem (MOSP). Each organization shares resources and jobs with others, expecting to decrease the makespan of its own jobs. We model MOSP as a non-cooperative game where each agent is responsible for assigning all jobs belonging to a particular organization to the available processors. The local scheduling of these jobs is defined by coordination mechanisms that first prioritize local jobs and then schedule the jobs from others according to some given priority. When different priorities are given individually to the jobs, as in classical scheduling algorithms such as LPT or SPT, then no pure e-approximate equilibrium is possible for values of e less than 2. We also prove that even deciding whether a given instance admits a pure Nash equilibrium is co-NP-hard. When these priorities are given to entire organizations, we show the existence of an algorithm that always computes a pure p-approximate equilibrium using any p-approximation list scheduling algorithm. Finally, we prove that the price of anarchy of the MOSP game using this mechanism is asymptotically bounded by 2.
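For readers unfamiliar with the LPT rule mentioned in this abstract, the following is a minimal sketch of classical LPT (longest processing time first) list scheduling on identical machines. It illustrates only the priority rule, not the paper's coordination mechanisms or the MOSP game; the function name `lpt_schedule` and the sample durations are hypothetical.

```python
import heapq


def lpt_schedule(durations, machines):
    """Greedy LPT: take jobs in non-increasing duration order and place
    each one on the machine that currently has the smallest load.

    Returns (assignment, makespan), where assignment maps job index to
    machine index.
    """
    # Min-heap of (current load, machine index); ties go to the lower index.
    loads = [(0, m) for m in range(machines)]
    heapq.heapify(loads)
    assignment = {}
    order = sorted(range(len(durations)), key=lambda j: durations[j], reverse=True)
    for job in order:
        load, m = heapq.heappop(loads)
        assignment[job] = m
        heapq.heappush(loads, (load + durations[job], m))
    makespan = max(load for load, _ in loads)
    return assignment, makespan


if __name__ == "__main__":
    assignment, makespan = lpt_schedule([3, 3, 2, 2, 2], machines=2)
    print(assignment, makespan)
```

On the sample input the greedy rule ends with loads 7 and 5, while splitting the jobs as {3, 3} and {2, 2, 2} would give a makespan of 6, which shows that LPT is a heuristic rather than an exact method.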

international parallel and distributed processing symposium | 2011

Tight Analysis of Relaxed Multi-organization Scheduling Algorithms

Daniel Cordeiro; Pierre-François Dutot; Grégory Mounié; Denis Trystram

The goal of this paper is to study how limited cooperation can impact the quality of the schedule obtained by multiple independent organizations in a typical grid computing platform. This relaxed version of the problem known as the Multi-Organization Scheduling Problem (MOSP) models an environment where organizations providing both resources and jobs tolerate a bounded degradation on the makespan of their own jobs in order to minimize the makespan over the entire platform. More precisely, the technical contributions are the following. First, we improve the existing inapproximability bounds for this problem, proving that what was previously thought to be not polynomially approximable (unless P=NP) is actually not approximable at all. We achieve this using two families of instances whose Pareto optimal solutions are on par with the previous inapproximability bounds. Then, we present two algorithms that solve the problem with approximation ratios of (2, 3/2) and (3, 4/3) respectively. This means that when using the first (second) algorithm, if an organization tolerates that the completion time of its last job cannot exceed twice (three times) the time it would have obtained by itself, then the algorithm provides a solution that is a 3/2-approximation (4/3-approximation) for the optimal global makespan. Both algorithms are efficient since their performance ratios correspond to the Pareto optimal solutions of the previously defined instances.

european conference on parallel processing | 2007

Load balancing on an interactive multiplayer game server

Daniel Cordeiro; Alfredo Goldman; Dilma Da Silva

In this work, we investigate the impact of issues related to performance, parallelization, and scalability of interactive, multiplayer games. Particularly, we study and extend the game QuakeWorld, made publicly available by id Software under the GPL license. We have created a new parallelization model for Quake's distributed simulation and implemented this model in the QuakeWorld server. We implemented the model by adapting the QuakeWorld server to allow better management of the generated workload. We present in this paper our experimental results on SMP computers.

ieee international conference on high performance computing data and analytics | 2015

A Simple BSP-based Model to Predict Execution Time in GPU Applications

Marcos Amaris; Daniel Cordeiro; Alfredo Goldman; Raphael Y. de Camargo

Models are useful to represent abstractions of software and hardware processes. The Bulk Synchronous Parallel (BSP) is a bridging model for parallel computation that allows algorithmic analysis of programs on parallel computers using performance modeling. The main idea of the BSP model is the treatment of communication and computation as abstractions of a parallel system. Meanwhile, the use of GPU devices is becoming more widespread, and they are currently capable of performing efficient parallel computation for applications that can be decomposed into thousands of simple threads. However, few models for predicting application execution time on GPUs have been proposed. In this work we present a simple and intuitive BSP-based model for predicting the execution times of CUDA applications on GPUs. The model is based on the number of computations and memory accesses of the GPU, with additional information on cache usage obtained from profiling. Scalability, divergence, effect of optimizations and differences of architectures are adjusted by a single parameter. We evaluated our model using two applications and six different boards. We showed, by using profile information for a single board, that the model is general enough to predict the execution time of an application with different input sizes and on different boards with the same architecture. Our model predictions were within 0.8 to 1.2 times the measured execution times, which is reasonable for such a simple model. These results indicate that the model is good enough to generalize the predictions for different problem sizes and GPU configurations.

Concurrency and Computation: Practice and Experience | 2015

Coordination mechanisms for decentralized parallel systems

Johanne Cohen; Daniel Cordeiro; Denis Trystram

international conference on cluster computing | 2015

PLB-HeC: A Profile-Based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters

Luis Sant'Ana; Daniel Cordeiro; Raphael Y. de Camargo
