Is this you? Create Your Porfile

Bora Uçar

École normale supérieure de Lyon

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Bora Uçar is active.

Explore More

Publication

Featured researches published by Bora Uçar.

Journal of Parallel and Distributed Computing | 2006

Task assignment in heterogeneous computing systems

Bora Uçar; Cevdet Aykanat; Kamer Kaya; Murat Ikinci

The problem of task assignment in heterogeneous computing systems has been studied for many years with many variations. We consider the version in which communicating tasks are to be assigned to heterogeneous processors with identical communication links to minimize the sum of the total execution and communication costs. Our contributions are three fold: a task clustering method which takes the execution times of the tasks into account; two metrics to determine the order in which tasks are assigned to the processors; a refinement heuristic which improves a given assignment. We use these three methods to obtain a family of task assignment algorithms including multilevel ones that apply clustering and refinement heuristics repeatedly. We have implemented eight existing algorithms to test the proposed methods. Our refinement algorithm improves the solutions of the existing algorithms by up to 15% and the proposed algorithms obtain better solutions than these refined solutions.

SIAM Journal on Scientific Computing | 2010

On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe

Cevdet Aykanat; Bora Uçar

We consider two-dimensional partitioning of general sparse matrices for parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces fine-grain partitions. The other two produce coarser partitions, where one of them imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed two-dimensional partitioning methods together with the hypergraph-based one-dimensional partitioning methods, using an extensive set of public domain matrices. Furthermore, for the users of these partitioning methods, we present a partitioning recipe that chooses one of the partitioning methods according to some matrix characteristics.

SIAM Journal on Scientific Computing | 2004

Encapsulating Multiple Communication-Cost Metrics in Partitioning Sparse Rectangular Matrices for Parallel Matrix-Vector Multiplies

Bora Uçar; Cevdet Aykanat

This paper addresses the problem of one-dimensional partitioning of structurally unsymmetric square and rectangular sparse matrices for parallel matrix-vector and matrix-transpose-vector multiplies. The objective is to minimize the communication cost while maintaining the balance on computational loads of processors. Most of the existing partitioning models consider only the total message volume hoping that minimizing this communication-cost metric is likely to reduce other metrics. However, the total message latency (start-up time) may be more important than the total message volume. Furthermore, the maximum message volume and latency handled by a single processor are also important metrics. We propose a two-phase approach that encapsulates all these four communication-cost metrics. The objective in the first phase is to minimize the total message volume while maintaining the computational-load balance. The objective in the second phase is to encapsulate the remaining three communication-cost metrics. We propose communication-hypergraph and partitioning models for the second phase. We then present several methods for partitioning communication hypergraphs. Experiments on a wide range of test matrices show that the proposed approach yields very effective partitioning results. A parallel implementation on a PC cluster verifies that the theoretical improvements shown by partitioning results hold in practice.

Journal of Parallel and Distributed Computing | 2008

Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices

Cevdet Aykanat; Berkant Barla Cambazoglu; Bora Uçar

K-way hypergraph partitioning has an ever-growing use in parallelization of scientific computing applications. We claim that hypergraph partitioning with multiple constraints and fixed vertices should be implemented using direct K-way refinement, instead of the widely adopted recursive bisection paradigm. Our arguments are based on the fact that recursive-bisection-based partitioning algorithms perform considerably worse when used in the multiple constraint and fixed vertex formulations. We discuss possible reasons for this performance degradation. We describe a careful implementation of a multi-level direct K-way hypergraph partitioning algorithm, which performs better than a well-known recursive-bisection-based partitioning algorithm in hypergraph partitioning with multiple constraints and fixed vertices. We also experimentally show that the proposed algorithm is effective in standard hypergraph partitioning.

international workshop on data intensive distributed computing | 2011

Integrated data placement and task assignment for scientific workflows in clouds

Kamer Kaya; Bora Uçar

We consider the problem of optimizing the execution of data-intensive scientific workflows in the Cloud. We address the problem under the following scenario. The tasks of the workflows communicate through files; the output of a task is used by another task as an input file and if these tasks are assigned on different execution sites, a file transfer is necessary. The output files are to be stored at a site. Each execution site is to be assigned a certain percentage of the files and tasks. These percentages, called target weights, are pre-determined and reflect either user preferences or the storage capacity and computing power of the sites. The aim is to place the data files into and assign the tasks to the execution sites so as to reduce the cost associated with the file transfers, while complying with the target weights. To do this, we model the workflow as a hypergraph and with a hypergraph-partitioning-based formulation, we propose a heuristic which generates data placement and task assignment schemes simultaneously. We report simulation results on a number of real-life and synthetically generated scientific workflows. Our results show that the proposed heuristic is fast, and can find mappings and assignments which reduce file transfers, while respecting the target weights.

Siam Review | 2007

Revisiting Hypergraph Models for Sparse Matrix Partitioning

Bora Uçar; Cevdet Aykanat

We provide an exposition of hypergraph models for parallelizing sparse matrix-vector multiplies. Our aim is to emphasize the expressive power of hypergraph models. First, we set forth an elementary hypergraph model for the parallel matrix-vector multiply based on one-dimensional (1D) matrix partitioning. In the elementary model, the vertices represent the data of a matrix-vector multiply, and the nets encode dependencies among the data. We then apply a recently proposed hypergraph transformation operation to devise models for 1D sparse matrix partitioning. The resulting 1D partitioning models are equivalent to the previously proposed computational hypergraph models and are not meant to be replacements for them. Nevertheless, the new models give us insights into the previous ones and help us explain a subtle requirement, known as the consistency condition, of hypergraph partitioning models. Later, we demonstrate the flexibility of the elementary model on a few 1D partitioning problems that are hard to solve using the previously proposed models. We also discuss extensions of the proposed elementary model to two-dimensional matrix partitioning.

SIAM Journal on Scientific Computing | 2012

On Computing Inverse Entries of a Sparse Matrix in an Out-of-Core Environment

Patrick R. Amestoy; Iain S. Duff; Jean-Yves L'Excellent; Yves Robert; François-Henry Rouet; Bora Uçar

The inverse of an irreducible sparse matrix is structurally full, so that it is impractical to think of computing or storing it. However, there are several applications where a subset of the entries of the inverse is required. Given a factorization of the sparse matrix held in out-of-core storage, we show how to compute such a subset e ciently, by accessing only parts of the factors. When there are many inverse entries to compute, we need to guarantee that the overall computation scheme has reasonable memory requirements, while minimizing the cost of loading the factors. This leads to a partitioning problem that we prove is NP-complete. We also show that we cannot get a close approximation to the optimal solution in polynomial time. We thus need to develop heuristic algorithms, and we propose: (i) a lower bound on the cost of an optimum solution; (ii) an exact algorithm for a particular case; (iii) two other heuristics for a more general case; and (iv) hypergraph partitioning models for the most general setting. We illustrate the performance of our algorithms in practice using the MUMPS software package on a set of real-life problems as well as some standard test matrices. We show that our techniques can improve the execution time by a factor of 50. Key words. Sparse matrices, direct methods for linear systems and matrix inversion, multifrontal method, graphs and hypergraphs.

ieee international conference on high performance computing data and analytics | 2015

Scalable sparse tensor decompositions in distributed memory systems

Oguz Kaya; Bora Uçar

We investigate an efficient parallelization of the most common iterative sparse tensor decomposition algorithms on distributed memory systems. A key operation in each iteration of these algorithms is the matricized tensor times Khatri-Rao product (MTTKRP). This operation amounts to element-wise vector multiplication and reduction depending on the sparsity of the tensor. We investigate a fine and a coarse-grain task definition for this operation, and propose hypergraph partitioning-based methods for these task definitions to achieve the load balance as well as reduce the communication requirements. We also design a distributed memory sparse tensor library, HyperTensor, which implements a well-known algorithm for the CANDECOMP-/PARAFAC (CP) tensor decomposition using the task definitions and the associated partitioning methods. We use this library to test the proposed implementation of MTTKRP in CP decomposition context, and report scalability results up to 1024 MPI ranks. We observed up to 194 fold speedups using 512 MPI processes on a well-known real world data, and significantly better performance results with respect to a state of the art implementation.

international conference on parallel processing | 2011

On the use of cluster-based partial message logging to improve fault tolerance for MPI HPC applications

Thomas Ropars; Amina Guermouche; Bora Uçar; Esteban Meneses; Laxmikant V. Kalé; Franck Cappello

Fault tolerance is becoming a major concern in HPC systems. The two traditional approaches for message passing applications, coordinated checkpointing and message logging, have severe scalability issues. Coordinated checkpointing protocols make all processes roll back after a failure. Message logging protocols log a huge amount of data and can induce an overhead on communication performance. Hierarchical rollback-recovery protocols based on the combination of coordinated checkpointing and message logging are an alternative. These partial message logging protocols are based on process clustering: only messages between clusters are logged to limit the consequence of a failure to one cluster. These protocols would work efficiently only if one can find clusters of processes in the applications such that the ratio of logged messages is very low. We study the communication patterns of message passing HPC applications to show that partial message logging is suitable in most cases. We propose a partitioning algorithm to find suitable clusters of processes given the communication pattern of an application. Finally, we evaluate the efficiency of partial message logging using two state of the art protocols on a set of representative applications.

Journal of Parallel and Distributed Computing | 2007

Heuristics for scheduling file-sharing tasks on heterogeneous systems with distributed repositories

Kamer Kaya; Bora Uçar; Cevdet Aykanat

We consider the problem of scheduling an application on a computing system consisting of heterogeneous processors and data repositories. The application consists of a large number of file-sharing otherwise independent tasks. The files initially reside on the repositories. The processors and the repositories are connected through a heterogeneous interconnection network. Our aim is to assign the tasks to the processors, to schedule the file transfers from the repositories, and to schedule the executions of tasks on each processor in such a way that the turnaround time is minimized. We propose a heuristic composed of three phases: initial task assignment, task assignment refinement, and execution ordering. We experimentally compare the proposed heuristics with three well-known heuristics on a large number of problem instances. The proposed heuristic runs considerably faster than the existing heuristics and obtains 10-14% better turnaround times than the best of the three existing heuristics.

Explore More