
Publication


Featured research published by Murat Manguoglu.


Journal of Applied Mechanics | 2009

Preconditioning Techniques for Nonsymmetric Linear Systems in the Computation of Incompressible Flows

Murat Manguoglu; Ahmed H. Sameh; Faisal Saied; Tayfun E. Tezduyar; Sunil Sathe

In this paper we present effective preconditioning techniques for solving the nonsymmetric systems that arise from the discretization of the Navier-Stokes equations. These linear systems are solved using either Krylov subspace methods or the Richardson scheme. We demonstrate the effectiveness of our techniques in handling time-accurate as well as steady-state solutions. We also compare our solvers with those published previously.
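The Richardson scheme mentioned in the abstract can be sketched in a few lines. This is a generic illustration on a toy nonsymmetric system with a simple diagonal preconditioner, not the paper's Navier-Stokes preconditioners; the matrix and tolerances are ours.

```python
import numpy as np

# Preconditioned Richardson iteration: x_{k+1} = x_k + M^{-1}(b - A x_k).
# M here is a simple diagonal (Jacobi) preconditioner; the paper's more
# effective preconditioners play the same algebraic role.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 2.0],
              [0.0, 1.0, 5.0]])    # small nonsymmetric test matrix
b = np.array([1.0, 2.0, 3.0])

M = np.diag(np.diag(A))            # the preconditioner
x = np.zeros(3)
for _ in range(100):
    r = b - A @ x                  # current residual
    x = x + np.linalg.solve(M, r)  # preconditioned correction
residual = np.linalg.norm(b - A @ x)
```

Because the iteration matrix I - M⁻¹A has norm below one for this diagonally dominant example, the residual shrinks geometrically; a better preconditioner M shrinks it faster.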


SIAM Journal on Matrix Analysis and Applications | 2008

Analysis of the Truncated SPIKE Algorithm

Carl Christian Kjelgaard Mikkelsen; Murat Manguoglu

The truncated SPIKE algorithm is a parallel solver for linear systems that are banded and strictly diagonally dominant by rows. There are machines for which the current implementation of the algorithm is faster and scales better than the corresponding solver in ScaLAPACK (PDDBTRF/PDDBTRS). In this paper we prove that the SPIKE matrix is strictly diagonally dominant by rows with a degree no less than that of the original matrix. We establish tight upper bounds on the decay rate of the spikes as well as on the truncation error. We analyze the error of the method and present the results of numerical experiments which show that the accuracy of the truncated SPIKE algorithm is comparable to that of LAPACK and ScaLAPACK.
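The SPIKE idea can be illustrated with a minimal two-partition sketch on a tridiagonal, strictly diagonally dominant system. The function name and the dense solves are ours; a real implementation uses banded factorizations, and the truncation analyzed in the paper (dropping the rapidly decaying spike tails) only matters with more partitions, since here the reduced system is already a 2x2.

```python
import numpy as np

def spike_two_partitions(A, b, m):
    """Two-partition SPIKE sketch for a tridiagonal system.
    A must be strictly diagonally dominant by rows; m is the
    size of the first partition."""
    n = len(b)
    A1, A2 = A[:m, :m], A[m:, m:]
    beta, gamma = A[m - 1, m], A[m, m - 1]  # the two coupling entries
    g1 = np.linalg.solve(A1, b[:m])         # independent local solves
    g2 = np.linalg.solve(A2, b[m:])
    v = np.linalg.solve(A1, beta * np.eye(m)[:, -1])      # right spike
    w = np.linalg.solve(A2, gamma * np.eye(n - m)[:, 0])  # left spike
    # 2x2 reduced system coupling the two interface unknowns
    R = np.array([[1.0, v[-1]], [w[0], 1.0]])
    t = np.linalg.solve(R, np.array([g1[-1], g2[0]]))
    x1 = g1 - v * t[1]                      # recover the full solution
    x2 = g2 - w * t[0]
    return np.concatenate([x1, x2])

n = 6
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # dominant tridiagonal
b = np.ones(n)
x = spike_two_partitions(A, b, 3)
```

The two local solves are independent, which is where the parallelism comes from; diagonal dominance makes the spikes decay away from the interface, which is what justifies truncation in the general case.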


SIAM Journal on Scientific Computing | 2010

Weighted Matrix Ordering and Parallel Banded Preconditioners for Iterative Linear System Solvers

Murat Manguoglu; Mehmet Koyutürk; Ahmed H. Sameh

The emergence of multicore architectures and highly scalable platforms motivates the development of novel algorithms and techniques that emphasize concurrency and are tolerant of deep memory hierarchies, as opposed to minimizing raw FLOP counts. While direct solvers are reliable, they are often slow and memory-intensive for large problems. Iterative solvers, on the other hand, are more efficient but, in the absence of robust preconditioners, lack reliability. While preconditioners based on incomplete factorizations (whenever they exist) are effective for many problems, their parallel scalability is generally limited. In this paper, we advocate the use of banded preconditioners instead and introduce a reordering strategy that enables their extraction. In contrast to traditional bandwidth reduction techniques, our reordering strategy takes into account the magnitude of the matrix entries, bringing the heaviest elements closer to the diagonal, thus enabling the use of banded preconditioners. When used with effective banded solvers—in our case, the Spike solver—we show that banded preconditioners (i) are more robust compared to the broad class of incomplete factorization-based preconditioners, (ii) deliver higher processor performance, resulting in faster time to solution, and (iii) scale to larger parallel configurations. We demonstrate these results experimentally on a large class of problems selected from diverse application domains.
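The extraction-and-preconditioning step described above can be sketched as follows. The weighted reordering itself is beyond a short example, so this toy (all names ours) assumes the matrix has already been reordered so that its heavy entries sit inside a narrow band, then extracts that band as the preconditioner.

```python
import numpy as np

def extract_band(A, k):
    """Keep only the entries within k diagonals of the main diagonal."""
    i, j = np.indices(A.shape)
    return np.where(np.abs(i - j) <= k, A, 0.0)

rng = np.random.default_rng(0)
n = 50
# Stand-in for a matrix *after* weighted reordering: heavy entries
# lie inside a narrow band, everything else is small.
A = (10.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
     + rng.normal(0.0, 1e-3, (n, n)))
b = rng.normal(size=n)

M = extract_band(A, 1)             # banded preconditioner
x = np.zeros(n)
for _ in range(50):                # preconditioned Richardson iteration
    x = x + np.linalg.solve(M, b - A @ x)
residual = np.linalg.norm(b - A @ x)
```

Because the reordering concentrates the large entries in the band, M captures most of A and the iteration converges quickly; in the paper the banded systems with M are solved by the parallel Spike solver rather than a dense solve.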


Ubiquitous Computing | 2012

Geo-activity recommendations by using improved feature combination

Masoud Sattari; Murat Manguoglu; Ismail Hakki Toroslu; Panagiotis Symeonidis; Pinar Senkul; Yannis Manolopoulos

In this paper, we propose a new model that integrates additional data obtained from geospatial resources beyond the original data set in order to improve location/activity recommendations. The data set used in this work is a GPS trajectory of several users gathered over two years. To obtain more accurate predictions and recommendations, we present a model that injects this additional information into the main data set and applies a mathematical method to the merged data: the singular value decomposition is used to extract latent relations. Several tests have been conducted, and the results of our proposed method are compared with those of a similar work on the same data set.
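The latent-relation extraction step can be illustrated with a truncated SVD on a tiny, hypothetical user-by-activity matrix (the data here is made up, not the paper's GPS data set):

```python
import numpy as np

# Hypothetical tiny user x activity preference matrix (0 = unobserved).
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                    # keep the two strongest latent factors
R_hat = U[:, :k] * s[:k] @ Vt[:k, :]     # low-rank reconstruction
```

The reconstructed `R_hat` fills every cell, including the unobserved ones, from the dominant latent factors; ranking those filled-in scores is the basis of the recommendation.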


Computers & Chemical Engineering | 2014

Adaptive discontinuous Galerkin methods for non-linear diffusion–convection–reaction equations

Murat Uzunca; Bülent Karasözen; Murat Manguoglu

In this work, we apply the adaptive discontinuous Galerkin finite element method (DGAFEM) to convection-dominated, non-linear, quasi-stationary diffusion-convection-reaction equations. We propose an efficient preconditioner using a matrix reordering scheme to iteratively solve the sparse linear systems arising from the discretized non-linear equations. Numerical examples demonstrate the effectiveness of the DGAFEM in damping the spurious oscillations and resolving the sharp layers that occur in convection-dominated non-linear equations.


IEEE International Conference on High Performance Computing, Data and Analytics | 2010

TRACEMIN-Fiedler: a parallel algorithm for computing the Fiedler vector

Murat Manguoglu; Eric Cox; Faisal Saied; Ahmed H. Sameh

The eigenvector corresponding to the second smallest eigenvalue of the Laplacian of a graph, known as the Fiedler vector, has a number of applications in areas that include matrix reordering, graph partitioning, protein analysis, data mining, machine learning, and web search. The computation of the Fiedler vector has been regarded as an expensive process, as it involves solving a large eigenvalue problem. We present a novel and efficient parallel algorithm for computing the Fiedler vector of large graphs based on the Trace Minimization algorithm. We compare the parallel performance of our method with that of a multilevel scheme, designed specifically for computing the Fiedler vector, which is implemented in routine MC73_FIEDLER of the Harwell Subroutine Library (HSL).
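On a small graph the Fiedler vector and its use in partitioning are easy to demonstrate. In this sketch a dense eigensolver stands in for the parallel TRACEMIN iteration, which targets the same eigenpair at scale; the example graph is ours.

```python
import numpy as np

# Two triangles joined by a single bridge edge: (0,1,2) -- (3,4,5).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A   # graph Laplacian

vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = vecs[:, 1]             # eigenvector of the 2nd-smallest eigenvalue
part = fiedler > 0               # its sign pattern bisects the graph
```

For this barbell-shaped graph the sign pattern of the Fiedler vector recovers the two triangles as the natural bisection, which is exactly the property graph partitioners exploit.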


Computational Electromagnetics International Workshop | 2009

A parallel hybrid sparse linear system solver

Murat Manguoglu

We present a parallel hybrid sparse linear system solver suitable for the solution of large sparse linear systems on parallel computing platforms. This study is motivated by the lack of robustness of Krylov subspace iterative schemes with "black-box" preconditioners, such as incomplete LU factorizations, and by the lack of scalability of direct sparse system solvers. Our hybrid solver is as robust as direct solvers and as scalable as iterative solvers. Our method relies on weighted symmetric and nonsymmetric matrix reorderings that bring the largest elements onto or closer to the main diagonal, resulting in a very effective extracted banded preconditioner. Systems involving the extracted banded preconditioner are solved via a member of the recently developed SPIKE family of algorithms. The effectiveness of our method is demonstrated by solving large sparse linear systems that arise in applications such as computational electromagnetics and nonlinear optimization. We compare the performance and scalability of our solvers to those of well-known direct and iterative solver packages such as ILUPACK and MUMPS. Finally, we present a highly accurate model for predicting the parallel scalability of our solver on architectures with more nodes than the platform on which our experiments were performed.
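The nonsymmetric reordering step (bringing the largest elements to the diagonal) is typically done with a weighted bipartite matching, as in HSL's MC64. As a rough, greedy stand-in for that matching (the function and example matrix are ours, not the paper's algorithm), a row permutation can be chosen column by column:

```python
import numpy as np

def greedy_row_permutation(A):
    """Greedy stand-in for a weighted (MC64-style) matching: give each
    column the remaining row with the largest entry in that column, so
    the permuted matrix carries large elements on its diagonal."""
    n = A.shape[0]
    perm = np.full(n, -1)
    used = np.zeros(n, dtype=bool)
    # visit columns in order of their largest entry, biggest first
    for j in np.argsort(-np.abs(A).max(axis=0)):
        for i in np.argsort(-np.abs(A[:, j])):
            if not used[i]:
                perm[j] = i
                used[i] = True
                break
    return perm

A = np.array([[0.0, 9.0, 1.0],
              [8.0, 0.1, 0.0],
              [0.5, 0.0, 7.0]])   # note the zero and near-zero diagonal
p = greedy_row_permutation(A)
B = A[p, :]                       # permuted: large entries on the diagonal
```

A true weighted matching optimizes the diagonal product globally, while this greedy pass can get stuck on adversarial inputs; it only illustrates the goal of the reordering, after which the band around the diagonal is extracted as the preconditioner.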


Languages and Compilers for Parallel Computing | 2010

A parallel numerical solver using hierarchically tiled arrays

James C. Brodman; G. Carl Evans; Murat Manguoglu; Ahmed H. Sameh; María Jesús Garzarán; David A. Padua

Solving linear systems is an important problem for scientific computing. Exploiting parallelism is essential for solving complex systems, and this traditionally involves writing parallel algorithms on top of a library such as MPI. The SPIKE family of algorithms is one well-known example of a parallel solver for linear systems. The Hierarchically Tiled Array (HTA) data type extends traditional data-parallel array operations with explicit tiling and allows programmers to directly manipulate tiles. The tiles of the HTA data type map naturally to the block nature of many numeric computations, including the SPIKE family of algorithms. The higher level of abstraction of the HTA enables the same program to be portable across different platforms. Current implementations target both shared-memory and distributed-memory models. In this paper we present a proof of concept for portable linear solvers. We implement two algorithms from the SPIKE family using the HTA library. We show that our implementations of SPIKE exploit the abstractions provided by the HTA to produce compact, clean code that can run on both shared-memory and distributed-memory models without modification. We discuss how we map the algorithms to HTA programs and examine their performance. We compare the performance of our HTA codes to that of comparable codes written in MPI as well as current state-of-the-art linear algebra routines.


Scientific Programming | 2011

Performance models for the Spike banded linear system solver

Murat Manguoglu; Faisal Saied; Ahmed H. Sameh

With the availability of large-scale parallel platforms comprised of tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing the performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing the performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time to solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, and (iii) we show the excellent prediction capabilities of our model, based on which we argue for the high scalability of our solver. Our pseudo-analytical performance model is based on an analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations.
All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we predict the scalability of the Spike algorithm on up to 65,536 cores using our model. This paper extends the results presented at the Ninth International Symposium on Parallel and Distributed Computing.
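The general shape of such a pseudo-analytical model (an analytical form parameterized from measured runtimes) can be sketched as follows. The model form and the timings here are illustrative, not the paper's:

```python
import numpy as np

# Illustrative model form (ours): T(p) = a + b/p + c*log2(p),
# i.e. a serial part, perfectly parallel work, and a tree-style
# reduction/communication term. Fit on small core counts, extrapolate.
p = np.array([1, 2, 4, 8, 16, 32], dtype=float)
T = 0.5 + 120.0 / p + 0.05 * np.log2(p)      # synthetic "measured" timings

X = np.column_stack([np.ones_like(p), 1.0 / p, np.log2(p)])
coef, *_ = np.linalg.lstsq(X, T, rcond=None)  # least-squares parameterization

def predict(cores):
    return coef[0] + coef[1] / cores + coef[2] * np.log2(cores)
```

Once the coefficients are fitted on a small machine, `predict` extrapolates to core counts beyond the measured range, which is exactly how scalability claims at tens of thousands of cores can be argued from runs on smaller platforms.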


High-Performance Scientific Computing | 2012

Parallel Solution of Sparse Linear Systems

Murat Manguoglu

Many simulations in science and engineering give rise to sparse linear systems of equations. It is a well-known fact that the cost of the simulation process is almost always governed by the solution of the linear systems, especially for large-scale problems. The emergence of extreme-scale parallel platforms, along with the increasing number of processing cores available on a single chip, poses significant challenges for algorithm development. Machines with tens of thousands of multicore processors place tremendous constraints on the communication as well as the memory access requirements of algorithms. The increase in the number of cores in a processing unit without an increase in memory bandwidth aggravates an already significant memory bottleneck. Sparse linear algebra kernels are well known for their poor processor utilization. This is a result of limited memory reuse, which renders data caching less effective. In view of emerging hardware trends, it is necessary to develop algorithms that strike a more meaningful balance between memory accesses, communication, and computation. Specifically, an algorithm that performs more floating-point operations at the expense of reduced memory accesses and communication is likely to yield better performance. We present two alternative variations of DS-factorization-based methods for the solution of sparse linear systems on parallel computing platforms. Performance comparisons to traditional LU-factorization-based parallel solvers are also discussed. We show that by combining iterative methods with direct solvers and using DS factorization, one can achieve better scalability and a shorter time to solution.

Collaboration


Dive into Murat Manguoglu's collaboration.

Top Co-Authors

Bülent Karasözen

Middle East Technical University

Mehmet Koyutürk

Case Western Reserve University
