Gabriel Tanase
IBM
Publications
Featured research published by Gabriel Tanase.
International Conference on Supercomputing | 2012
Gabriel Tanase; Gheorghe Almasi; Hanhong Xue; Charles J. Archer
The Power7 IH (P7IH) is one of IBM's latest generations of supercomputers. Like most modern parallel machines, it has a hierarchical organization: simultaneous multithreading (SMT) within a core, multiple cores per processor, multiple processors per node (SMP), and multiple SMP nodes per cluster. A low-latency, high-bandwidth network with specialized accelerators interconnects the SMP nodes, and the system software is tuned to exploit this hierarchical organization.
In this paper we present a novel set of collective operations that take advantage of the P7IH hardware. We discuss non-blocking collective operations implemented using point-to-point messages, shared memory, and accelerator hardware. We show how collectives can be composed to exploit the hierarchical organization of the P7IH, providing low-latency, high-bandwidth operations. We demonstrate the scalability of the collectives we designed with experimental results on a P7IH system with up to 4096 cores.
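The composition idea described above can be illustrated with a minimal sketch, assuming a two-level hierarchy. This is not the paper's implementation (which uses shared memory, point-to-point messages, and accelerator hardware); it only models how a cluster-wide reduction decomposes into an intra-node phase, an inter-node phase among node leaders, and a local broadcast:

```python
from functools import reduce

# Hypothetical sketch (not IBM's PAMI code): composing a cluster-wide
# allreduce from phases that mirror the P7IH hierarchy:
#   1) each SMP node reduces across its local cores (shared memory),
#   2) one leader per node reduces across nodes (network),
#   3) the global result is broadcast back to every local core.
def hierarchical_allreduce(values_per_node, op=lambda a, b: a + b):
    """values_per_node: one inner list of per-core values per SMP node."""
    # Phase 1: intra-node reduction (shared memory on real hardware)
    node_partials = [reduce(op, node_vals) for node_vals in values_per_node]
    # Phase 2: inter-node reduction among node leaders (over the network)
    global_result = reduce(op, node_partials)
    # Phase 3: broadcast the result back to every core on every node
    return [[global_result] * len(node_vals) for node_vals in values_per_node]
```

For example, `hierarchical_allreduce([[1, 2], [3, 4]])` performs two local sums (3 and 7), one inter-node sum (10), and hands 10 back to all four cores.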
International Conference on Supercomputing | 2013
Michail Alvanos; Gabriel Tanase; Montse Farreras; Ettore Tiotto; José Nelson Amaral; Xavier Martorell
Computing Frontiers | 2016
Kattamuri Ekanadham; William P. Horn; Manoj Kumar; Joefon Jann; José E. Moreira; Pratap Pattnaik; Mauricio J. Serrano; Gabriel Tanase; Hao Yu
Graph processing is becoming a crucial component for analyzing big data arising in many application domains such as social and biological networks, fraud detection, and sentiment analysis. As a result, a number of computational models for graph analytics have been proposed in the literature to help users write efficient large-scale graph algorithms. In this paper we present an alternative model for implementing graph algorithms using a linear algebra based specification. We first specify a set of linear algebra primitives that allows users to express graph algorithms by composing linear algebra operations. We then describe a high-performance implementation of these primitives and its integration with the Spark framework to achieve the scalability we need for large shared-memory systems. We provide an overview of our implementation and compare and contrast the expressiveness and performance of various algorithms implemented with our approach against the current Spark GraphX implementation of those algorithms.
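The linear-algebra view of graph algorithms mentioned above can be sketched in a few lines. The code below is illustrative only (it does not use the paper's primitives or a sparse representation): it expresses breadth-first search as repeated matrix-vector products over the boolean (OR-AND) semiring, the core idea behind linear-algebra-based graph frameworks:

```python
# Illustrative sketch: BFS as frontier-times-adjacency-matrix products
# over the boolean semiring. A dense list-of-lists adjacency matrix is
# used for clarity; a real implementation would use sparse primitives.
def bfs_levels(adj, source):
    """adj[i][j] == 1 iff there is an edge i -> j.
    Returns the BFS level of each vertex, or -1 if unreachable."""
    n = len(adj)
    level = [-1] * n
    frontier = [0] * n          # boolean vector of current frontier
    frontier[source] = 1
    level[source] = 0
    step = 0
    while any(frontier):
        step += 1
        # nxt = frontier * adj over (OR, AND), masked by unvisited vertices:
        # nxt[j] = OR_i (frontier[i] AND adj[i][j])
        nxt = [0] * n
        for i in range(n):
            if frontier[i]:
                for j in range(n):
                    if adj[i][j] and level[j] == -1:
                        nxt[j] = 1
                        level[j] = step
        frontier = nxt
    return level
```

On a path graph 0 → 1 → 2 with an isolated vertex 3, `bfs_levels(adj, 0)` returns `[0, 1, 2, -1]`: each matrix-vector product advances the frontier one level.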
Languages and Compilers for Parallel Computing | 2014
Barnaby Dalton; Gabriel Tanase; Michail Alvanos; Gheorghe Almasi; Ettore Tiotto
Partitioned Global Address Space (PGAS) languages are a popular alternative when building applications to run on large-scale parallel machines. Unified Parallel C (UPC) is a well-known PGAS language that is available on most high performance computing systems. Good performance of UPC applications is often an important requirement for a system acquisition. This paper presents the memory management techniques employed by the IBM XL UPC compiler to achieve optimal performance on systems with Remote Direct Memory Access (RDMA). Additionally, we describe a novel technique employed by the UPC runtime that transforms remote memory accesses within the same shared-memory node into local memory accesses, further improving performance. We evaluate the proposed memory allocation policies for various UPC benchmarks on the IBM® Power® 775 supercomputer [1].
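The remote-to-local transformation described above can be sketched as follows. All names here are invented for illustration (this is not the XL UPC runtime's API): a partitioned global array short-circuits a "remote" get into a plain local load whenever the owning thread lives on the same shared-memory node, so only genuinely off-node accesses take the network path:

```python
# Hypothetical sketch of the same-node shortcut in a PGAS runtime.
class PartitionedArray:
    def __init__(self, partitions, node_of_thread):
        self.partitions = partitions          # thread id -> its local slice
        self.node_of_thread = node_of_thread  # thread id -> node id

    def get(self, my_thread, owner_thread, index):
        if self.node_of_thread[my_thread] == self.node_of_thread[owner_thread]:
            # Same SMP node: the owner's segment is mapped into our address
            # space, so this is an ordinary local load -- no network involved.
            return self.partitions[owner_thread][index]
        # Different node: issue a transfer over the interconnect.
        return self._rdma_get(owner_thread, index)

    def _rdma_get(self, owner_thread, index):
        # Stand-in for an actual RDMA get on real hardware.
        return self.partitions[owner_thread][index]
```

The design point is that the check happens once per access at runtime (or is hoisted by the compiler), trading a cheap comparison for avoiding the far more expensive network round trip.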
Symposium on Computer Architecture and High Performance Computing | 2012
Gabriel Tanase; Gheorghe Almasi; Hanhong Xue; Charles J. Archer
Modern large-scale parallel machines feature an increasingly deep hierarchy of interconnections. Individual processing cores employ simultaneous multithreading (SMT) to better exploit functional units; multiple coherent processors are collocated in a node (SMP) to better exploit links to cache, memory, and network; and multiple nodes are interconnected by specialized low-latency, high-speed networks. Current trends indicate ever wider SMP nodes in the future. To service these nodes, modern high-performance network devices (including InfiniBand and all of IBM's recent offerings) offer the ability to subdivide the network device's resources among the processing threads. System software, however, lags in exploiting these capabilities, leaving users of, e.g., MPI [14] and UPC [19] in a bind, requiring complex and fragile workarounds in user programs.
In this paper we discuss our implementation of endpoints, the software paradigm central to the IBM PAMI messaging library [3]. A PAMI endpoint is an expression in software of a slice of the network device. System software can service endpoints without serializing the many threads on an SMP by forcing them through a critical section. We describe the basic guarantees offered by PAMI to the programmer, and how these can be used to enable efficient implementations of high-level libraries and programming languages such as UPC. We evaluate the efficiency of our implementation on a novel P7IH system with up to 4096 cores, running micro-benchmarks designed to find performance deficiencies in the endpoints implementation of both point-to-point and collective functions.
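The endpoint idea above can be reduced to a small sketch. The names below are invented (this is not PAMI's actual API): each thread owns its own endpoint, a private send queue modeling one slice of the network device, so threads inject messages without funnelling through a single shared lock:

```python
# Minimal sketch of per-thread endpoints as slices of a network device.
class Endpoint:
    def __init__(self, endpoint_id):
        self.endpoint_id = endpoint_id
        self.send_queue = []   # private to one thread: no lock needed

    def send(self, dest_endpoint, payload):
        # Enqueue on this thread's own device slice; because no other
        # thread touches this queue, there is no critical section.
        self.send_queue.append((dest_endpoint, payload))

class Context:
    """One messaging context, subdivided into per-thread endpoints."""
    def __init__(self, num_threads):
        self.endpoints = [Endpoint(i) for i in range(num_threads)]

    def endpoint(self, thread_id):
        return self.endpoints[thread_id]
```

Contrast this with a single shared queue protected by a mutex: there, every SMT thread on a wide SMP node serializes on the same lock, which is exactly the bottleneck the endpoint model removes.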
International Journal of Parallel Programming | 2018
William P. Horn; Manoj Kumar; Joefon Jann; José E. Moreira; Pratap Pattnaik; Mauricio J. Serrano; Gabriel Tanase; Hao Yu
Graph processing is becoming a crucial component for analyzing big data arising in many application domains such as social and biological networks, fraud detection, and sentiment analysis. As a result, a number of computational models for graph analytics have been proposed in the literature to help users write efficient large scale graph algorithms. In this paper we present an alternative model for implementing graph algorithms using a linear algebra based specification. We first specify a set of linear algebra primitives that allows users to express graph algorithms by composition of linear algebra operations. We then describe a high performance implementation of these primitives using C++ and subsequently its integration with the Spark framework to achieve the scalability we need for large systems. We provide an overview of our implementation and also compare and contrast the expressiveness and performance of various algorithms implemented with our approach with that of the current Spark GraphX implementation of those algorithms.
IEEE High Performance Extreme Computing Conference | 2016
Manoj Kumar; Mauricio J. Serrano; José E. Moreira; Pratap Pattnaik; William P. Horn; Joefon Jann; Gabriel Tanase
International Parallel and Distributed Processing Symposium | 2017
William P. Horn; Gabriel Tanase; Hao Yu; Pratap Pattnaik
arXiv: Databases | 2018
Gabriel Tanase; Toyotaro Suzumura; Jinho Lee; Chun-Fu Chen; Jason Crawford; Hiroki Kanezashi; Song Zhang; Warut D. Vijitbenjaronk
International Conference on Big Data | 2017
Warut D. Vijitbenjaronk; Jinho Lee; Toyotaro Suzumura; Gabriel Tanase