Gabriel Tanase
IBM
Publications
Featured research published by Gabriel Tanase.
International Conference on Supercomputing | 2012
Gabriel Tanase; Gheorghe Almasi; Hanhong Xue; Charles J. Archer
The Power7 IH (P7IH) is one of IBM's latest generations of supercomputers. Like most modern parallel machines, it has a hierarchical organization: simultaneous multithreading (SMT) within a core, multiple cores per processor, multiple processors per node (SMP), and multiple SMP nodes per cluster. A low-latency, high-bandwidth network with specialized accelerators interconnects the SMP nodes, and the system software is tuned to exploit this hierarchical organization.
In this paper we present a novel set of collective operations that take advantage of the P7IH hardware. We discuss non-blocking collective operations implemented using point-to-point messages, shared memory, and accelerator hardware. We show how collectives can be composed to exploit the hierarchical organization of the P7IH, providing low-latency, high-bandwidth operations. We demonstrate the scalability of the collectives we designed with experimental results on a P7IH system with up to 4096 cores.
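The composition idea described above can be illustrated with a minimal sketch, assuming a two-level hierarchy. This is not the paper's implementation (which uses shared memory, point-to-point messages, and accelerator hardware); it only models how a cluster-wide reduction decomposes into an intra-node phase, an inter-node phase among node leaders, and a local broadcast:

```python
from functools import reduce

# Hypothetical sketch (not IBM's PAMI code): composing a cluster-wide
# allreduce from phases that mirror the P7IH hierarchy:
#   1) each SMP node reduces across its local cores (shared memory),
#   2) one leader per node reduces across nodes (network),
#   3) the global result is broadcast back to every local core.
def hierarchical_allreduce(values_per_node, op=lambda a, b: a + b):
    """values_per_node: one inner list of per-core values per SMP node."""
    # Phase 1: intra-node reduction (shared memory on real hardware)
    node_partials = [reduce(op, node_vals) for node_vals in values_per_node]
    # Phase 2: inter-node reduction among node leaders (over the network)
    global_result = reduce(op, node_partials)
    # Phase 3: broadcast the result back to every core on every node
    return [[global_result] * len(node_vals) for node_vals in values_per_node]
```

For example, `hierarchical_allreduce([[1, 2], [3, 4]])` performs two local sums (3 and 7), one inter-node sum (10), and hands 10 back to all four cores.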
International Conference on Supercomputing | 2013
Michail Alvanos; Gabriel Tanase; Montse Farreras; Ettore Tiotto; José Nelson Amaral; Xavier Martorell
Computing Frontiers | 2016
Kattamuri Ekanadham; William P. Horn; Manoj Kumar; Joefon Jann; José E. Moreira; Pratap Pattnaik; Mauricio J. Serrano; Gabriel Tanase; Hao Yu
Graph processing is becoming a crucial component for analyzing big data arising in many application domains such as social and biological networks, fraud detection, and sentiment analysis. As a result, a number of computational models for graph analytics have been proposed in the literature to help users write efficient large-scale graph algorithms. In this paper we present an alternative model for implementing graph algorithms using a linear algebra based specification. We first specify a set of linear algebra primitives that allows users to express graph algorithms by composing linear algebra operations. We then describe a high-performance implementation of these primitives and its integration with the Spark framework to achieve the scalability we need for large shared-memory systems. We provide an overview of our implementation and compare and contrast the expressiveness and performance of various algorithms implemented with our approach against the current Spark GraphX implementation of those algorithms.
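The linear-algebra view of graph algorithms mentioned above can be sketched in a few lines. The code below is illustrative only (it does not use the paper's primitives or a sparse representation): it expresses breadth-first search as repeated matrix-vector products over the boolean (OR-AND) semiring, the core idea behind linear-algebra-based graph frameworks:

```python
# Illustrative sketch: BFS as frontier-times-adjacency-matrix products
# over the boolean semiring. A dense list-of-lists adjacency matrix is
# used for clarity; a real implementation would use sparse primitives.
def bfs_levels(adj, source):
    """adj[i][j] == 1 iff there is an edge i -> j.
    Returns the BFS level of each vertex, or -1 if unreachable."""
    n = len(adj)
    level = [-1] * n
    frontier = [0] * n          # boolean vector of current frontier
    frontier[source] = 1
    level[source] = 0
    step = 0
    while any(frontier):
        step += 1
        # nxt = frontier * adj over (OR, AND), masked by unvisited vertices:
        # nxt[j] = OR_i (frontier[i] AND adj[i][j])
        nxt = [0] * n
        for i in range(n):
            if frontier[i]:
                for j in range(n):
                    if adj[i][j] and level[j] == -1:
                        nxt[j] = 1
                        level[j] = step
        frontier = nxt
    return level
```

On a path graph 0 → 1 → 2 with an isolated vertex 3, `bfs_levels(adj, 0)` returns `[0, 1, 2, -1]`: each matrix-vector product advances the frontier one level.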
Languages and Compilers for Parallel Computing | 2014
Barnaby Dalton; Gabriel Tanase; Michail Alvanos; Gheorghe Almasi; Ettore Tiotto
Partitioned Global Address Space (PGAS) languages are a popular alternative when building applications to run on large-scale parallel machines. Unified Parallel C (UPC) is a well-known PGAS language that is available on most high performance computing systems. Good performance of UPC applications is often an important requirement for a system acquisition. This paper presents the memory management techniques employed by the IBM XL UPC compiler to achieve optimal performance on systems with Remote Direct Memory Access (RDMA). Additionally, we describe a novel technique employed by the UPC runtime that transforms remote memory accesses within the same shared-memory node into local memory accesses, further improving performance. We evaluate the proposed memory allocation policies for various UPC benchmarks on the IBM® Power® 775 supercomputer [1].
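The remote-to-local transformation described above can be sketched as follows. All names here are invented for illustration (this is not the XL UPC runtime's API): a partitioned global array short-circuits a "remote" get into a plain local load whenever the owning thread lives on the same shared-memory node, so only genuinely off-node accesses take the network path:

```python
# Hypothetical sketch of the same-node shortcut in a PGAS runtime.
class PartitionedArray:
    def __init__(self, partitions, node_of_thread):
        self.partitions = partitions          # thread id -> its local slice
        self.node_of_thread = node_of_thread  # thread id -> node id

    def get(self, my_thread, owner_thread, index):
        if self.node_of_thread[my_thread] == self.node_of_thread[owner_thread]:
            # Same SMP node: the owner's segment is mapped into our address
            # space, so this is an ordinary local load -- no network involved.
            return self.partitions[owner_thread][index]
        # Different node: issue a transfer over the interconnect.
        return self._rdma_get(owner_thread, index)

    def _rdma_get(self, owner_thread, index):
        # Stand-in for an actual RDMA get on real hardware.
        return self.partitions[owner_thread][index]
```

The design point is that the check happens once per access at runtime (or is hoisted by the compiler), trading a cheap comparison for avoiding the far more expensive network round trip.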
Symposium on Computer Architecture and High Performance Computing | 2012
Gabriel Tanase; Gheorghe Almasi; Hanhong Xue; Charles J. Archer
Modern large-scale parallel machines feature an increasingly deep hierarchy of interconnections. Individual processing cores employ simultaneous multithreading (SMT) to better exploit functional units; multiple coherent processors are collocated in a node (SMP) to better exploit links to cache, memory, and network; and multiple nodes are interconnected by specialized low-latency, high-speed networks. Current trends indicate ever wider SMP nodes in the future. To service these nodes, modern high-performance network devices (including InfiniBand and all of IBM's recent offerings) offer the ability to subdivide the network device's resources among the processing threads. System software, however, lags in exploiting these capabilities, leaving users of, e.g., MPI [14] and UPC [19] in a bind, requiring complex and fragile workarounds in user programs.
In this paper we discuss our implementation of endpoints, the software paradigm central to the IBM PAMI messaging library [3]. A PAMI endpoint is an expression in software of a slice of the network device. System software can service endpoints without serializing the many threads on an SMP by forcing them through a critical section. We describe the basic guarantees offered by PAMI to the programmer, and how these can be used to enable efficient implementations of high-level libraries and programming languages such as UPC. We evaluate the efficiency of our implementation on a novel P7IH system with up to 4096 cores, running micro-benchmarks designed to find performance deficiencies in the endpoints implementation of both point-to-point and collective functions.
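The endpoint idea above can be reduced to a small sketch. The names below are invented (this is not PAMI's actual API): each thread owns its own endpoint, a private send queue modeling one slice of the network device, so threads inject messages without funnelling through a single shared lock:

```python
# Minimal sketch of per-thread endpoints as slices of a network device.
class Endpoint:
    def __init__(self, endpoint_id):
        self.endpoint_id = endpoint_id
        self.send_queue = []   # private to one thread: no lock needed

    def send(self, dest_endpoint, payload):
        # Enqueue on this thread's own device slice; because no other
        # thread touches this queue, there is no critical section.
        self.send_queue.append((dest_endpoint, payload))

class Context:
    """One messaging context, subdivided into per-thread endpoints."""
    def __init__(self, num_threads):
        self.endpoints = [Endpoint(i) for i in range(num_threads)]

    def endpoint(self, thread_id):
        return self.endpoints[thread_id]
```

Contrast this with a single shared queue protected by a mutex: there, every SMT thread on a wide SMP node serializes on the same lock, which is exactly the bottleneck the endpoint model removes.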
International Journal of Parallel Programming | 2018
William P. Horn; Manoj Kumar; Joefon Jann; José E. Moreira; Pratap Pattnaik; Mauricio J. Serrano; Gabriel Tanase; Hao Yu
Graph processing is becoming a crucial component for analyzing big data arising in many application domains such as social and biological networks, fraud detection, and sentiment analysis. As a result, a number of computational models for graph analytics have been proposed in the literature to help users write efficient large scale graph algorithms. In this paper we present an alternative model for implementing graph algorithms using a linear algebra based specification. We first specify a set of linear algebra primitives that allows users to express graph algorithms by composition of linear algebra operations. We then describe a high performance implementation of these primitives using C++ and subsequently its integration with the Spark framework to achieve the scalability we need for large systems. We provide an overview of our implementation and also compare and contrast the expressiveness and performance of various algorithms implemented with our approach with that of the current Spark GraphX implementation of those algorithms.
IEEE High Performance Extreme Computing Conference | 2016
Manoj Kumar; Mauricio J. Serrano; José E. Moreira; Pratap Pattnaik; William P. Horn; Joefon Jann; Gabriel Tanase
International Parallel and Distributed Processing Symposium | 2017
William P. Horn; Gabriel Tanase; Hao Yu; Pratap Pattnaik
arXiv: Databases | 2018
Gabriel Tanase; Toyotaro Suzumura; Jinho Lee; Chun-Fu Chen; Jason Crawford; Hiroki Kanezashi; Song Zhang; Warut D. Vijitbenjaronk
International Conference on Big Data | 2017
Warut D. Vijitbenjaronk; Jinho Lee; Toyotaro Suzumura; Gabriel Tanase