Publication


Featured research published by Yuri Dotsenko.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2005

An evaluation of global address space languages: Co-array Fortran and Unified Parallel C

Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey; François Cantonnet; Tarek A. El-Ghazawi; Ashrujit Mohanti; Yiyi Yao; Daniel G. Chavarría-Miranda

Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. However, our experiments uncovered some significant performance bottlenecks of UPC codes on all platforms. We account for the root causes limiting UPC performance such as the synchronization model, the communication efficiency of strided data, and source-to-source translation issues. We show that they can be remedied with language extensions, new synchronization constructs, and, finally, adequate optimizations by the back-end C compilers.
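The programming-model contrast at the heart of this paper can be shown in a few lines. The sketch below, written with co-array syntax as later standardized in Fortran 2008, moves a block of data to a neighbor with a single one-sided assignment instead of a matched send/receive pair; this is the bulk-communication style the abstract identifies as the one that currently scales well. The array names, sizes, and neighbor choice are illustrative and are not taken from the benchmarks studied.

program gas_put_sketch
  implicit none
  ! A co-array: every image (process) owns a copy, and any image may read or
  ! write another image's copy by adding a [ ] co-index.
  real :: buf(1024)[*]
  real :: local(1024)
  integer :: me, np, right

  me = this_image()
  np = num_images()
  right = merge(1, me + 1, me == np)   ! periodic right neighbor

  local = real(me)                     ! some local data to communicate

  ! Bulk one-sided put: write the whole block into the neighbor's memory in
  ! one statement; no matching receive is posted on the other side.
  buf(:)[right] = local(:)

  sync all                             ! make all transfers visible before anyone reads buf
end program gas_put_sketch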


International Conference on Parallel Architectures and Compilation Techniques | 2004

A Multi-Platform Co-Array Fortran Compiler

Yuri Dotsenko; Cristian Coarfa; John M. Mellor-Crummey

Co-array Fortran (CAF) - a small set of extensions to Fortran 90 - is an emerging model for scalable, global address space parallel programming. CAF's global address space programming model simplifies the development of single-program-multiple-data parallel programs by shifting the burden for managing the details of communication from developers to compilers. This paper describes CAFC - a prototype implementation of an open-source, multiplatform CAF compiler that generates code well-suited for today's commodity clusters. The CAFC compiler translates CAF into Fortran 90 plus calls to one-sided communication primitives. The paper describes key details of CAFC's approach to generating efficient code for multiple platforms. Experiments compare the performance of CAF and MPI versions of several NAS parallel benchmarks on an Alpha cluster with a Quadrics interconnect, an Itanium 2 cluster with a Myrinet 2000 interconnect, and an Itanium 2 cluster with a Quadrics interconnect. These experiments show that CAFC compiles CAF programs into code that delivers performance roughly equal to that of hand-optimized MPI programs.
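To make the translation strategy described above concrete, the sketch below shows roughly what a source-to-source CAF compiler emits for a bulk remote assignment such as b(1:n)[p] = a(1:n): ordinary Fortran 90 plus a call to a one-sided communication primitive. The routine name caf_runtime_put and its interface are invented for this illustration (CAFC's actual runtime calls differ); the stub simply copies locally so the fragment is self-contained.

module caf_runtime_stub
  implicit none
contains
  ! Hypothetical stand-in for a one-sided communication primitive. A real
  ! runtime would issue a contiguous put into the memory of image 'image'.
  subroutine caf_runtime_put(dest, src, image)
    real, intent(inout) :: dest(:)
    real, intent(in)    :: src(:)
    integer, intent(in) :: image   ! target image; ignored by this local stub
    dest = src
  end subroutine caf_runtime_put
end module caf_runtime_stub

subroutine exchange(a, b, n, p)
  ! Translated form of the CAF statement:  b(1:n)[p] = a(1:n)
  use caf_runtime_stub
  implicit none
  integer, intent(in) :: n, p
  real, intent(in)    :: a(n)
  real, intent(inout) :: b(n)
  call caf_runtime_put(b(1:n), a(1:n), p)
end subroutine exchange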


Languages and Compilers for Parallel Computing | 2003

Co-array Fortran Performance and Potential: An NPB Experimental Study

Cristian Coarfa; Yuri Dotsenko; Jason Eckhardt; John M. Mellor-Crummey

Co-array Fortran (CAF) is an emerging model for scalable, global address space parallel programming that consists of a small set of extensions to the Fortran 90 programming language. Compared to MPI, the widely-used message-passing programming model, CAF’s global address space programming model simplifies the development of single-program-multiple-data parallel programs by shifting the burden for choreographing and optimizing communication from developers to compilers. This paper describes an open-source, portable, and retargetable CAF compiler under development at Rice University that is well-suited for today’s high-performance clusters. Our compiler translates CAF into Fortran 90 plus calls to one-sided communication primitives. Preliminary experiments comparing CAF and MPI versions of several of the NAS parallel benchmarks on an Itanium 2 cluster with a Myrinet 2000 interconnect show that our CAF compiler delivers performance that is roughly equal to or, in many cases, better than that of programs parallelized using MPI, even though support for global optimization of communication has not yet been implemented in our compiler.


International Conference on Supercomputing | 2007

Scalability analysis of SPMD codes using expectations

Cristian Coarfa; John M. Mellor-Crummey; Nathan Froyd; Yuri Dotsenko

We present a new technique for identifying scalability bottlenecks in executions of single-program, multiple-data (SPMD) parallel programs, quantifying their impact on performance, and associating this information with the program source code. Our performance analysis strategy involves three steps. First, we collect call path profiles for two or more executions on different numbers of processors. Second, we use our expectations about how the performance of executions should differ, e.g., linear speedup for strong scaling or constant execution time for weak scaling, to automatically compute the scalability of costs incurred at each point in a program's execution. Third, with the aid of an interactive browser, an application developer can explore a program's performance in a top-down fashion, see the contexts in which poor scaling behavior arises, and understand exactly how much each scalability bottleneck dilates execution time. Our analysis technique is independent of the parallel programming model. We describe our experiences applying our technique to analyze parallel programs written in Co-array Fortran and Unified Parallel C, as well as message-passing programs based on MPI.
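One way to read the "expectations" step: let C_p(n) be the cost (say, inclusive time summed over all processes) attributed to calling context n in a run on p processes, and T_q the total cost of the larger run on q > p processes. The formulas below are a hedged formalization of the idea, not necessarily the paper's exact definitions.

\[
E_{\text{strong}}(n) = C_p(n), \qquad
E_{\text{weak}}(n) = \frac{q}{p}\,C_p(n), \qquad
X(n) = \frac{C_q(n) - E(n)}{T_q}.
\]

Under strong scaling with linear speedup, total work should stay constant, so the expected cost of each context is unchanged; under weak scaling it should grow in proportion to the process count. Contexts with a large positive X(n) are the ones that dilate execution time, and the interactive browser described above lets a developer drill down to them.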


The Journal of Supercomputing | 2006

Experiences with Sweep3D implementations in Co-array Fortran

Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey

As part of the recent focus on increasing the productivity of parallel application developers, Co-array Fortran (CAF) has emerged as an appealing alternative to the Message Passing Interface (MPI). CAF belongs to the family of global address space parallel programming languages; such languages provide the abstraction of globally addressable memory accessed using one-sided communication. At Rice University we are developing cafc, an open-source, multiplatform CAF compiler. Our earlier studies show that cafc-compiled CAF programs achieve similar performance to that of corresponding MPI codes for the NAS Parallel Benchmarks. In this paper, we present a study of several CAF implementations of Sweep3D on four modern architectures. We analyze the impact of using one-sided communication in Sweep3D, identify potential sources of inefficiency, and suggest ways to address them. Our results show that we achieve comparable performance to that of the MPI version on three cluster-based architectures and outperform it by up to 10% on the SGI Altix 3000.
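For orientation, the sketch below shows one pipeline step of a wavefront sweep expressed in the one-sided style this paper analyzes: an image writes its outgoing plane directly into its downstream neighbor's co-array and then synchronizes pairwise. The array names, extents, and the trivial stand-in kernel are illustrative placeholders, not code from Sweep3D, and the example uses Fortran 2008 image-control syntax rather than the original CAF synchronization primitives.

subroutine sweep_step(plane_in, plane_out, nx, ny)
  implicit none
  integer, intent(in) :: nx, ny
  real, intent(inout) :: plane_in(nx, ny)[*]   ! incoming ghost plane (co-array dummy)
  real, intent(inout) :: plane_out(nx, ny)
  integer :: me, np

  me = this_image()
  np = num_images()

  if (me > 1) sync images (me - 1)        ! wait until the upstream image has written plane_in

  plane_out = plane_in + 1.0              ! stand-in for the real sweep kernel

  if (me < np) then
    plane_in(:, :)[me + 1] = plane_out    ! one-sided put of the outgoing plane downstream
    sync images (me + 1)                  ! signal the downstream image that its data is ready
  end if
end subroutine sweep_step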


IEEE International Conference on High Performance Computing, Data and Analytics | 2004

Experiences with co-array fortran on hardware shared memory platforms

Yuri Dotsenko; Cristian Coarfa; John M. Mellor-Crummey; Daniel G. Chavarría-Miranda

When performing source-to-source compilation of Co-array Fortran (CAF) programs into SPMD Fortran 90 codes for shared-memory multiprocessors, there are several ways of representing and manipulating data at the Fortran 90 language level. We describe a set of implementation alternatives and evaluate their performance implications for CAF variants of the STREAM, Random Access, Spark98 and NAS MG & SP benchmarks. We compare the performance of library-based implementations of one-sided communication with fine-grain communication that accesses remote data using load and store operations. Our experiments show that using application-level loads and stores for fine-grain communication can improve performance by as much as a factor of 24; however, codes requiring only coarse-grain communication can achieve better performance by using an architecture's tuned memcpy for bulk data movement.
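The implementation alternatives compared above differ mainly in how a remote co-array reference is realized. In the sketch below (coarray syntax, illustrative names and sizes), the loop touches remote data one element at a time, which a shared-memory translation can lower to plain loads and stores, while the final statement moves the same data as one contiguous block, the case where an architecture's tuned memcpy pays off.

subroutine remote_access_styles(n, p)
  implicit none
  integer, intent(in) :: n, p          ! n <= 1000 assumed for this sketch
  real, save :: x(1000)[*]             ! co-array: one copy per image
  real :: y(1000)
  integer :: i

  ! Fine-grain: one remote element per reference. On hardware shared memory
  ! this can compile down to individual loads from image p's copy of x.
  do i = 1, n
    y(i) = x(i)[p]
  end do

  ! Coarse-grain: a single bulk transfer of the same data, the pattern that
  ! benefits from a tuned memcpy or a single one-sided get.
  y(1:n) = x(1:n)[p]
end subroutine remote_access_styles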


International Conference on Parallel and Distributed Systems | 2005

PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction

Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey; Luay Nakhleh; Usman Roshan

Accurate reconstruction of phylogenetic trees very often involves solving hard optimization problems, particularly the maximum parsimony (MP) and maximum likelihood (ML) problems. Various heuristics have been devised for solving these two problems; however, they obtain good results within reasonable time only on small datasets. This has been a major impediment for large-scale phylogeny reconstruction, particularly for the effort to assemble the Tree of Life - the evolutionary relationship of all organisms on Earth. Roshan et al. recently introduced Rec-I-DCM3, an efficient and accurate meta-method for solving the MP problem on large datasets of up to 14,000 taxa. Nonetheless, a drastic improvement in Rec-I-DCM3's performance is still needed in order to achieve similar (or better) accuracy on datasets at the scale of the Tree of Life. In this paper, we improve the performance of Rec-I-DCM3 via parallelization. Experimental results demonstrate that our parallel method, PRec-I-DCM3, achieves significant improvements, both in speed and accuracy, over its sequential counterpart.


International Journal of Bioinformatics Research and Applications | 2006

PRec-I-DCM3: a parallel framework for fast and accurate large-scale phylogeny reconstruction

Yuri Dotsenko; Cristian Coarfa; Luay Nakhleh; John M. Mellor-Crummey; Usman Roshan

Accurate reconstruction of phylogenetic trees very often involves solving hard optimization problems, particularly the maximum parsimony (MP) and maximum likelihood (ML) problems. Various heuristics have been devised for solving these two problems; however, they obtain good results within reasonable time only on small datasets. This has been a major impediment for large-scale phylogeny reconstruction, particularly for the effort to assemble the Tree of Life - the evolutionary relationship of all organisms on Earth. Roshan et al. recently introduced Rec-I-DCM3, an efficient and accurate meta-method for solving the MP problem on large datasets of up to 14,000 taxa. Nonetheless, a drastic improvement in Rec-I-DCM3's performance is still needed in order to achieve similar (or better) accuracy on datasets at the scale of the Tree of Life. In this paper, we improve the performance of Rec-I-DCM3 via parallelization. Experimental results demonstrate that our parallel method, PRec-I-DCM3, achieves significant improvements, both in speed and accuracy, over its sequential counterpart.


Archive | 2007

Expressiveness, programmability and portable high performance of global address space languages

John M. Mellor-Crummey; Yuri Dotsenko


Lecture Notes in Computer Science | 2004

Co-array Fortran performance and potential: An NPB experimental study

Cristian Coarfa; Yuri Dotsenko; Jason Eckhardt; John M. Mellor-Crummey

Collaboration


Dive into Yuri Dotsenko's collaboration.

Top Co-Authors


Daniel G. Chavarría-Miranda, Pacific Northwest National Laboratory

Usman Roshan, New Jersey Institute of Technology

Ashrujit Mohanti, George Washington University

François Cantonnet, George Washington University

Tarek A. El-Ghazawi, George Washington University