Publication


Featured research published by Vincent Cavé.


principles and practice of programming in java | 2011

Habanero-Java: the new adventures of old X10

Vincent Cavé; Jisheng Zhao; Jun Shirako; Vivek Sarkar

In this paper, we present the Habanero-Java (HJ) language developed at Rice University as an extension to the original Java-based definition of the X10 language. HJ includes a powerful set of task-parallel programming constructs that can be added as simple extensions to standard Java programs to take advantage of today's multi-core and heterogeneous architectures. The language puts a particular emphasis on the usability and safety of parallel constructs. For example, no HJ program using async, finish, isolated, and phaser constructs can create a logical deadlock cycle. In addition, the future and data-driven task variants of the async construct facilitate a functional approach to parallel programming. Finally, any HJ program written with async, finish, and phaser constructs that is data-race free is guaranteed to also be deterministic. HJ also features two key enhancements that address well-known limitations in the use of Java in scientific computing: the inclusion of complex numbers as a primitive data type, and the inclusion of array-views that support multidimensional views of one-dimensional arrays. The HJ compiler generates standard Java class-files that can run on any JVM for Java 5 or higher. The HJ runtime is responsible for orchestrating the creation, execution, and termination of HJ tasks, and features both work-sharing and work-stealing schedulers. HJ is used at Rice University as an introductory parallel programming language for second-year undergraduate students. A wide variety of benchmarks have been ported to HJ, including a full application that was originally written in Fortran 90. HJ has a rich development and runtime environment that includes integration with DrJava, the addition of a data race detection tool, and service as a target platform for the Intel Concurrent Collections coordination language.
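The abstract does not show HJ syntax, so the following is only a rough analogy in plain Java, not HJ code. It assumes that an "async" with a future result can be approximated by `CompletableFuture.supplyAsync`, and that the enclosing "finish" can be approximated by joining every child task before returning. The class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration: approximates async/finish with standard Java futures.
class FinishAsyncSketch {
    // Sequentially sums data[lo, hi).
    static long sumRange(int[] data, int lo, int hi) {
        long s = 0;
        for (int i = lo; i < hi; i++) {
            s += data[i];
        }
        return s;
    }

    static long parallelSum(int[] data) {
        int mid = data.length / 2;
        // "async": each half runs as its own task and yields a future value.
        CompletableFuture<Long> left =
                CompletableFuture.supplyAsync(() -> sumRange(data, 0, mid));
        CompletableFuture<Long> right =
                CompletableFuture.supplyAsync(() -> sumRange(data, mid, data.length));
        // "finish": do not return until both child tasks have completed.
        return left.join() + right.join();
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data)); // prints 500500
    }
}
```

The two tasks read disjoint halves of the array and write nothing shared, so the sketch is data-race free and returns the same result on every run, which is the property the abstract ties to determinism.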


Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism | 2010

Concurrent Collections

Zoran Budimlic; Michael G. Burke; Vincent Cavé; Kathleen Knobe; Geoff Lowney; Ryan R. Newton; Jens Palsberg; David M. Peixotto; Vivek Sarkar; Frank Schlimbach; Sagnak Tasirlar

We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees deterministic computation. We evaluate the performance of CnC implementations on several applications and show that CnC offers performance and scalability equivalent to or better than that offered by lower-level parallel programming models.


acm sigplan symposium on principles and practice of parallel programming | 2010

SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems

Yi Guo; Jisheng Zhao; Vincent Cavé; Vivek Sarkar

This paper introduces SLAW, a Scalable Locality-aware Adaptive Work-stealing scheduler. The SLAW scheduler is designed to address two common limitations in current work-stealing schedulers: use of a fixed task scheduling policy and locality-obliviousness due to randomized stealing. Past work has demonstrated the pros and cons of using fixed scheduling policies, such as work-first and help-first, in different cases without a clear win for one policy over the other. The SLAW scheduler addresses this limitation by supporting both work-first and help-first policies simultaneously. It does so by using an adaptive approach that selects a scheduling policy on a per-task basis at runtime. The SLAW scheduler also establishes bounds on the stack and heap space needed to store tasks. The experimental results for the benchmarks studied in this paper show that SLAW's adaptive scheduler achieves 0.98× to 9.2× speedup over the help-first scheduler and 0.97× to 4.5× speedup over the work-first scheduler for 64-thread executions, thereby establishing the robustness of using an adaptive approach instead of a fixed policy. In contrast, the help-first policy is 9.2× slower than work-first in the worst case for a fixed help-first policy, and the work-first policy is 3.7× slower than help-first in the worst case for a fixed work-first policy. Further, for large irregular recursive parallel computations, the adaptive scheduler runs with bounded stack usage and achieves performance (and supports data sizes) that cannot be delivered by the use of any single fixed policy. It is also known that work-stealing schedulers can be cache-unfriendly for some applications due to randomized stealing. The SLAW scheduler is designed for programming models where locality hints are provided to the runtime by the programmer or compiler, and achieves locality-awareness by grouping workers into places. Locality awareness can lead to improved performance by increasing temporal data reuse within a worker and among workers in the same place. Our experimental results show that locality-aware scheduling can achieve up to 2.6× speedup over locality-oblivious scheduling, for the benchmarks studied in this paper.
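The abstract says only that workers are grouped into places; it does not spell out how a thief picks its victim. The toy sketch below shows one possible reading, under two loudly stated assumptions that are not taken from the paper: workers are assigned to places in contiguous blocks, and a thief tries same-place victims before any others. All names are hypothetical, and this is not SLAW's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of place-aware victim ordering (not SLAW's real policy).
class PlaceAwareVictims {
    // Assumption: worker w belongs to place w / workersPerPlace.
    static int placeOf(int worker, int workersPerPlace) {
        return worker / workersPerPlace;
    }

    // Returns the order in which a thief would probe other workers:
    // same-place workers first, then workers in other places.
    static List<Integer> victimOrder(int thief, int numWorkers, int workersPerPlace) {
        List<Integer> local = new ArrayList<>();
        List<Integer> remote = new ArrayList<>();
        int thiefPlace = placeOf(thief, workersPerPlace);
        for (int w = 0; w < numWorkers; w++) {
            if (w == thief) {
                continue; // a worker never steals from itself
            }
            if (placeOf(w, workersPerPlace) == thiefPlace) {
                local.add(w);
            } else {
                remote.add(w);
            }
        }
        local.addAll(remote);
        return local;
    }

    public static void main(String[] args) {
        // 8 workers in 2 places of 4; worker 1 probes its own place first.
        System.out.println(victimOrder(1, 8, 4)); // prints [0, 2, 3, 4, 5, 6, 7]
    }
}
```

Probing nearby workers first is meant to reflect the stated goal of increasing data reuse among workers in the same place, in contrast with the purely randomized stealing the abstract criticizes.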


international parallel and distributed processing symposium | 2010

SLAW: A scalable locality-aware adaptive work-stealing scheduler

Yi Guo; Jisheng Zhao; Vincent Cavé; Vivek Sarkar

This paper introduces SLAW, a Scalable Locality-aware Adaptive Work-stealing scheduler. The SLAW scheduler is designed to address two common limitations in current work-stealing schedulers: use of a fixed task scheduling policy and locality-obliviousness due to randomized stealing. Past work has demonstrated the pros and cons of using fixed scheduling policies, such as work-first and help-first, in different cases without a clear win for one policy over the other. The SLAW scheduler addresses this limitation by supporting both work-first and help-first policies simultaneously. It does so by using an adaptive approach that selects a scheduling policy on a per-task basis at runtime. The SLAW scheduler also establishes bounds on the stack and heap space needed to store tasks. The experimental results for the benchmarks studied in this paper show that SLAW's adaptive scheduler achieves 0.98× to 9.2× speedup over the help-first scheduler and 0.97× to 4.5× speedup over the work-first scheduler for 64-thread executions, thereby establishing the robustness of using an adaptive approach instead of a fixed policy. In contrast, the help-first policy is 9.2× slower than work-first in the worst case for a fixed help-first policy, and the work-first policy is 3.7× slower than help-first in the worst case for a fixed work-first policy. Further, for large irregular recursive parallel computations, the adaptive scheduler runs with bounded stack usage and achieves performance (and supports data sizes) that cannot be delivered by the use of any single fixed policy. It is also known that work-stealing schedulers can be cache-unfriendly for some applications due to randomized stealing. The SLAW scheduler is designed for programming models where locality hints are provided to the runtime by the programmer or compiler, and achieves locality-awareness by grouping workers into places. Locality awareness can lead to improved performance by increasing temporal data reuse within a worker and among workers in the same place. Our experimental results show that locality-aware scheduling can achieve up to 2.6× speedup over locality-oblivious scheduling, for the benchmarks studied in this paper.


international parallel and distributed processing symposium | 2013

Integrating Asynchronous Task Parallelism with MPI

Sanjay Chatterjee; Sagnak Tasirlar; Zoran Budimlic; Vincent Cavé; Milind Chabbi; Max Grossman; Vivek Sarkar; Yonghong Yan

Effective combination of inter-node and intra-node parallelism is recognized to be a major challenge for future extreme-scale systems. Many researchers have demonstrated the potential benefits of combining both levels of parallelism, including increased communication-computation overlap, improved memory utilization, and effective use of accelerators. However, current “hybrid programming” approaches often require significant rewrites of application code and assume a high level of programmer expertise. Dynamic task parallelism has been widely regarded as a programming model that combines the best of performance and programmability for shared-memory programs. For distributed-memory programs, most users rely on efficient implementations of MPI. In this paper, we propose HCMPI (Habanero-C MPI), an integration of the Habanero-C dynamic task-parallel programming model with the widely used MPI message-passing interface. All MPI calls are treated as asynchronous tasks in this model, thereby enabling unified handling of messages and tasking constructs. For programmers unfamiliar with MPI, we introduce distributed data-driven futures (DDDFs), a new data-flow programming model that seamlessly integrates intra-node and inter-node data-flow parallelism without requiring any knowledge of MPI. Our novel runtime design for HCMPI and DDDFs uses a combination of dedicated communication and computation specific worker threads. We evaluate our approach on a set of micro-benchmarks as well as larger applications and demonstrate better scalability compared to the most efficient MPI implementations, while offering a unified programming model to integrate asynchronous task parallelism with distributed-memory parallelism.
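The data-flow idea behind data-driven futures can be sketched on a single node with standard Java. The example below is only an analogy: it uses `CompletableFuture` as a stand-in for a single-assignment container, involves no MPI or inter-node communication, and does not reproduce the HCMPI or DDDF APIs. The class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical single-node analogy for data-driven tasks (no MPI involved).
class DataDrivenSketch {
    static int run(int a, int b) {
        // Two single-assignment containers, initially empty.
        CompletableFuture<Integer> x = new CompletableFuture<>();
        CompletableFuture<Integer> y = new CompletableFuture<>();

        // The consumer is registered before any data exists;
        // it runs only once both inputs have been produced.
        CompletableFuture<Integer> sum = x.thenCombine(y, Integer::sum);

        // Producers fill the containers asynchronously, in either order.
        CompletableFuture.runAsync(() -> y.complete(b));
        CompletableFuture.runAsync(() -> x.complete(a));

        return sum.join();
    }

    public static void main(String[] args) {
        System.out.println(run(2, 40)); // prints 42
    }
}
```

The consumer never names a thread or a message; it only declares which values it depends on. The abstract describes DDDFs as extending this style of dependence across nodes without requiring knowledge of MPI.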


conference on object-oriented programming systems, languages, and applications | 2009

The Habanero multicore software research project

Rajkishore Barik; Zoran Budimlic; Vincent Cavé; Sanjay Chatterjee; Yi Guo; David M. Peixotto; Raghavan Raman; Jun Shirako; Sagnak Tasirlar; Yonghong Yan; Jisheng Zhao; Vivek Sarkar

Multiple programming models are emerging to address an increased need for dynamic task parallelism in multicore shared-memory multiprocessors. This poster describes the main components of Rice University's Habanero Multicore Software Research Project, which proposes a new approach to multicore software enablement based on a two-level programming model consisting of a higher-level coordination language for domain experts and a lower-level parallel language for programming experts.


ieee high performance extreme computing conference | 2016

The Open Community Runtime: A runtime system for extreme scale computing

Timothy G. Mattson; Romain Cledat; Vincent Cavé; Vivek Sarkar; Zoran Budimlic; Sanjay Chatterjee; Joshua B. Fryman; Ivan Ganev; Robin Knauerhase; Min Lee; Benoît Meister; Brian R. Nickerson; Nick Pepperling; Bala Seshasayee; Sagnak Tasirlar; Justin Teller; Nick Vrvilo

The Open Community Runtime (OCR) is a new runtime system designed to meet the needs of extreme-scale computing. While there is growing support for the idea that future execution models will be based on dynamic tasks, there is little agreement on what else should be included. OCR minimally adds events for synchronization and relocatable data-blocks for data management to form a complete system that supports a wide range of higher-level programming models. This paper lays out the fundamental concepts behind OCR and compares OCR performance to that of MPI for two simple benchmarks. OCR has been developed within an open community model with features supporting flexible algorithm expression weighed against the expected realities of extreme-scale computing: power-constrained execution, aggressive growth in the number of compute resources, deepening memory hierarchies, and a low mean time between failures.


Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models | 2014

HabaneroUPC++: a Compiler-free PGAS Library

Vivek A. Kumar; Yili Zheng; Vincent Cavé; Zoran Budimlic; Vivek Sarkar

Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features, providing the basis for high performance and high productivity parallel programming environments. UPC++ [39] is a very recent PGAS implementation that takes a library-based approach and avoids the complexities associated with compiler transformations. However, this implementation does not support dynamic task parallelism and relies only on other threading models (e.g., OpenMP or pthreads) for exploiting parallelism within a PGAS place. In this paper, we introduce a compiler-free PGAS library called HabaneroUPC++, which supports a tighter integration of intra-place and inter-place parallelism than standard hybrid programming approaches. The library makes heavy use of C++11 lambda functions in its APIs. C++11 lambdas avoid the need for compiler support while still retaining the syntactic convenience of language-based approaches. The HabaneroUPC++ library implementation is based on a tight integration of the UPC++ library and the Habanero-C++ library, with new extensions to support the integration. The UPC++ library is used to provide PGAS communication and function shipping support using GASNet, and the Habanero-C++ library is used to provide support for intra-place work-stealing integrated with function shipping. We demonstrate the programmability and performance of our implementation using two benchmarks, scaled up to 6K cores. The insights developed in this paper promise to further enhance the usability and popularity of PGAS programming models.


evaluation and usability of programming languages and tools | 2010

Comparing the usability of library vs. language approaches to task parallelism

Vincent Cavé; Zoran Budimlic; Vivek Sarkar

In this paper, we compare the usability of a library approach with a language approach to task parallelism. There are many practical advantages and disadvantages to both approaches. A key advantage of a library-based approach is that it can be deployed without requiring any change in the tool chain, including compilers and IDEs. However, the use of library APIs to express all aspects of task parallelism can lead to code that is hard to understand and modify. A key advantage of a language-based approach is that the intent of the programmer is easier to express and understand, both by other programmers and by program analysis tools. However, a language-based approach usually requires the standardization of new constructs and (possibly) of new keywords. In this paper, we compare the java.util.concurrent (j.u.c) library [14] from Java 7 and the Habanero-Java (HJ) [16] language, supported by our experiences in teaching both models at Rice University.
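To give a sense of what the library side of this comparison looks like, here is a minimal sketch using the fork/join framework in `java.util.concurrent`. It is an illustration chosen for this page, not an example taken from the paper; the `FibTask` class is hypothetical and the naive Fibonacci recursion is used only because it is short.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example of library-based task parallelism with fork/join.
class FibTask extends RecursiveTask<Integer> {
    private final int n;

    FibTask(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        if (n < 2) {
            return n;
        }
        FibTask left = new FibTask(n - 1);
        left.fork();                               // spawn a child task
        int right = new FibTask(n - 2).compute();  // run the other branch inline
        return left.join() + right;                // wait for the spawned child
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new FibTask(20))); // prints 6765
    }
}
```

Even in this small case the parallel structure is spread across a task subclass, a constructor, and explicit `fork` and `join` calls, which gives a concrete sense of the readability concern the abstract raises about expressing task parallelism through library APIs.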


conference on object-oriented programming systems, languages, and applications | 2011

The design and implementation of the Habanero-Java parallel programming language

Zoran Budimlic; Vincent Cavé; Raghavan Raman; Jun Shirako; Sagnak Tasirlar; Jisheng Zhao; Vivek Sarkar

The Habanero-Java language extends sequential Java with a simple but powerful set of constructs for multicore parallelism. Its implementation includes a compiler that generates standard Java classfiles, a runtime system that builds on the java.util.concurrent library, an IDE (DrHJ) that extends DrJava, and a new data-race detection tool.
