Publication


Featured research published by Jaydeep Marathe.


Symposium on Code Generation and Optimization | 2010

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs

John A. Stratton; Vinod Grover; Jaydeep Marathe; Bastiaan Aarts; Michael Murphy; Ziang Hu; Wen-mei W. Hwu

In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Programs developed for manycore processors typically express finer thread-level parallelism than is appropriate for multicore platforms. We describe options for implementing fine-grained threading in software, and find that reasonable restrictions on the synchronization model enable significant optimizations and performance improvements over a baseline approach. We evaluate these techniques in a production-level compiler and runtime for the CUDA programming model targeting modern CPUs. Applications tested with our tool often showed performance parity with the compiled C version of the application for single-thread performance. With modest coarse-grained multithreading typical of today's CPU architectures, an average of 3.4x speedup on 4 processors was observed across the test applications.
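
The central transformation the abstract alludes to can be sketched in a few lines (an illustrative example, not code from the paper; the kernel, names, and block size of 256 are hypothetical): a fine-grained CUDA kernel whose logical threads a CPU-targeting compiler serializes into "thread loops", with the __syncthreads() barrier becoming a split between two loops.

    // Hypothetical fine-grained kernel: each logical thread stages one element
    // in shared memory, waits at a barrier, then reads its neighbor's element.
    __global__ void shift_kernel(const float *in, float *out) {
        __shared__ float tile[256];
        int base = blockIdx.x * blockDim.x;
        tile[threadIdx.x] = in[base + threadIdx.x];
        __syncthreads();
        out[base + threadIdx.x] = tile[(threadIdx.x + 1) % blockDim.x];
    }

    // Sketch of a CPU lowering: one OS thread executes a whole block by looping
    // over its logical threads, and the barrier becomes the boundary between
    // two loops; the former __shared__ array becomes an ordinary local array.
    static void shift_block_cpu(const float *in, float *out,
                                int blockIdx_x, int blockDim_x) {
        float tile[256];                        // assumes blockDim_x <= 256
        int base = blockIdx_x * blockDim_x;
        for (int t = 0; t < blockDim_x; ++t)    // logical threads up to the barrier
            tile[t] = in[base + t];
        for (int t = 0; t < blockDim_x; ++t)    // logical threads after the barrier
            out[base + t] = tile[(t + 1) % blockDim_x];
    }

Coarse-grained multithreading of the kind the abstract measures would then distribute whole blocks (calls to shift_block_cpu) across the CPU cores.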


International Conference on Conceptual Structures | 2012

CUDA: Compiling and optimizing for a GPU platform

Gautam Chakrabarti; Vinod Grover; Bastiaan Aarts; Xiangyun Kong; Manjunath Kudlur; Yuan Lin; Jaydeep Marathe; Michael Murphy; Jian-Zhong Wang

Graphics processor units (GPUs) have evolved to handle throughput-oriented workloads where a large number of parallel threads must make progress. Such threads are organized around shared memory, making it possible to synchronize and cooperate on shared data. Current GPUs can run tens of thousands of hardware threads and have been optimized for graphics workloads. Several high-level languages have been developed to easily program the GPUs for general purpose computing problems. The use of high-level languages introduces the need for highly optimizing compilers that target the parallel GPU device. In this paper, we present our experiences in developing compilation techniques for a high level language called CUDA C. We explain the CUDA architecture and programming model and provide insights into why certain optimizations are important for achieving high performance on a GPU. In addition to classical optimizations, we present optimizations developed specifically for the CUDA architecture. We evaluate these techniques, and present performance results that show significant improvements on hundreds of kernels as well as applications.
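
As background for the optimizations discussed, here is a minimal sketch of the CUDA C programming model the abstract describes (the kernel name, sizes, and reduction pattern are illustrative, not taken from the paper): threads are launched in blocks, cooperate through per-block shared memory, and synchronize with barriers.

    #include <cuda_runtime.h>

    // One block of 256 threads reduces 256 inputs to a single partial sum
    // using shared memory and barrier synchronization.
    __global__ void block_sum(const float *in, float *out) {
        __shared__ float buf[256];
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        buf[threadIdx.x] = in[i];
        __syncthreads();
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride)
                buf[threadIdx.x] += buf[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0) out[blockIdx.x] = buf[0];
    }

    int main() {
        const int n = 1 << 20, block = 256, grid = n / block;
        float *in, *out;
        cudaMalloc(&in, n * sizeof(float));      // inputs left uninitialized
        cudaMalloc(&out, grid * sizeof(float));  // for brevity
        block_sum<<<grid, block>>>(in, out);     // ~one million logical threads
        cudaDeviceSynchronize();
        cudaFree(in);
        cudaFree(out);
        return 0;
    }

A GPU-targeting compiler must keep kernels like this efficient across thousands of concurrent threads, which is why the architecture-specific optimizations the paper describes matter beyond the classical ones.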


Languages and Compilers for Parallel Computing | 2013

Separate Compilation in a Language-Integrated Heterogeneous Environment

Michael Murphy; Jaydeep Marathe; Girish Bharambe; Sean Lee; Vinod Grover

Heterogeneous computing platforms have become more common in recent years. Effective programming languages and tools will play a key role in unlocking the performance potential of these systems. In this paper, we present the design and implementation of separate compilation and linking support for the CUDA programming platform. CUDA provides a language-integrated environment for writing parallel programs targeting hybrid systems with CPUs and GPUs (graphics processing units). We present a novel linker that allows linking of multiple subsets of GPU executable code. We also describe a link-time optimization of GPU shared memory layout. Finally, we measure the impact of separate compilation with real-world benchmarks and present our conclusions.
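
A minimal sketch of the separate compilation workflow this enables, assuming current nvcc relocatable-device-code flags; the two translation units and file names are illustrative and are shown together in one listing for brevity:

    // util.cu -- a __device__ function defined in one translation unit.
    __device__ float scale(float x) { return 2.0f * x; }

    // main.cu -- a kernel in a second translation unit calls it across the
    // file boundary, which requires device-side linking.
    extern __device__ float scale(float x);

    __global__ void apply(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = scale(data[i]);
    }

    // Build with relocatable device code and let nvcc perform the device link:
    //   nvcc -dc util.cu -o util.o
    //   nvcc -dc main.cu -o main.o
    //   nvcc util.o main.o -o app

Link-time optimizations such as the shared memory layout optimization mentioned in the abstract can only be applied once all device objects are visible, i.e. at the device link step.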


Archive | 2013

Software development environment and method of compiling integrated source code

Stephen Jones; Jaydeep Marathe; Vivek Kini; Bastiaan Aarts


Archive | 2012

System and method for translating program functions for correct handling of local-scope variables and computing system incorporating the same

Yuan Lin; Gautam Chakrabarti; Jaydeep Marathe; Okwan Kwon; Amit Sabne


Archive | 2012

System and method for executing sequential code using a group of threads and single-instruction, multiple-thread processor incorporating the same

Gautam Chakrabarti; Yuan Lin; Jaydeep Marathe; Okwan Kwon; Amit Sabne


Archive | 2012

System and method for compiling or runtime executing a fork-join data parallel program with function calls on a single-instruction-multiple-thread processor

Yuan Lin; Gautam Chakrabarti; Jaydeep Marathe; Okwan Kwon; Amit Sabne


Archive | 2012

System and method for allocating memory of differing properties to shared data objects

Jaydeep Marathe; Gautam Chakrabarti; Yuan Lin; Okwan Kwon; Amit Sabne


Archive | 2011

Method for Transforming a Multithreaded Program for General Execution

Jaydeep Marathe; Vinod Grover


Archive | 2012

Method and system for run time detection of shared memory data access hazards

Vyas Venkataraman; Jaydeep Marathe; Manjunath Kudlur; Vinod Grover; Geoffrey Gerfin; Alban Douillet; Mayank Kaushik
