Publication


Featured research published by Priya Unnikrishnan.


IEEE Transactions on Parallel and Distributed Systems | 2009

The Design of OpenMP Tasks

Eduard Ayguadé; Nawal Copty; Alejandro Duran; Jay Hoeflinger; Yuan Lin; Federico Massaioli; Xavier Teruel; Priya Unnikrishnan; Guansong Zhang

OpenMP has been very successful in exploiting structured parallelism in applications. With increasing application complexity, there is a growing need for addressing irregular parallelism in the presence of complicated control structures. This is evident in various efforts by the industry and research communities to provide a solution to this challenging problem. One of the primary goals of OpenMP 3.0 is to define a standard dialect to express and efficiently exploit unstructured parallelism. This paper presents the design of the OpenMP tasking model by members of the OpenMP 3.0 tasking sub-committee which was formed for this purpose. The paper summarizes the efforts of the sub-committee (spanning over two years) in designing, evaluating and seamlessly integrating the tasking model into the OpenMP specification. In this paper, we present the design goals and key features of the tasking model, including a rich set of examples and an in-depth discussion of the rationale behind various design choices. We compare a prototype implementation of the tasking model with existing models, and evaluate it on a wide range of applications. The comparison shows that the OpenMP tasking model provides expressiveness, flexibility, and huge potential for performance and scalability.
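
As a flavor of the unstructured parallelism the tasking model targets, the sketch below parallelizes pointer chasing, one of the classic motivating examples, with the OpenMP 3.0 task construct. This is a minimal illustration in C, not code from the paper.

    #include <omp.h>

    typedef struct node { int data; struct node *next; } node;

    void process(node *p);   /* user-supplied per-element work */

    /* One thread walks the list and creates a task per element;
       any thread in the team may run the tasks. firstprivate(p)
       captures the pointer value at task-creation time. */
    void traverse(node *head) {
        #pragma omp parallel
        #pragma omp single
        for (node *p = head; p != NULL; p = p->next) {
            #pragma omp task firstprivate(p)
            process(p);
        }
    }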


International Workshop on OpenMP | 2007

A Proposal for Task Parallelism in OpenMP

Eduard Ayguadé; Nawal Copty; Alejandro Duran; Jay Hoeflinger; Yuan Lin; Federico Massaioli; Ernesto Su; Priya Unnikrishnan; Guansong Zhang

This paper presents a novel proposal to define task parallelism in OpenMP. Task parallelism has been lacking in the OpenMP language for a number of years. As we show, this makes certain kinds of applications difficult to parallelize, inefficient, or both. A subcommittee of the OpenMP language committee, with representatives from a number of organizations, prepared this proposal to give OpenMP a way to handle unstructured parallelism. While defining the proposal we had three design goals: simplicity of use, simplicity of specification, and consistency with the rest of OpenMP. Unfortunately, these goals were in conflict many times during our discussions. The paper describes the proposal, some of the problems we faced, the different alternatives, and the rationale for our choices. We show how to use the proposal to parallelize some of the classical examples of task parallelism, such as pointer chasing and recursive functions.
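
The other classic example named above, recursive functions, looks roughly as follows with the proposed task and taskwait constructs. A minimal sketch assuming the usual naive Fibonacci formulation, not code taken from the proposal itself.

    #include <omp.h>

    /* Naive recursive Fibonacci: task defers fib(n-1);
       taskwait blocks until the child task completes,
       so x is safe to read before combining. */
    long fib(int n) {
        if (n < 2) return n;
        long x, y;
        #pragma omp task shared(x)
        x = fib(n - 1);
        y = fib(n - 2);
        #pragma omp taskwait
        return x + y;
    }

    long fib_par(int n) {
        long result;
        #pragma omp parallel
        #pragma omp single
        result = fib(n);
        return result;
    }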


Conference of the Centre for Advanced Studies on Collaborative Research | 2008

OpenMP tasks in IBM XL compilers

Xavier Teruel; Priya Unnikrishnan; Xavier Martorell; Eduard Ayguadé; Raul Esteban Silvera; Guansong Zhang; Ettore Tiotto

Tasking is the most significant feature included in the new OpenMP 3.0 standard. It was introduced to handle unstructured parallelism and broaden the range of applications that can be parallelized with OpenMP. This paper presents the design and implementation of the task model in the IBM XL parallelizing compilers. The task construct is significantly different from other OpenMP constructs. This paper discusses some of the unique challenges in implementing the task construct and its associated synchronization constructs in the compiler. We also present a performance evaluation of our implementation on a set of benchmarks and applications. We identify limitations in the current implementation and propose solutions for further improvement.
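
One way to picture the compiler's job: the task body must be outlined into a separate routine and its data environment captured for deferred execution by the runtime. The sketch below shows only that general scheme; task_alloc and task_enqueue are hypothetical placeholder names, not the IBM XL runtime interface.

    #include <stdlib.h>

    void process(void *p);   /* the statement inside the task region */

    /* hypothetical runtime entry points (placeholders, not XL's API) */
    void *task_alloc(size_t bytes);
    void  task_enqueue(void (*fn)(void *), void *env);

    /* Source:
           #pragma omp task firstprivate(p)
           process(p);                        */
    typedef struct { void *p; } task_env;     /* captured firstprivates */

    static void outlined_task(void *arg) {
        task_env *e = arg;
        process(e->p);                         /* the outlined task body */
    }

    void encountering_thread(void *p) {
        task_env *env = task_alloc(sizeof *env);
        env->p = p;                            /* capture at creation time */
        task_enqueue(outlined_task, env);      /* defer or execute */
    }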


International Conference on Parallel Processing | 2012

A practical approach to DOACROSS parallelization

Priya Unnikrishnan; Jun Shirako; Kit Barton; Sanjay Chatterjee; Raul Esteban Silvera; Vivek Sarkar

Loops with cross-iteration dependences (doacross loops) often contain significant amounts of parallelism that can potentially be exploited on modern manycore processors. However, most production-strength compilers focus their automatic parallelization efforts on doall loops, and consider doacross parallelism to be impractical due to the space inefficiencies and the synchronization overheads of past approaches. This paper presents a novel and practical approach to automatically parallelizing doacross loops for execution on manycore-SMP systems. We introduce a compiler-and-runtime optimization called dependence folding that bounds the number of synchronization variables allocated per worker thread (processor core) to be at most the maximum depth of a loop nest being considered for automatic parallelization. Our approach has been implemented in a development version of the IBM XL Fortran V13.1 commercial parallelizing compiler and runtime system. For four benchmarks where automatic doall parallelization was largely ineffective (speedups of under 2×), our implementation delivered speedups of 6.5×, 9.0×, 17.3×, and 17.5× on a 32-core IBM Power7 SMP system, thereby showing that doacross parallelization can be a valuable technique to complement doall parallelization.
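
To make the setting concrete, here is a doacross loop (each element depends on the previous row and the previous column) with hand-written point-to-point synchronization of the kind such a compiler and runtime automate. The sketch uses one synchronization counter per row for simplicity; dependence folding as described in the paper would instead bound the number of counters per worker thread.

    #include <omp.h>
    #include <stdatomic.h>

    #define N 1024
    double a[N][N];
    atomic_int done[N];   /* last column completed in each row */

    void wavefront(void) {
        atomic_store(&done[0], N);               /* row 0 is input data */
        for (int i = 1; i < N; i++) atomic_store(&done[i], 0);

        /* round-robin rows across threads so the rows pipeline */
        #pragma omp parallel for schedule(static, 1)
        for (int i = 1; i < N; i++) {
            for (int j = 1; j < N; j++) {
                /* cross-iteration dependence: wait until row i-1
                   has produced column j */
                while (atomic_load(&done[i - 1]) < j)
                    ;  /* spin */
                a[i][j] = a[i - 1][j] + a[i][j - 1];
                atomic_store(&done[i], j);       /* publish progress */
            }
        }
    }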


IEEE International Conference on High Performance Computing, Data, and Analytics | 2004

Experiments with auto-parallelizing SPEC2000FP benchmarks

Guansong Zhang; Priya Unnikrishnan; James Ren

In this paper, we document our experimental work on automatically parallelizing the SPEC2000FP benchmarks for SMP machines. This was not purely a research project: it was carried out within IBM's software laboratory, in a commercial compiler infrastructure that implements the OpenMP 2.0 specification in both Fortran and C/C++. From the beginning, our emphasis was on using simple parallelization techniques. We aim to maintain a good trade-off between the performance (especially the scalability) of an application program and its compilation time. Although the parallelization results show relatively low speedup, they are still promising considering the problems associated with explicit parallel programming and the fact that more and more multi-threaded and multi-core chips will soon be available even for home computing.
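
For contrast with the doacross work above, the doall loops targeted by auto-parallelization carry no cross-iteration dependences, so the compiler's effect is as if the pragma below had been written by hand. Illustrative C only; the experiments themselves used IBM's Fortran and C/C++ compilers.

    /* no iteration reads a value written by another iteration,
       so an auto-parallelizer can run them concurrently */
    void scale(double *restrict x, const double *restrict y,
               double s, int n) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            x[i] = s * y[i];
    }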


Conference of the Centre for Advanced Studies on Collaborative Research | 2009

OpenMP tasking analysis for programmers

Xavier Teruel; Christopher Barton; Alejandro Duran; Xavier Martorell; Eduard Ayguadé; Priya Unnikrishnan; Guansong Zhang; Raul Esteban Silvera

As of 2008, the OpenMP 3.0 standard includes task support allowing programmers to exploit irregular parallelism. Although several compilers provide support for this new feature, there has not been extensive investigation into the real possibilities of this extension. Several papers have discussed the programming model itself, while other papers have discussed design and implementation on different platforms. There are also papers demonstrating performance results using well-known kernel applications. This paper presents an analysis of the possibilities of the OpenMP tasking model, using the IBM XL compiler implementation. Using different parameters such as the number of tasks, task granularity, and parallelism pattern, this paper explores how such parameters can affect the average performance and identifies the limits of the OpenMP tasking model.
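
Task granularity, one of the parameters the analysis varies, is commonly controlled with a cutoff below which recursion proceeds serially. A hedged sketch, not taken from the paper; GRAIN is an assumed tuning knob, and the function is meant to be called from inside a parallel region (e.g. under a single construct).

    #include <omp.h>

    #define GRAIN 4096   /* assumed cutoff; tuning it is the point */

    long sum(const long *a, int lo, int hi) {
        if (hi - lo <= GRAIN) {          /* coarse leaf: no task overhead */
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        long left, right;
        int mid = lo + (hi - lo) / 2;
        #pragma omp task shared(left)    /* finer tasks improve balance, */
        left = sum(a, lo, mid);          /* but each task costs overhead */
        right = sum(a, mid, hi);
        #pragma omp taskwait
        return left + right;
    }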


International Workshop on OpenMP | 2014

MetaFork: A Framework for Concurrency Platforms Targeting Multicores

Xiaohui Chen; Marc Moreno Maza; Sushek Shekar; Priya Unnikrishnan

We present MetaFork, a metalanguage for multithreaded algorithms based on the fork-join concurrency model and targeting multicore architectures. MetaFork is implemented as a source-to-source compilation framework allowing automatic translation of programs from one concurrency platform to another. The current version of this framework supports CilkPlus and OpenMP. We evaluate the benefits of the MetaFork framework through a series of experiments, such as narrowing down performance bottlenecks in multithreaded programs. Our experiments also show that, if a native program, written either in CilkPlus or OpenMP, has little parallelism overhead, then the same property holds for its OpenMP or CilkPlus counterpart translated by MetaFork.
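
As a flavor of the translations such a framework automates, the fork-join primitives of the two platforms correspond roughly as in the sketch below. The mapping is illustrative only, not MetaFork's actual output.

    /* CilkPlus source:
           x = cilk_spawn work(n);
           other();
           cilk_sync;
       OpenMP counterpart a MetaFork-style translator can emit
       (assumes the caller is inside a parallel region, e.g.
       under #pragma omp single): */
    long work(int n);
    void other(void);

    long fork_join(int n) {
        long x;
        #pragma omp task shared(x)   /* ~ cilk_spawn */
        x = work(n);
        other();                     /* overlaps with the spawned task */
        #pragma omp taskwait         /* ~ cilk_sync  */
        return x;
    }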


International Workshop on OpenMP | 2013

Expressing DOACROSS Loop Dependences in OpenMP

Jun Shirako; Priya Unnikrishnan; Sanjay Chatterjee; Kelvin Li; Vivek Sarkar

OpenMP is a widely used programming standard for a broad range of parallel systems. In the OpenMP programming model, synchronization points are specified by implicit or explicit barrier operations within a parallel region. However, certain classes of computations, such as stencil algorithms, can be supported with better synchronization efficiency and data locality when using doacross parallelism with point-to-point synchronization than wavefront parallelism with barrier synchronization. In this paper, we propose new synchronization constructs to enable doacross parallelism in the context of the OpenMP programming model. Experimental results on a 32-core IBM Power7 system using four benchmark programs show performance improvements of the proposed doacross approach over OpenMP barriers by factors of 1.4× to 5.2× when using all 32 cores.
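
This line of work fed into the doacross support that later appeared in OpenMP 4.5; in that eventual syntax, point-to-point synchronization for a two-dimensional recurrence looks roughly like this (a sketch, not the constructs exactly as proposed in the paper):

    #define N 1024
    double a[N][N];

    void stencil(void) {
        /* ordered(2): the two loops form a doacross nest */
        #pragma omp parallel for ordered(2)
        for (int i = 1; i < N; i++) {
            for (int j = 1; j < N; j++) {
                /* wait for iteration (i-1, j); the (i, j-1) value is
                   produced earlier by the same thread, so no sink is
                   needed for it */
                #pragma omp ordered depend(sink: i - 1, j)
                a[i][j] = a[i - 1][j] + a[i][j - 1];
                #pragma omp ordered depend(source)  /* publish (i, j) */
            }
        }
    }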


ACM Transactions on Embedded Computing Systems | 2009

Reducing memory requirements of resource-constrained applications

Priya Unnikrishnan; Guangyu Chen; Mahmut T. Kandemir; Mustafa Karaköy; Ibrahim Kolcu

Embedded computing platforms are often resource constrained, requiring great design and implementation attention to memory-, power-, and heat-related parameters. An important task for a compiler on such platforms is to simplify the process of developing applications for limited-memory devices and resource-constrained clients. Focusing on array-intensive embedded applications to be executed on single-CPU-based architectures, this work explores how loop-based compiler optimizations can be used to increase memory location reuse. Our goal is to transform a given application in such a way that the resulting code has fewer cases (as compared to the original code) where the lifetimes of array elements overlap. The reduction in lifetimes of array elements can then be exploited by reusing memory locations as much as possible. Our experimental results indicate that the proposed strategy reduces the data space requirements of 15 resource-constrained applications by more than 40%, on average. We also demonstrate how this strategy can be combined with data locality (cache behavior) enhancing techniques, so that a compiler can take advantage of both; that is, it can reduce data memory requirements and improve data locality at the same time.
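
A standard illustration of the idea, though not the paper's exact transformation: fusing a producer loop with its consumer shrinks each temporary element's lifetime to a single iteration, letting a whole array contract to one reused location.

    #define N 100000
    double a[N], b[N];
    double f(double v) { return v * 2.0; }   /* placeholder producer */
    double g(double v) { return v + 1.0; }   /* placeholder consumer */

    /* Before: every t[i] stays live across the gap between the loops,
       so the temporary needs N memory locations. */
    void before(void) {
        static double t[N];
        for (int i = 0; i < N; i++) t[i] = f(a[i]);
        for (int i = 0; i < N; i++) b[i] = g(t[i]);
    }

    /* After fusion: lifetimes no longer overlap, and one scalar
       location is reused N times. */
    void after(void) {
        for (int i = 0; i < N; i++) {
            double t = f(a[i]);
            b[i] = g(t);
        }
    }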


Conference of the Centre for Advanced Studies on Collaborative Research | 2009

Challenges for parallel computing

Kit Barton; Guansong Zhang; Priya Unnikrishnan

There has been an overwhelming trend in recent years to move towards parallel computing. Hardware manufacturers are increasing the amount of parallelism on a single chip in several ways, including adding more processing cores and accelerators that execute the same instructions on many data items simultaneously. At the other end of the spectrum, as commodity hardware prices fall, it is becoming increasingly affordable to build large-scale, multi-node distributed machines. Meanwhile, as processor speeds begin to stagnate, software developers will be forced to exploit the parallelism in their applications in order to continue improving performance.

Collaboration


An overview of Priya Unnikrishnan's collaborations.

Top Co-Authors

Eduard Ayguadé (Barcelona Supercomputing Center)
Xavier Teruel (Barcelona Supercomputing Center)
Alejandro Duran (Polytechnic University of Catalonia)