Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Shams Imam is active.

Publication


Featured research published by Shams Imam.


Conference on Object-Oriented Programming, Systems, Languages, and Applications | 2012

Integrating task parallelism with actors

Shams Imam; Vivek Sarkar

This paper introduces a unified concurrent programming model combining the previously developed Actor Model (AM) and the task-parallel Async-Finish Model (AFM). With the advent of multi-core computers, there is renewed interest in programming models that can support a wide range of parallel programming patterns. The proposed unified model shows how the divide-and-conquer approach of the AFM and the no-shared-mutable-state, event-driven philosophy of the AM can be combined to solve certain classes of problems more efficiently and productively than either of the aforementioned models individually. The unified model adds actor creation and coordination to the AFM, while also enabling parallelization within actors. This paper describes two implementations of the unified model as extensions of Habanero-Java and Habanero-Scala. The unified model adds to the foundations of parallel programming, and to the tools available to programmers for improving productivity and performance while developing parallel software.
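
The combined model can be pictured with plain java.util.concurrent primitives. The sketch below is hypothetical and is not the Habanero-Java or Habanero-Scala API: an actor processes messages one at a time, but a message handler may fork async subtasks and join them, finish-style, before the next message is taken.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not the HJ/Habanero-Scala API): an actor whose message
// handler forks parallel subtasks ("async") and joins them ("finish") before
// the next message is processed, mirroring the unified AM + AFM idea.
public class UnifiedActor {
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    public UnifiedActor() {
        // One logical thread processes messages serially (actor semantics).
        new Thread(() -> {
            try {
                while (running) mailbox.take().run();
            } catch (InterruptedException ignored) { }
        }).start();
    }

    public void send(Runnable msg) { mailbox.add(msg); }
    public void stop() { send(() -> running = false); }

    public static long parallelSum(long[] data) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        UnifiedActor actor = new UnifiedActor();
        AtomicLong total = new AtomicLong();
        CountDownLatch done = new CountDownLatch(1);
        actor.send(() -> {
            // "async": fork one subtask per half of the array.
            CompletableFuture<Long> lo = CompletableFuture.supplyAsync(
                () -> sum(data, 0, data.length / 2), pool);
            CompletableFuture<Long> hi = CompletableFuture.supplyAsync(
                () -> sum(data, data.length / 2, data.length), pool);
            // "finish": join both subtasks before this message completes.
            total.set(lo.join() + hi.join());
            done.countDown();
        });
        done.await();
        actor.stop();
        pool.shutdown();
        return total.get();
    }

    private static long sum(long[] a, int from, int to) {
        long s = 0;
        for (int i = from; i < to; i++) s += a[i];
        return s;
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data)); // prints 500500
    }
}
```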


Programming Based on Actors, Agents, and Decentralized Control | 2014

Savina - An Actor Benchmark Suite: Enabling Empirical Evaluation of Actor Libraries

Shams Imam; Vivek Sarkar

This paper introduces the Savina benchmark suite for actor-oriented programs. Our goal is to provide a standard benchmark suite that enables researchers and application developers to compare different actor implementations and identify those that deliver the best performance for a given use-case. The benchmarks in Savina are diverse, realistic, and represent compute-intensive (rather than I/O-intensive) applications. They range from popular micro-benchmarks to classical concurrency problems to applications that demonstrate various styles of parallelism. Implementations of the benchmarks on various actor libraries are made publicly available through an open source release. This will allow other developers and researchers to compare the performance of their actor libraries on this common set of benchmarks.
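
Savina's simplest micro-benchmarks follow patterns like ping-pong message exchange. As a rough illustration only (plain Java threads and queues, not any of the benchmarked actor libraries), such an exchange can be sketched as:

```java
import java.util.concurrent.*;

// Hypothetical sketch of a ping-pong micro-benchmark pattern: two entities
// exchange n messages over queues and we count completed round trips.
// Plain Java stands in here for an actor library's message-passing runtime.
public class PingPong {
    public static int roundTrips(int n) throws Exception {
        BlockingQueue<Integer> toPong = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> toPing = new LinkedBlockingQueue<>();
        Thread pong = new Thread(() -> {
            try {
                while (true) {
                    int msg = toPong.take();
                    if (msg < 0) return;   // poison pill: stop the responder
                    toPing.add(msg);       // echo the payload back
                }
            } catch (InterruptedException ignored) { }
        });
        pong.start();
        int completed = 0;
        for (int i = 0; i < n; i++) {
            toPong.add(i);                 // "ping"
            toPing.take();                 // wait for "pong"
            completed++;
        }
        toPong.add(-1);
        pong.join();
        return completed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrips(100_000)); // prints 100000
    }
}
```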


European Conference on Object-Oriented Programming | 2014

Cooperative Scheduling of Parallel Tasks with General Synchronization Patterns

Shams Imam; Vivek Sarkar

In this paper, we address the problem of scheduling parallel tasks with general synchronization patterns using a cooperative runtime. Current implementations of task-parallel programming models provide efficient support for fork-join parallelism, but are unable to efficiently support more general synchronization patterns such as locks, futures, barriers, and phasers. We propose a novel approach to addressing this challenge based on cooperative scheduling with one-shot delimited continuations (OSDeConts) and event-driven controls (EDCs). The use of OSDeConts enables the runtime to suspend a task at any point, thereby enabling the task's worker to switch to another task, whereas other runtimes may have forced the task's worker to block. The use of EDCs ensures that suspended tasks that are ready to be resumed can be identified efficiently. Furthermore, our approach is more efficient than schedulers that spawn additional worker threads to compensate for blocked worker threads. We have implemented our cooperative runtime in Habanero-Java (HJ), an explicitly parallel language with a large variety of synchronization patterns. The OSDeCont and EDC primitives are used to implement a wide range of synchronization constructs, including those where a task may trigger the enablement of multiple suspended tasks, as in futures, barriers, and phasers. In contrast, current task-parallel runtimes and schedulers for the fork-join model (including schedulers for the Cilk language) focus on the case where only one continuation is enabled by an event, typically the termination of the last child/descendant task in a join scope. Our experimental results show that the HJ cooperative runtime delivers significant improvements in performance and memory utilization on various benchmarks using future and phaser constructs, relative to a thread-blocking runtime system using the same underlying work-stealing task scheduler.
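
The event-driven flavor of this idea can be approximated with standard CompletableFuture continuations. The sketch below is an analogy, not the HJ runtime's OSDeCont/EDC implementation: instead of blocking a worker inside future.get(), the remainder of the task is registered as a continuation that the future's completion re-enables, so the worker stays free to run other tasks.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical analogy to the cooperative-scheduling idea: the continuation
// of a task (v -> v * 2) is suspended until an event (future completion)
// re-enables it, while the worker pool keeps making progress elsewhere.
public class CooperativeSketch {
    public static int run() throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        AtomicInteger otherWork = new AtomicInteger();
        // A value produced asynchronously, like an unresolved future.
        CompletableFuture<Integer> producer =
            CompletableFuture.supplyAsync(() -> 21, workers);
        // The "suspended continuation": scheduled only once the value is
        // ready, instead of a worker blocking inside future.get().
        CompletableFuture<Integer> resumed =
            producer.thenApplyAsync(v -> v * 2, workers);
        // Meanwhile the same small pool keeps running unrelated tasks.
        for (int i = 0; i < 10; i++)
            workers.submit(otherWork::incrementAndGet).get();
        int result = resumed.get();
        workers.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints 42
    }
}
```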


Programming Based on Actors, Agents, and Decentralized Control | 2014

Selectors: Actors with Multiple Guarded Mailboxes

Shams Imam; Vivek Sarkar

The actor programming model is based on asynchronous message passing and offers a promising approach for developing reliable concurrent systems. However, the lack of guarantees controlling the order in which messages are processed next by an actor makes implementing synchronization and coordination patterns difficult. In this work, we address this issue by introducing an extension to the actor model called selectors. Selectors have multiple mailboxes, and each mailbox is guarded, i.e., it can be enabled or disabled to affect the order in which messages are processed. The view of guarded mailboxes is inspired by condition variables, where a thread checks whether a condition is true before continuing its execution. Selectors allow us to simplify the writing of synchronization and coordination patterns using actors, such as a) synchronous request-reply, b) join patterns in streaming applications, c) priorities in message processing, d) variants of reader-writer concurrency, and e) producer-consumer with bounded buffers. We present solutions to each of these patterns using selectors. Selectors can also be implemented efficiently -- we evaluate the performance of our library implementation of selectors on benchmarks that exhibit such patterns, and we compare our implementation against actor-based solutions using the Scala, Akka, Jetlang, Scalaz, Functional-Java, and Habanero actor libraries. Our experimental results show that selector-based solutions simplify programmability and deliver significant performance improvements compared to other actor-based solutions.
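
A single-threaded toy model of a selector (hypothetical; not the paper's library API) can illustrate guarded mailboxes: each mailbox carries an on/off guard, and only enabled, non-empty mailboxes are eligible when choosing the next message, which directly yields priority processing and deferral.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded sketch of a selector: multiple mailboxes,
// each with a guard that can be enabled or disabled to control which
// messages may be processed next. Lower-numbered boxes get priority.
public class SelectorSketch {
    private final Deque<String>[] boxes;
    private final boolean[] enabled;

    @SuppressWarnings("unchecked")
    public SelectorSketch(int n) {
        boxes = new Deque[n];
        enabled = new boolean[n];
        for (int i = 0; i < n; i++) {
            boxes[i] = new ArrayDeque<>();
            enabled[i] = true;
        }
    }

    public void send(int box, String msg) { boxes[box].add(msg); }
    public void setEnabled(int box, boolean on) { enabled[box] = on; }

    // Take the next message from the lowest-numbered enabled, non-empty
    // mailbox; returns null if no enabled mailbox has a message.
    public String processNext() {
        for (int i = 0; i < boxes.length; i++)
            if (enabled[i] && !boxes[i].isEmpty()) return boxes[i].poll();
        return null;
    }

    public static void main(String[] args) {
        SelectorSketch s = new SelectorSketch(2); // box 0 = high priority
        s.send(1, "low");
        s.send(0, "high");
        System.out.println(s.processNext());      // prints high
        s.setEnabled(0, false);                   // guard: disable box 0
        s.send(0, "deferred");
        System.out.println(s.processNext());      // prints low (box 0 is off)
    }
}
```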


Journal of Parallel and Distributed Computing | 2017

Pedagogy and tools for teaching parallel computing at the sophomore undergraduate level

Max Grossman; Maha Aziz; Heng Chi; Anant Tibrewal; Shams Imam; Vivek Sarkar

As the need for multicore-aware programmers rises in both science and industry, Computer Science departments in universities around the USA are having to rethink their parallel computing curricula. At Rice University, this rethinking took the shape of COMP 322, an introductory parallel programming course that is required for all Bachelors students. COMP 322 teaches students to reason about the behavior of parallel programs, educating them in both the high-level abstractions of task-parallel programming and the nitty-gritty details of working with threads in Java. In this paper, we detail the structure, principles, and experiences of COMP 322, gained from 6 years of teaching parallel programming to second-year undergraduates. We describe in detail two particularly useful tools that have been integrated into the curriculum: the HJlib parallel programming library and the Habanero Autograder for parallel programs. We present this work with the hope that it will help inform improvements to parallel computing education at other universities. The contributions include: an overview of parallel computing pedagogy at Rice University, including a unique approach to incrementally teaching parallel programming, from abstract parallel concepts to hands-on experience with industry-standard frameworks; a description of the HJlib parallel programming library and its applicability to parallel programming education; a description of the motivation, design, and implementation of the Habanero Autograder, a tool for providing automated and immediate feedback to students on programming assignments; and a discussion of unexpected benefits from using the Habanero Autograder as part of Rice University's core parallel computing curriculum.


Principles and Practice of Programming in Java | 2016

A Distributed Selectors Runtime System for Java Applications

Arghya Chatterjee; Branko Gvoka; Bing Xue; Zoran Budimlic; Shams Imam; Vivek Sarkar

The demand for portable mainstream programming models supporting scalable, reactive, and versatile distributed computing is growing dramatically with the proliferation of manycore/heterogeneous processors on portable devices and cloud computing clusters that can be elastically and dynamically allocated. With such changes, distributed software systems and applications are shifting towards service-oriented architectures (SOA) that consist of largely decoupled, dynamically replaceable components connected via loosely coupled, interactive networks that may exhibit complex coordination and synchronization patterns. In this paper, we propose the Distributed Selector (DS) model to address the aforementioned requirements via a simple, easy-to-use API. Our implementation of this model runs on distributed JVMs and features automated bootstrap and global termination. We focus on the Selector Model (a generalization of the actor model) as a foundation for creating distributed programs, and introduce a unified runtime system that supports both shared-memory and distributed multi-node execution of such programs. Multiple guarded mailboxes, a unique and novel property of selectors, enable the programmer to easily specify coordination patterns that are strictly more general than those supported by the Actor model. We evaluate the performance of our selector-based distributed implementation using benchmarks from the Savina benchmark suite [13]. Our results show promising scalability for various message exchange patterns. We also demonstrate the high programming productivity arising from high-level abstraction and location transparency in the HJ Distributed Selector Runtime library (as evidenced by minimal differences between single-node and multi-node implementations of a selector-based application), as well as the contribution of automated system bootstrap and global termination capabilities.


Principles and Practice of Programming in Java | 2015

HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators

Max Grossman; Shams Imam; Vivek Sarkar

Recently there has been increasing interest in supporting execution of Java Virtual Machine (JVM) applications on accelerator architectures, such as GPUs. Unfortunately, there is a large gap between the features of the JVM and those commonly supported by accelerators. Examples of important JVM features include exceptions, dynamic memory allocation, use of arbitrary composite objects, file I/O, and more. Recent work from our research group tackled the first feature in that list, JVM exception semantics [14]. This paper continues along that path by enabling the acceleration of JVM parallel regions that include object references and dynamic memory allocation. The contributions of this work include: 1) serialization and deserialization of JVM objects using a format that is compatible with OpenCL accelerators, 2) advanced code generation techniques for converting JVM bytecode to OpenCL kernels when object references and dynamic memory allocation are used, 3) runtime techniques for supporting dynamic memory allocation on OpenCL accelerators, and 4) a novel redundant data movement elimination technique based on inter-parallel-region dataflow analysis using runtime bytecode inspection. Experimental results presented in this paper show performance improvements of up to 18.33x relative to parallel Java Streams for GPU-accelerated parallel regions, even when those regions include object references and dynamic memory allocation. In our evaluation, we fully characterize where accelerators or the JVM see performance wins and point out opportunities for future work.
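
The first contribution, object (de)serialization into an accelerator-friendly format, can be illustrated with a hypothetical layout (this is not HJ-OpenCL's actual format): each composite object is flattened into consecutive primitives in a single buffer that a device kernel could index.

```java
// Hypothetical sketch of the serialization idea: composite JVM objects are
// flattened into a flat primitive buffer that an OpenCL device could consume,
// and rebuilt into objects on the way back. The two-doubles-per-Point layout
// here is an illustrative assumption, not HJ-OpenCL's real format.
public class FlattenSketch {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // "Device" layout: two consecutive doubles per Point.
    public static double[] serialize(Point[] pts) {
        double[] buf = new double[pts.length * 2];
        for (int i = 0; i < pts.length; i++) {
            buf[2 * i] = pts[i].x;
            buf[2 * i + 1] = pts[i].y;
        }
        return buf;
    }

    public static Point[] deserialize(double[] buf) {
        Point[] pts = new Point[buf.length / 2];
        for (int i = 0; i < pts.length; i++)
            pts[i] = new Point(buf[2 * i], buf[2 * i + 1]);
        return pts;
    }

    public static void main(String[] args) {
        Point[] pts = { new Point(1, 2), new Point(3, 4) };
        Point[] back = deserialize(serialize(pts));
        System.out.println(back[1].x + "," + back[1].y); // prints 3.0,4.0
    }
}
```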


European Conference on Parallel Processing | 2015

A Composable Deadlock-Free Approach to Object-Based Isolation

Shams Imam; Jisheng Zhao; Vivek Sarkar

A widely used principle in the design of concurrent programs is isolation – the property that a task can operate on shared data without interference from other tasks. In this paper, we introduce a new approach to object-based isolation that is guaranteed to be deadlock-free, while still retaining the rollback benefits of transactions. Further, our approach differentiates between read and write accesses in its concurrency control mechanisms. Finally, since the generality of our approach precludes the use of static ordering for deadlock avoidance, our runtime ensures deadlock-freedom by detecting and resolving deadlocks at runtime automatically, without involving the programmer.
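
A crude way to picture deadlock-free object-based isolation (hypothetical; the paper's runtime instead detects and resolves deadlocks automatically at runtime) is all-or-nothing lock acquisition with rollback: a task tries to acquire the lock of every object it touches, and on any failure releases what it holds and retries, so no task ever blocks while holding a lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the paper's approach: an isolated region
// try-locks every object it accesses; on contention it rolls back (releases
// everything) and retries. Because no acquisition ever blocks, no cyclic
// wait can form, so the scheme is deadlock-free by construction.
public class IsolationSketch {
    static final class Cell {
        final ReentrantLock lock = new ReentrantLock();
        int value;
    }

    public static void isolated(Runnable body, Cell... cells) {
        while (true) {
            int acquired = 0;
            for (Cell c : cells) {
                if (c.lock.tryLock()) acquired++;
                else break;                          // contention: roll back
            }
            if (acquired == cells.length) {
                try { body.run(); return; }
                finally { for (Cell c : cells) c.lock.unlock(); }
            }
            for (int i = 0; i < acquired; i++) cells[i].lock.unlock();
            Thread.yield();                          // back off and retry
        }
    }

    public static void main(String[] args) throws Exception {
        Cell a = new Cell(), b = new Cell();
        // The two threads acquire {a, b} in opposite orders, which would
        // deadlock with naive blocking locks.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++)
                isolated(() -> { a.value++; b.value++; }, a, b);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 1000; i++)
                isolated(() -> { b.value++; a.value++; }, b, a);
        });
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(a.value + "," + b.value); // prints 2000,2000
    }
}
```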


European Conference on Parallel Processing | 2015

Load Balancing Prioritized Tasks via Work-Stealing

Shams Imam; Vivek Sarkar

Work-stealing schedulers focus on minimizing overhead in task scheduling. Consequently, they avoid features, such as task priorities, that can add overhead to the implementation. Thus, in such schedulers, low-priority tasks may be scheduled earlier, delaying the execution of higher-priority tasks and possibly increasing overall execution time.
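
The tension described above can be illustrated with a hypothetical scheduler fragment (not the paper's design): a single shared priority queue respects task priorities, but reintroduces exactly the centralized contention that per-worker work-stealing deques are designed to avoid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical fragment, not the paper's scheduler: a shared priority queue
// always dispatches the highest-priority pending task first, at the cost of
// a single contended structure instead of per-worker stealing deques.
public class PrioritySketch {
    static final class Task implements Comparable<Task> {
        final int priority;
        final String name;
        Task(int priority, String name) { this.priority = priority; this.name = name; }
        // Reverse order so that higher priority values are dequeued first.
        public int compareTo(Task o) { return Integer.compare(o.priority, priority); }
    }

    public static List<String> drain(PriorityBlockingQueue<Task> q) {
        List<String> order = new ArrayList<>();
        Task t;
        while ((t = q.poll()) != null) order.add(t.name);
        return order;
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<Task> q = new PriorityBlockingQueue<>();
        q.add(new Task(1, "low"));
        q.add(new Task(9, "urgent"));
        q.add(new Task(5, "normal"));
        System.out.println(drain(q)); // prints [urgent, normal, low]
    }
}
```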


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

Poster: connecting PGAS and traditional HPC languages

Adrian Prantl; Thomas Epperly; Shams Imam

Chapel is a high-level parallel programming language that implements a partitioned global address space (PGAS) model. Programs written in this programming model have traditionally been self-contained entities written entirely in one language. On our poster, we present BRAID, which enables Chapel programs to call functions and instantiate objects written in C, C++, Fortran 77-2008, Java, and Python. Our tool creates language bindings that are binary-compatible with those generated by the Babel language interoperability tool. The scientific community maintains a large amount of code written in traditional languages. With the help of our tool, users will gain access to their existing codebase with minimal effort and through a well-defined interface. The language bindings are designed to provide a good combination of performance and flexibility (including transparent access to distributed arrays). Knowing the demands of the target audience, we support the full Babel array API. A particular contribution is that we expose Chapel's distributed data types through our interface and make them accessible to external functions implemented in traditional serial programming languages. The advantages of our approach are highlighted by benchmarks that compare the performance of pure Chapel programs with that of hybrid versions that call subroutines implemented in Babel-supported languages inside of parallel loops. We also present our vision for interoperability with other PGAS languages such as UPC and X10.

Collaboration


Dive into Shams Imam's collaborations.

Top Co-Authors

Adrian Prantl

Lawrence Livermore National Laboratory
