Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Humaira Kamal is active.

Publication


Featured research published by Humaira Kamal.


Conference on High Performance Computing (Supercomputing) | 2005

SCTP versus TCP for MPI

Humaira Kamal; Brad Penoff; Alan Wagner

SCTP (Stream Control Transmission Protocol) is a recently standardized transport-level protocol with several features that better support the communication requirements of parallel applications; these features are not present in traditional TCP (Transmission Control Protocol). These features make SCTP a good candidate as a transport-level protocol for MPI (Message Passing Interface). MPI is a message passing middleware that is widely used to parallelize scientific and compute-intensive applications. TCP is often used as the transport protocol for MPI in both local area and wide area networks. Prior to this work, SCTP had not been used for MPI. We compared and evaluated the benefits of using SCTP instead of TCP as the underlying transport protocol for MPI. We re-designed LAM-MPI, a public domain version of MPI, to use SCTP. We describe the advantages and disadvantages of using SCTP, the necessary modifications to the MPI middleware to use SCTP, and the performance of SCTP as compared to the stock implementation that uses TCP.
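The SCTP features referred to above, in particular multi-streaming within a single association, are visible directly at the socket API. The following sketch is only an illustration (it is not the LAM-MPI code from the paper): it sends two logical message flows over separate SCTP streams using the lksctp-tools one-to-many socket API, much as an MPI middleware might map (context, tag) pairs to stream numbers; the address and port are placeholder assumptions.

/* Minimal sketch (not the paper's LAM-MPI code): two logical message
 * flows sent over one SCTP association on separate streams, so a loss
 * on one stream does not head-of-line-block the other.  Assumes Linux
 * with lksctp-tools (netinet/sctp.h); address and port are placeholders. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

int main(void)
{
    /* One-to-many (SOCK_SEQPACKET) SCTP socket: one socket can carry
     * associations to many peers, similar to how one MPI process talks
     * to many ranks. */
    int sd = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
    if (sd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5000);                 /* placeholder port */
    inet_pton(AF_INET, "127.0.0.1", &peer.sin_addr);

    const char *ctrl = "envelope: tag=7 len=4";
    const char *data = "DATA";

    /* Stream 0 carries control traffic, stream 1 carries payload; an
     * MPI middleware could map (context, tag) pairs to stream numbers
     * in the same way. */
    sctp_sendmsg(sd, ctrl, strlen(ctrl), (struct sockaddr *)&peer,
                 sizeof(peer), 0 /* ppid */, 0 /* flags */,
                 0 /* stream */, 0 /* ttl */, 0 /* context */);
    sctp_sendmsg(sd, data, strlen(data), (struct sockaddr *)&peer,
                 sizeof(peer), 0, 0, 1 /* different stream */, 0, 0);

    close(sd);
    return 0;
}

Because the two flows travel on different streams of the same association, in-order delivery is only enforced within each stream, which is one of the properties that distinguishes SCTP from a single TCP connection.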


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2010

FG-MPI: Fine-grain MPI for multicore and clusters

Humaira Kamal; Alan Wagner

MPI (Message Passing Interface) has been used successfully in the high-performance computing community for years and is the dominant programming model. Current implementations of MPI are coarse-grained, with a single MPI process per processor; however, there is nothing in the MPI specification precluding a finer-grain interpretation of the standard. We have implemented Fine-grain MPI (FG-MPI), a system that allows execution of hundreds or thousands of MPI processes on-chip or communicating between chips inside a cluster.
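To make the fine-grain execution model concrete, here is an ordinary MPI program in C; under FG-MPI the same source runs unchanged, with many ranks mapped to coroutines inside each OS process. The launcher flag shown in the comment ("-nfg") is an assumption about FG-MPI's mpiexec, not something stated in the abstract.

/* A plain MPI program; under FG-MPI the same source runs unchanged,
 * with many ranks mapped to coroutines inside each OS process rather
 * than one rank per OS process.  The "-nfg" flag below is an assumption
 * about FG-MPI's launcher, used only to sketch the idea:
 *
 *   mpiexec -nfg 1000 -n 8 ./hello    (hypothetical: 8000 ranks, 8 OS processes)
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* With fine-grain mapping, "size" can be in the thousands even on a
     * handful of cores; the program itself does not change. */
    if (rank == 0)
        printf("running with %d MPI processes\n", size);

    MPI_Finalize();
    return 0;
}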


High Performance Distributed Computing | 2010

Scalability of communicators and groups in MPI

Humaira Kamal; Seyed M. Mirtaheri; Alan Wagner

As the number of cores inside compute clusters continues to grow, the scalability of MPI (Message Passing Interface) is important to ensure that programs can continue to execute on an ever-increasing number of cores. One important scalability issue for MPI is the implementation of communicators and groups. Communicators and groups are an integral part of MPI and play an essential role in the design and use of libraries. It is challenging to create an MPI implementation whose support for communicators and groups scales to the hundreds of thousands of processes that are possible in today's clusters. In this paper we present the design and evaluation of techniques to support the scalability of communicators and groups in MPI. We have designed and implemented a fine-grain version of MPI (FG-MPI), based on MPICH2, that allows thousands of full-fledged MPI processes inside an operating system process. Using FG-MPI we can create hundreds of thousands of MPI processes, which allowed us to implement and evaluate solutions to the scalability issues associated with communicators. We describe techniques to allow for sharing of group information inside processes, and the design of scalable operations to create the communicators. A set-plus-permutation framework is introduced for storing group information for communicators, and a set, instead of map, representation is proposed for MPI group objects. Performance results are given for the execution of an MPI benchmark program with upwards of 100,000 processes, with communicators created for various groups of different sizes and types.
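The set-plus-permutation idea can be illustrated with a toy example. The sketch below is not the FG-MPI implementation; it simply shows how a group whose membership is a computable set (here, a strided range) plus an optional permutation can answer the group-rank to world-rank translation that communicators need, without storing an explicit per-communicator rank table.

/* Illustrative sketch only (not the FG-MPI code): describing an MPI
 * group as a compact set plus an optional permutation instead of an
 * explicit rank->rank table per communicator.  The strided-set case
 * needs O(1) memory per group yet still supports the group-rank ->
 * world-rank translation that communicators require. */
#include <assert.h>
#include <stdio.h>

/* A group whose members are {first, first+stride, ...} with "size"
 * entries.  "perm" is optional: NULL means natural ordering; otherwise
 * perm[i] gives the position within the set of group rank i. */
typedef struct {
    int first, stride, size;
    const int *perm;
} group_t;

static int group_to_world(const group_t *g, int grank)
{
    assert(grank >= 0 && grank < g->size);
    int pos = g->perm ? g->perm[grank] : grank; /* apply permutation if any */
    return g->first + pos * g->stride;          /* membership is computed */
}

int main(void)
{
    /* Even-numbered world ranks 0,2,4,...,198 in natural order. */
    group_t evens = { 0, 2, 100, NULL };
    printf("group rank 7 -> world rank %d\n", group_to_world(&evens, 7));

    /* A small group of 5 consecutive world ranks, in reversed order. */
    static const int perm[5] = { 4, 3, 2, 1, 0 };
    group_t reversed = { 10, 1, 5, perm };
    printf("group rank 0 -> world rank %d\n", group_to_world(&reversed, 0)); /* 14 */
    return 0;
}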


Computing | 2014

An integrated fine-grain runtime system for MPI

Humaira Kamal; Alan Wagner

Fine-grain MPI (FG-MPI) extends the execution model of MPI to allow for interleaved execution of multiple concurrent MPI processes inside an OS-process. It provides a runtime that is integrated into the MPICH2 middleware and uses light-weight coroutines to implement an MPI-aware scheduler. In this paper we describe the FG-MPI runtime system and discuss the main design issues in its implementation. FG-MPI enables the expression of function-level parallelism, which, together with the runtime scheduler, can be used to simplify MPI programming and achieve performance without adding complexity to the program. As an example, we use FG-MPI to restructure a typical use of non-blocking communication and show that the integrated scheduler relieves the programmer from scheduling computation and communication inside the application, moving the performance-related part of the program out of its specification and into the runtime.
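As a rough illustration of the restructuring described above (not the paper's benchmark code), the sketch below replaces the usual hand-scheduled MPI_Irecv/compute/MPI_Wait overlap pattern with plain blocking calls; with several fine-grain ranks per core, an MPI-aware coroutine scheduler can overlap one rank's blocking receive with another rank's computation.

/* Illustrative sketch of the restructuring described above.  The usual
 * hand-tuned overlap pattern
 *
 *     MPI_Irecv(buf, ..., &req);
 *     do_other_work();            // programmer schedules overlap by hand
 *     MPI_Wait(&req, MPI_STATUS_IGNORE);
 *
 * can instead be written with plain blocking calls; under an MPI-aware
 * coroutine scheduler, a rank that blocks in MPI_Recv yields to another
 * co-located rank that still has work, so the overlap moves from the
 * application into the runtime. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double buf[N];
    int peer = rank ^ 1;            /* pair ranks (0,1), (2,3), ... */

    if (size % 2 == 0) {
        if (rank % 2 == 0) {
            for (int i = 0; i < N; i++) buf[i] = rank + i;
            MPI_Send(buf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        } else {
            /* Blocking receive: with many fine-grain ranks per core the
             * scheduler overlaps this wait with other ranks' compute. */
            MPI_Recv(buf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank %d received from %d\n", rank, peer);
        }
    }

    MPI_Finalize();
    return 0;
}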


International Conference on Parallel Processing | 2012

Added Concurrency to Improve MPI Performance on Multicore

Humaira Kamal; Alan Wagner

MPI implementations typically equate an MPI process with an OS-process, resulting in a coarse-grain programming model where MPI processes are bound to the physical cores. Fine-Grain MPI (FG-MPI) extends the MPICH2 implementation of MPI and implements an integrated runtime system to allow multiple MPI processes to execute concurrently inside an OS-process. FG-MPI's integrated approach makes it possible to add more concurrency than the available parallelism, while minimizing the overheads related to context switches, scheduling and synchronization. In this paper we evaluate the benefits of added concurrency for cache awareness and message size, and show that performance gains are possible by using FG-MPI to adjust the grain size of a program to better fit the cache, and that there are potential advantages in passing smaller rather than larger messages. We evaluate the use of FG-MPI on the complete set of the NAS parallel benchmarks over large problem sizes, where we show significant performance improvement (20%-30%) for three of the eight benchmarks. We discuss the characteristics of the benchmarks with regard to the trade-offs between the added costs and benefits.
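A back-of-the-envelope sketch of the grain-size argument follows; the cache size and per-element footprint are placeholder assumptions, not figures from the paper. It simply shows how splitting one core's data across k co-located fine-grain ranks shrinks each rank's working set by a factor of k, down toward cache size.

/* Back-of-the-envelope sketch (not from the paper): choosing the number
 * of fine-grain MPI processes per core so that each process's working
 * set fits in cache.  Cache size and per-element footprint are
 * placeholder assumptions. */
#include <stdio.h>

int main(void)
{
    const long n_elements     = 64L * 1024 * 1024;  /* problem size per core   */
    const long bytes_per_elem = 3 * sizeof(double); /* e.g. 3 arrays per point */
    const long cache_bytes    = 2L * 1024 * 1024;   /* assumed per-core cache  */

    /* With one rank per core the working set is far larger than cache;
     * splitting the same data over k co-located fine-grain ranks shrinks
     * each rank's working set by a factor of k. */
    long working_set = n_elements * bytes_per_elem;
    long k = (working_set + cache_bytes - 1) / cache_bytes;

    printf("working set per core: %ld MB\n", working_set >> 20);
    printf("fine-grain ranks per core for cache-sized blocks: ~%ld\n", k);
    return 0;
}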


Conference on Communication Networks and Services Research | 2005

SCTP-based middleware for MPI in wide-area networks

Humaira Kamal; Brad Penoff; Alan Wagner

SCTP (Stream Control Transmission Protocol) is a recently standardized transport-level protocol that has several features not present in TCP. These features make SCTP a better transport-level protocol for supporting MPI (Message Passing Interface). MPI is a message passing library that is widely used to parallelize scientific and compute-intensive programs. Recently there has been interest in porting MPI programs to execute in a wide-area network. We evaluated the use of SCTP and designed and modified a public domain version of the MPI middleware to use SCTP. We describe the advantages and disadvantages of SCTP and describe the changes that were necessary to the MPI middleware.


EuroMPI'12 Proceedings of the 19th European Conference on Recent Advances in the Message Passing Interface | 2012

An integrated runtime scheduler for MPI

Humaira Kamal; Alan Wagner

Fine-Grain MPI (FG-MPI) supports function-level parallelism while staying within the MPI process model. It provides a runtime that is directly integrated into the MPICH2 middleware and uses light-weight coroutines to implement an MPI-aware scheduler. Our key observation is that having multiple MPI processes per OS-process, together with a runtime scheduler, can simplify MPI programming and achieve performance without adding complexity to the program. The performance-related part of the program now lives outside its specification, in the runtime, where performance can be tuned with few, if any, changes to the code.


Journal of Parallel and Distributed Computing | 2010

Employing transport layer multi-railing in cluster networks

Brad Penoff; Humaira Kamal; Alan Wagner; Mike Tsai; Karol Mroz; Janardhan R. Iyengar

Building clusters from commodity off-the-shelf parts is a well-established technique for building inexpensive medium- to large-size computing clusters. Many commodity mid-range motherboards come with multiple Gigabit Ethernet interfaces, and the low cost per port for Gigabit Ethernet makes switches inexpensive as well. Our objective in this work is to take advantage of multiple inexpensive Gigabit network cards and Ethernet switches to enhance the communication and reliability performance of a cluster. Unlike previous approaches that take advantage of multiple network connections for multi-railing, we consider CMT (Concurrent Multipath Transfer), which extends SCTP (Stream Control Transmission Protocol), a transport protocol developed by the IETF, to make use of the multiple paths that exist between two hosts. In this work, we explore the applicability of CMT in the transport layer of the network stack to high-performance computing environments. We develop SCTP-based MPI (Message Passing Interface) middleware for MPICH2 and Open MPI, and evaluate the reliability and communication performance of the system. Using Open MPI with support for message striping over multiple paths at the middleware level, we compare the differences in supporting multi-railing in the middleware versus at the transport layer.
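At the application side, the main prerequisite for transport-layer multi-railing is a multi-homed SCTP endpoint. The sketch below (not the paper's middleware) binds one SCTP socket to two local interfaces with sctp_bindx() from lksctp-tools; the interface addresses are placeholders, and enabling CMT itself is stack-specific and outside the scope of this sketch.

/* Minimal sketch: making an SCTP endpoint multi-homed by binding it to
 * two local interfaces with sctp_bindx().  CMT can then stripe data
 * across the resulting paths; how (and whether) CMT is enabled depends
 * on the SCTP stack and is not shown here.  Assumes lksctp-tools and
 * placeholder interface addresses. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

int main(void)
{
    int sd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
    if (sd < 0) { perror("socket"); return 1; }

    /* Two local addresses, one per Gigabit Ethernet interface
     * (placeholder values). */
    struct sockaddr_in addrs[2];
    memset(addrs, 0, sizeof(addrs));
    for (int i = 0; i < 2; i++) {
        addrs[i].sin_family = AF_INET;
        addrs[i].sin_port   = htons(6000);
    }
    inet_pton(AF_INET, "192.168.1.10", &addrs[0].sin_addr);
    inet_pton(AF_INET, "192.168.2.10", &addrs[1].sin_addr);

    /* Bind the one endpoint to both interfaces: the association becomes
     * multi-homed, giving the transport two rails to work with. */
    if (sctp_bindx(sd, (struct sockaddr *)addrs, 2, SCTP_BINDX_ADD_ADDR) < 0) {
        perror("sctp_bindx");
        close(sd);
        return 1;
    }

    printf("SCTP endpoint bound to two local addresses\n");
    close(sd);
    return 0;
}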


High Performance Distributed Computing | 2014

A scalable distributed skip list for range queries

Sarwar Alam; Humaira Kamal; Alan Wagner

In this paper we present a distributed, message-passing implementation of a dynamic dictionary structure for range queries. The structure is based on a distributed fine-grain implementation of skip lists that can scale across a cluster of multicore machines. Our implementation makes use of the unique features of Fine-Grain MPI and introduces novel algorithms and techniques to achieve scalable performance on a cluster of multicore machines. Unlike concurrent data structures, the distributed skip list's operations are deterministic and atomic. Range queries are implemented in a way that parallelizes the operation and takes advantage of the recursive properties of the skip list structure. We report on the performance of the skip list for range queries on a medium-sized cluster with two hundred cores.
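For readers unfamiliar with the underlying structure, here is a small sequential skip list with a range query; it is only an illustration of the data structure the paper distributes. In the paper's system each element is a fine-grain MPI process and range queries are forwarded and split recursively via messages, none of which is shown here.

/* Illustrative sketch only: a small sequential skip list with a range
 * query.  In the paper's system each element is a fine-grain MPI process
 * and a range query is forwarded and split recursively via messages
 * rather than pointer chasing. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_LEVEL 8

typedef struct node {
    int key;
    struct node *next[MAX_LEVEL];   /* forward pointers, one per level */
} node_t;

static node_t *make_node(int key)
{
    node_t *n = calloc(1, sizeof(node_t));
    n->key = key;
    return n;
}

static int random_level(void)
{
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1)) lvl++;
    return lvl;
}

/* Insert key, threading the node into a random number of levels. */
static void insert(node_t *head, int key)
{
    node_t *update[MAX_LEVEL];
    node_t *x = head;
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        while (x->next[i] && x->next[i]->key < key) x = x->next[i];
        update[i] = x;
    }
    node_t *n = make_node(key);
    int lvl = random_level();
    for (int i = 0; i < lvl; i++) {
        n->next[i] = update[i]->next[i];
        update[i]->next[i] = n;
    }
}

/* Range query [lo, hi]: descend to the last key below lo, then walk the
 * bottom level.  The distributed version splits this walk at the higher
 * level pointers so disjoint sub-ranges are answered in parallel. */
static void range_query(node_t *head, int lo, int hi)
{
    node_t *x = head;
    for (int i = MAX_LEVEL - 1; i >= 0; i--)
        while (x->next[i] && x->next[i]->key < lo) x = x->next[i];
    for (x = x->next[0]; x && x->key <= hi; x = x->next[0])
        printf("%d ", x->key);
    printf("\n");
}

int main(void)
{
    node_t *head = make_node(-1);           /* sentinel */
    for (int k = 0; k < 100; k += 3) insert(head, k);
    range_query(head, 10, 40);              /* prints 12 15 ... 39 */
    return 0;
}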


Archive | 2005

SCTP-Based Middleware for MPI

Humaira Kamal

SCTP (Stream Control Transmission Protocol) is a recently standardized transport-level protocol with several features that better support the communication requirements of parallel applications; these features are not present in traditional TCP (Transmission Control Protocol). These features make SCTP a good candidate as a transport-level protocol for MPI (Message Passing Interface). MPI is a message passing middleware that is widely used to parallelize scientific and compute-intensive applications. TCP is often used as the transport protocol for MPI in both local area and wide-area networks. Prior to this work, SCTP had not been used for MPI. In this thesis, we compared and evaluated the benefits of using SCTP instead of TCP as the underlying transport protocol for MPI. We redesigned LAM-MPI, a public domain version of MPI, to use SCTP. We describe the advantages and disadvantages of using SCTP, the necessary modifications to the MPI middleware to use SCTP, and the performance of SCTP as compared to the stock implementation that uses TCP.

Collaboration


Dive into Humaira Kamal's collaborations.

Top Co-Authors

Alan Wagner
University of British Columbia

Brad Penoff
University of British Columbia

Mike Tsai
University of British Columbia

Sarwar Alam
University of British Columbia

E. Vong
University of British Columbia

Seyed M. Mirtaheri
University of British Columbia