Umit Rencuzogullari
University of Rochester
Publication
Featured research published by Umit Rencuzogullari.
acm sigplan symposium on principles and practice of parallel programming | 2001
Umit Rencuzogullari; Sandhya Dwarkadas
Networks of workstations (NOWs), which are generally composed of autonomous compute elements networked together, are an attractive parallel computing platform since they offer high performance at low cost. The autonomous nature of the environment, however, often results in inefficient utilization due to load imbalances caused by three primary factors: 1) unequal load (compute or communication) assignment to equally powerful compute nodes, 2) unequal resources at compute nodes, and 3) multiprogramming. These load imbalances result in idle waiting time for cooperating processes that need to synchronize or communicate data. Additional waiting time may result from local scheduling decisions in a multiprogrammed environment. In this paper, we present a combined approach of compile-time analysis, run-time load distribution, and operating system scheduler cooperation for improved utilization of available resources in an autonomous NOW. The techniques we propose allow efficient resource utilization by taking into consideration all three causes of load imbalance, in addition to locality of access, in the process of load distribution. The resulting adaptive load distribution and cooperative scheduling system allows applications to take advantage of parallel resources when they are available, delivering better performance than if the loaded resources were left unused.
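The run-time load distribution described in the abstract can be sketched as assigning contiguous chunks of loop iterations in proportion to each node's observed compute rate. This is an illustrative sketch only: the function name `partition` and the idea of per-node rate measurements are assumptions, not details from the paper.

```python
def partition(num_iters, rates):
    """Split [0, num_iters) into contiguous chunks proportional to rates.

    rates[i] is the measured relative speed of node i; faster nodes
    receive proportionally more iterations.
    """
    total = sum(rates)
    bounds, start = [], 0
    for i, r in enumerate(rates):
        # The last node takes the remainder so every iteration is assigned.
        size = num_iters - start if i == len(rates) - 1 else round(num_iters * r / total)
        bounds.append((start, start + size))
        start += size
    return bounds

# Example: 3 nodes, the middle node is twice as fast as the others.
print(partition(100, [1.0, 2.0, 1.0]))  # → [(0, 25), (25, 75), (75, 100)]
```

Contiguous chunks (rather than a cyclic distribution) keep each node's accesses local, which matters when re-partitioning interacts with data caching.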
high performance computer architecture | 2000
Robert J. Stets; Sandhya Dwarkadas; Leonidas I. Kontothanassis; Umit Rencuzogullari; Michael L. Scott
Emerging system-area networks provide a variety of features that can dramatically reduce network communication overhead. In this paper, we evaluate the impact of such features on the implementation of Software Distributed Shared Memory (SDSM), and on the Cashmere system in particular. Cashmere has been implemented on the Compaq Memory Channel network, which supports low-latency messages, protected remote memory writes, inexpensive broadcast, and total ordering of network packets. Our evaluation is based on several Cashmere protocol variants, ranging from a protocol that fully leverages the Memory Channel's special features to one that uses the network only for fast messaging. We find that the special features improve performance by 18-44% for three of our applications, but less than 12% for our other seven applications. We also find that home node migration, an optimization available only in the message-based protocol, can improve performance by as much as 67%. These results suggest that for systems of modest size, low latency is much more important for SDSM performance than are remote writes, broadcast, or total ordering. At the same time, results on an emulated 32-node system indicate that broadcast based on remote writes of widely shared data may improve performance by up to 51% for some applications. If hardware broadcast or multicast facilities can be made to scale, they can be beneficial in future system-area networks.
Scientific Programming | 1999
Sotiris Ioannidis; Umit Rencuzogullari; Robert J. Stets; Sandhya Dwarkadas
Clusters of workstations provide a cost-effective, high performance parallel computing environment. These environments, however, are often shared by multiple users, or may consist of heterogeneous machines. As a result, parallel applications executing in these environments must operate despite unequal computational resources. For maximum performance, applications should automatically adapt execution to maximize use of the available resources. Ideally, this adaptation should be transparent to the application programmer. In this paper, we present CRAUL (Compiler and Run-Time Integration for Adaptation Under Load), a system that dynamically balances computational load in a parallel application. Our target run-time is software-based distributed shared memory (SDSM). SDSM is a good target for parallelizing compilers since it reduces compile-time complexity by providing data caching and other support for dynamic load balancing. CRAUL combines compile-time support to identify data access patterns with a run-time system that uses the access information to intelligently distribute the parallel workload in loop-based programs. The distribution is chosen according to the relative power of the processors and so as to minimize SDSM overhead and maximize locality. We have evaluated the resulting load distribution in the presence of different types of load: computational, computational and memory intensive, and network load. CRAUL performs within 5–23% of ideal in the presence of load, and is able to improve on naive compiler-based work distribution that does not take locality into account even in the absence of load.
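One way a CRAUL-style distribution can trade off power-proportional balance against SDSM overhead is to pick boundaries in proportion to processor power and then snap them to page boundaries, so that no SDSM page is written by two nodes. This is a hedged sketch under assumptions: `PAGE_ELEMS` (array elements per SDSM page) and the function names are illustrative, not from the paper.

```python
PAGE_ELEMS = 1024  # assumed: array elements that fit in one SDSM page

def snap_to_pages(bound):
    """Round a chunk boundary to the nearest page boundary."""
    return round(bound / PAGE_ELEMS) * PAGE_ELEMS

def distribute(num_iters, rates):
    """Power-proportional chunks whose internal cuts fall on page edges,
    so no page is written by more than one node."""
    total, cuts, acc = sum(rates), [0], 0.0
    for r in rates[:-1]:
        acc += r
        cuts.append(min(num_iters, snap_to_pages(num_iters * acc / total)))
    cuts.append(num_iters)
    return list(zip(cuts, cuts[1:]))

# Example: 3 nodes, the third twice as powerful; all cuts are page-aligned.
print(distribute(8192, [1, 1, 2]))  # → [(0, 2048), (2048, 4096), (4096, 8192)]
```

Aligning cuts to page boundaries sacrifices a little balance (at most half a page per node) to avoid write-write false sharing, which in an SDSM is typically far more expensive.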
international conference on parallel processing | 2002
Umit Rencuzogullari; Sandhya Dwarkadas
Clusters of workstations (COWs) offer high performance relative to their cost. Generally, these clusters operate as autonomous systems running independent copies of the operating system, where access to machines is not controlled and all users enjoy the same access privileges. While these features are desirable and reduce operating costs, they have adverse effects on parallel applications running on these clusters. Load imbalances are common for parallel applications on COWs due to: 1) variable amounts of load on nodes caused by an inherent lack of parallelism, 2) variable resource availability on nodes, and 3) independent scheduling decisions made by the independent schedulers on each node. Our earlier study has shown that an approach combining static program analysis, dynamic load balancing, and scheduler cooperation is effective in countering the adverse effects mentioned above. In our current study, we investigate the scalability of our approach as the number of processors is increased. We further relax the requirement of global synchronization, avoiding the need to use barriers and allowing the use of any other synchronization primitives while still achieving dynamic load balancing. The use of alternative synchronization primitives avoids the inherent vulnerability of barriers to load imbalance. It also allows load balancing to take place at any point in the course of execution, rather than only at a synchronization point, potentially reducing the time the application runs imbalanced. Moreover, load readjustment decisions are made in a distributed fashion, thus preventing any need for processes to globally synchronize in order to redistribute load.
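The distributed, barrier-free readjustment described above can be sketched as pairwise rebalancing: an idle worker takes half of the remaining work from its most loaded peer, with no global synchronization. The queue representation and the "steal half from the busiest peer" policy are illustrative assumptions, not the paper's exact mechanism.

```python
def steal(queues, idle):
    """Move half of the busiest peer's remaining work to the idle worker.

    queues[i] is the list of iterations still pending on worker i.
    Only the two workers involved participate; no barrier is needed.
    Returns True if any work was transferred.
    """
    victim = max((i for i in range(len(queues)) if i != idle),
                 key=lambda i: len(queues[i]))
    half = len(queues[victim]) // 2
    if half == 0:
        return False  # no peer has enough work left to share
    queues[idle].extend(queues[victim][-half:])
    del queues[victim][-half:]
    return True

# Example: worker 0 is idle, worker 1 has 8 pending iterations.
qs = [[], list(range(8)), [1, 2]]
steal(qs, 0)
print(qs[0], qs[1])  # → [4, 5, 6, 7] [0, 1, 2, 3]
```

Because only two queues are touched per transfer, rebalancing can happen at any point during execution rather than waiting for all processes to reach a barrier.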
Archive | 2009
Alok Kumar Gupta; Minwen Ji; Timothy Mann; Tahir Mobashir; Umit Rencuzogullari; Ganesha Shanmuganathan; Limin Wang; Anne Holler
Archive | 2010
Xianan Zhang; Eddie Ma; Umit Rencuzogullari; Irfan Ahmad; Orran Krieger; Mukil Kesavan
ACM Transactions on Computer Systems | 2005
Leonidas I. Kontothanassis; Robert J. Stets; Galen C. Hunt; Umit Rencuzogullari; Gautam Altekar; Sandhya Dwarkadas; Michael L. Scott
Operating Systems Review | 2000
Michael L. Scott; Sandhya Dwarkadas; Srinivasan Parthasarathy; Rajeev Balasubramonian; DeQing Chen; Grigorios Magklis; Athanasios E. Papathanasiou; Eduardo Pinheiro; Umit Rencuzogullari; Chunqiang Tang
Archive | 1999
Robert J. Stets; DeQing Chen; Sandhya Dwarkadas; Nikolaos Hardavellas; Galen C. Hunt; Leonidas I. Kontothanassis; Grigorios Magklis; Srinivasan Parthasarathy; Umit Rencuzogullari; Michael L. Scott
Dynamic resource management for parallel applications in an autonomous cluster of workstations | 2004
Umit Rencuzogullari; Sandhya Dwarkadas