Omer Ozan Sonmez | Researchain

Publication

Featured research published by Omer Ozan Sonmez.

high performance distributed computing | 2008

The performance of bags-of-tasks in large-scale distributed systems

Alexandru Iosup; Omer Ozan Sonmez; Shanny Anoep; Dick H. J. Epema

Ever more scientists are employing large-scale distributed systems such as grids for their computational work, instead of tightly coupled high-performance computing systems. However, while these distributed systems are more cost-effective, their heterogeneity in terms of hardware, software, and systems administration, together with the lack of accurate resource information, leads to inefficient scheduling. In addition, and in contrast to the workloads of tightly coupled high-performance computing systems, a large part of the workloads submitted to these distributed systems consists of large sets (bags) of sequential tasks. Therefore, a realistic performance analysis of scheduling bags-of-tasks in large-scale distributed systems is important. To this end, we introduce a realistic workload model for bags-of-tasks, and we explore through trace-based simulations the design space for scheduling bags-of-tasks. Finally, we identify three new scheduling policies that use only inaccurate information when scheduling, and we compare them against known classes of previously proposed scheduling policies.

european conference on parallel processing | 2007

The characteristics and performance of groups of jobs in grids

Alexandru Iosup; Mathieu Jan; Omer Ozan Sonmez; Dick H. J. Epema

Although grid workloads are, with few exceptions, dominated by single-node jobs, these jobs are not necessarily independent or unrelated. For instance, sets of jobs may be grouped because users submit them in batches, e.g., to perform parameter sweeps. However, no reported data confirm the presence and structure of these groupings, despite the potentially large impact of such information. To address this gap, we present a first investigation into the characteristics of groups of jobs in grid workloads. First, we define three types of job groupings: batch, continued, and bursty submissions. Then, we analyze the characteristics of these groupings in three long-term traces from currently deployed grid environments. Notably, our results show that the various groupings account for up to 96% of the total CPU time consumed. Finally, we present insights into how well real grids handle grouped jobs.
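The abstract names batch submissions but does not state how they are detected. One common way to operationalize such a grouping, assumed here rather than taken from the paper, is to put consecutive jobs of the same user into one batch whenever their submission times differ by at most a threshold `delta`. The function `batch_groups` and its `(user, submit_time)` job tuples are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical job record: (user, submit_time_in_seconds).
Job = Tuple[str, float]


def batch_groups(jobs: List[Job], delta: float) -> List[List[Job]]:
    """Split each user's jobs into batches: a job joins the current batch
    if it was submitted at most `delta` seconds after that user's previous job.

    Illustrative sketch; the gap-based rule and `delta` are assumptions.
    """
    groups: List[List[Job]] = []
    # Sort by user, then by submission time, so groupby sees each user once.
    ordered = sorted(jobs, key=lambda j: (j[0], j[1]))
    for _, user_jobs in groupby(ordered, key=lambda j: j[0]):
        current: List[Job] = []
        last_time = None
        for job in user_jobs:
            # A gap larger than delta closes the current batch.
            if last_time is not None and job[1] - last_time > delta:
                groups.append(current)
                current = []
            current.append(job)
            last_time = job[1]
        groups.append(current)
    return groups
```

Choosing `delta` trades off merging unrelated submissions against splitting genuine batches; the thresholds used in the paper are not given in this abstract.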

international conference on cluster computing | 2007

Scheduling malleable applications in multicluster systems

Jérémy Buisson; Omer Ozan Sonmez; Hashim H. Mohamed; Wouter Lammers; Dick H. J. Epema

In large-scale distributed execution environments such as multicluster systems and grids, resource availability may vary due to resource failures and because resources may be added to or withdrawn from such environments at any time. In addition, individual sites in such systems may have to handle workloads originating both from local users and from many other sources. As a result, application malleability, that is, the ability of an application to cope with a varying amount of resources during its execution, may be very beneficial for performance. In this paper we present the design of malleability support and of scheduling policies for malleable applications in our Koala multicluster scheduler, built with the help of our Dynaco framework for application malleability. In addition, we show the results of experiments with scheduling malleable workloads with Koala in our DAS multicluster testbed.

high performance distributed computing | 2009

Trace-based evaluation of job runtime and queue wait time predictions in grids

Omer Ozan Sonmez; Nezih Yigitbasi; Alexandru Iosup; Dick H. J. Epema

Large-scale distributed computing systems such as grids are serving a growing number of scientists. These environments bring about not only the advantages of an economy of scale, but also the challenges of resource and workload heterogeneity. A consequence of these two forms of heterogeneity is that job runtimes and queue wait times are highly variable, which generally reduces system performance and makes grids difficult for the common scientist to use. Predicting job runtimes and queue wait times has been widely studied for parallel environments. However, there is no detailed investigation of how the proposed prediction methods perform in grids, whose resource structure and workload characteristics differ greatly from those of parallel systems. In this paper, we assess the performance and benefit of predicting job runtimes and queue wait times in grids, based on traces gathered from various research and production grid environments. First, we evaluate the performance of simple yet widely used time series prediction methods and the effect of applying them to different types of job classes (e.g., all jobs submitted by single users or to single sites). Then, we investigate the performance of two kinds of queue wait time prediction methods for grids. Last, we investigate whether prediction-based grid-level scheduling policies can outperform policies that do not use predictions.
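As a rough illustration of the kind of simple time series predictor the abstract refers to, the sketch below predicts a job's runtime as the mean of the last few runtimes observed in the same job class (for example, the same user or the same site). The class name `MovingAveragePredictor` and the default window of two jobs are assumptions for illustration, not the specific methods evaluated in the paper.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, Optional


class MovingAveragePredictor:
    """Predict a job's runtime as the mean of the last `window` runtimes
    observed for the same job class (e.g. the same user or the same site).

    Illustrative sketch only: the window size and the class key are
    assumptions, not parameters taken from the paper.
    """

    def __init__(self, window: int = 2) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        # One bounded history per job class; old runtimes fall off automatically.
        self.history: Dict[Hashable, Deque[float]] = defaultdict(
            lambda: deque(maxlen=self.window)
        )

    def predict(self, job_class: Hashable) -> Optional[float]:
        """Return the predicted runtime, or None if the class has no history."""
        past = self.history.get(job_class)
        if not past:
            return None
        return sum(past) / len(past)

    def observe(self, job_class: Hashable, runtime: float) -> None:
        """Record the actual runtime of a finished job."""
        self.history[job_class].append(runtime)
```

With a window of two, observing runtimes of 100, 200, and 400 seconds for one user yields a prediction of 300 seconds for that user's next job, since only the two most recent runtimes are kept.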

high performance distributed computing | 2010

Performance analysis of dynamic workflow scheduling in multicluster grids

Omer Ozan Sonmez; Nezih Yigitbasi; Saeid Abrishami; Alexandru Iosup; Dick H. J. Epema

Scientists increasingly rely on the execution of workflows in grids to obtain results from complex mixtures of applications. However, the inherently dynamic nature of grid workflow scheduling, stemming from the unavailability of scheduling information and from resource contention among the (multiple) workflows and the non-workflow system load, may lead to poor or unpredictable performance. In this paper we present a comprehensive and realistic investigation of the performance of a wide range of dynamic workflow scheduling policies in multicluster grids. We first introduce a taxonomy of grid workflow scheduling policies that is based on the amount of dynamic information used in the scheduling process, and we map seven such policies, spanning the full spectrum of information use, onto this taxonomy. Then, we analyze the performance of these scheduling policies through simulations and experiments in a real multicluster grid. We find that there is no single grid workflow scheduling policy with good performance across all the investigated scenarios. We also find from our real-system experiments that with demanding workloads, the limitations of the head-nodes of the grid clusters may lead to performance loss not expected from the simulation results. We show that task throttling, that is, limiting the per-workflow number of tasks dispatched to the system, prevents the head-nodes from becoming overloaded while largely preserving performance, at least for communication-intensive workflows.
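Task throttling as defined above can be sketched as a per-workflow cap on dispatched tasks: ready tasks beyond the cap wait until one of that workflow's running tasks completes. The class `TaskThrottle` and its `limit` parameter are hypothetical; the sketch ignores head-node details and shows only the bookkeeping.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class TaskThrottle:
    """Cap the number of tasks each workflow may have dispatched at once.

    Minimal sketch: `limit` is a hypothetical per-workflow cap; a real
    scheduler would forward the returned (workflow, task) pairs to the
    cluster head-nodes.
    """

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.running: Dict[str, int] = defaultdict(int)
        self.waiting: Dict[str, Deque[str]] = defaultdict(deque)

    def submit(self, workflow: str, task: str) -> List[Tuple[str, str]]:
        """Queue a ready task and return the pairs that may be dispatched now."""
        self.waiting[workflow].append(task)
        return self._drain(workflow)

    def complete(self, workflow: str) -> List[Tuple[str, str]]:
        """Mark one task of `workflow` as finished and dispatch waiting tasks."""
        if self.running[workflow] == 0:
            raise ValueError(f"no running task for workflow {workflow!r}")
        self.running[workflow] -= 1
        return self._drain(workflow)

    def _drain(self, workflow: str) -> List[Tuple[str, str]]:
        # Dispatch queued tasks in FIFO order until the cap is reached.
        dispatched: List[Tuple[str, str]] = []
        queue = self.waiting[workflow]
        while queue and self.running[workflow] < self.limit:
            dispatched.append((workflow, queue.popleft()))
            self.running[workflow] += 1
        return dispatched
```

Because the cap is kept per workflow, one large workflow cannot monopolize a head-node, while other workflows are dispatched independently.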

european conference on parallel processing | 2008

DGSim: Comparing Grid Resource Management Architectures through Trace-Based Simulation

Alexandru Iosup; Omer Ozan Sonmez; Dick H. J. Epema

Many advances in grid resource management are still required to realize the grid computing vision of the integration of a world-wide computing infrastructure for scientific use. The pressure for advances is increased by the fast evolution of single, large clusters, which are the primary technological alternative to grids. However, advances in grid resource management cannot be achieved without an appropriate toolbox, of which simulation environments form an essential part. Current grid simulation environments still lack important workload and system modeling features, as well as research productivity features such as automated experiment setup and management. In this paper we address these issues through the design and a reference implementation of DGSim, a framework for simulating grid resource management architectures. DGSim introduces the concepts of grid evolution and of job selection policy, and extends towards realism the current body of knowledge on grid inter-operation, on grid dynamics, and on workload modeling. We also show through two real use cases how DGSim can be used to compare grid resource management architectures.

IEEE Transactions on Parallel and Distributed Systems | 2010

On the Benefit of Processor Coallocation in Multicluster Grid Systems

Omer Ozan Sonmez; Hashim H. Mohamed; Dick H. J. Epema

In multicluster grid systems, parallel applications may benefit from processor coallocation, that is, the simultaneous allocation of processors in multiple clusters. Although coallocation allows an application to use more processors than are available in a single cluster, it may severely increase the application's execution time due to relatively slow wide-area communication. The aim of this paper is to investigate whether coallocation is beneficial in multicluster grid systems despite this drawback. To this end, we conduct experiments in a real multicluster grid environment as well as in a simulated environment, and we evaluate the performance of coallocation for various applications, ranging from computation-intensive to communication-intensive, and for various system load settings. In addition, we compare the performance of scheduling policies that are specifically designed for coallocation. We demonstrate that considering latency in the resource selection phase improves the performance of coallocation, especially for communication-intensive parallel applications.
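The abstract reports that taking latency into account during resource selection helps coallocation, but it does not give the policy itself. A minimal latency-aware selection rule, assumed here for illustration, picks the set of clusters that has enough idle processors and the smallest worst-case inter-cluster latency, breaking ties in favor of fewer clusters. The function `select_clusters`, its inputs, and the tie-breaking rule are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple


def select_clusters(
    idle: Dict[str, int],
    latency_ms: Dict[FrozenSet[str], float],
    needed: int,
) -> Optional[Tuple[str, ...]]:
    """Pick a set of clusters whose idle processors cover `needed`,
    preferring the set with the lowest worst-case inter-cluster latency.

    Hypothetical latency-aware policy; it enumerates all subsets, which is
    only practical for a handful of clusters.
    """
    best: Optional[Tuple[str, ...]] = None
    best_key: Optional[Tuple[float, int]] = None
    names = sorted(idle)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if sum(idle[c] for c in subset) < needed:
                continue
            # Worst pairwise wide-area latency; a single cluster has none.
            worst = max(
                (latency_ms[frozenset(pair)] for pair in combinations(subset, 2)),
                default=0.0,
            )
            # Prefer low latency first, then fewer clusters.
            key = (worst, size)
            if best_key is None or key < best_key:
                best, best_key = subset, key
    return best
```

Returning `None` signals that even all clusters together lack enough idle processors, so the job would have to wait.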

cluster computing and the grid | 2009

Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems

Omer Ozan Sonmez; Bart Grundeken; Hashim H. Mohamed; Alexandru Iosup; Dick H. J. Epema

The use of today's multicluster grids alternates between periods of submission bursts and periods of normal use or even idleness. To avoid resource contention, many users employ observational scheduling, that is, they postpone the submission of relatively low-priority jobs until a cluster becomes (largely) idle. However, observational scheduling itself leads to resource contention when several such users crowd the same idle cluster. Moreover, this job execution model either delays the execution of more important jobs or requires extensive administrative support for job and user priorities. Instead, in this work we investigate the use of cycle scavenging to run jobs on grid resources politely yet efficiently, and at an acceptable administrative cost. We design a two-level cycle scavenging scheduling architecture that runs unobtrusively alongside regular grid scheduling. We equip this scheduler with two novel cycle scavenging scheduling policies that enforce fair resource sharing among competing cycle scavenging users. Through experiments with real and synthetic applications in a real multicluster grid, we show that the proposed architecture can execute jobs politely yet efficiently.
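The two fair-sharing policies are not described in this abstract. As a stand-in, the sketch below uses max-min fairness, a standard way to divide a resource among competing users: idle processors are split equally among users, and any share a user cannot use is redistributed to the others. The function `max_min_share` is hypothetical and is not the paper's policy.

```python
from typing import Dict


def max_min_share(idle: int, demand: Dict[str, int]) -> Dict[str, int]:
    """Split `idle` processors among users by max-min fairness: no user
    receives more than it asks for, and capacity left over by small
    demands is redistributed to the remaining users.
    """
    alloc = {user: 0 for user in demand}
    active = {u for u, d in demand.items() if d > 0}
    remaining = idle
    while remaining > 0 and active:
        share = remaining // len(active)
        if share == 0:
            # Fewer processors than active users: hand out one each, in name order.
            for user in sorted(active)[:remaining]:
                alloc[user] += 1
            break
        for user in sorted(active):
            give = min(share, demand[user] - alloc[user])
            alloc[user] += give
            remaining -= give
        # Users whose demand is fully met drop out of the next round.
        active = {u for u in active if alloc[u] < demand[u]}
    return alloc
```

For example, `max_min_share(10, {"u1": 2, "u2": 10, "u3": 10})` gives `u1` its full demand of 2 and splits the remaining 8 processors equally, 4 each, between `u2` and `u3`.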

grid computing | 2011

Performance Evaluation of Overload Control in Multi-cluster Grids

Nezih Yigitbasi; Omer Ozan Sonmez; Alexandru Iosup; Dick H. J. Epema

Multi-cluster grids are widely employed to execute workloads consisting of compute- and data-intensive applications in both research and production environments. Such workloads, especially when they are bursty, may stress shared system resources to the point where overload conditions occur. Overloads can severely degrade system performance and responsiveness, potentially causing user dissatisfaction and even revenue loss. However, the characteristics of multi-cluster grids, such as their complexity and heterogeneity, raise numerous nontrivial issues when controlling overload in such systems. In this work we present an extensive performance evaluation of overload control in multi-cluster grids. We adapt a dynamic throttling mechanism that enforces a concurrency limit, that is, the maximum number of tasks of each application that may run concurrently. Using diverse workloads, we evaluate several throttling mechanisms, including our dynamic mechanism, in our DAS-3 multi-cluster grid. Our results show that throttling can be used for effective overload control in multi-cluster grids, and in particular, that our dynamic technique improves application performance by as much as 50% while also improving system responsiveness by up to 80%.

CoreGRID | 2007

Virtual Domain Sharing in e-Science based on Usage Service Level Agreements

Catalin Dumitrescu; Alexandru Iosup; Omer Ozan Sonmez; Hashim H. Mohamed; Dick H. J. Epema

Today’s Grids, Peer-to-Peer infrastructures, and other large computing collaborations are managed as individual virtual domains (VDs) that focus on their specific problems. However, the research world is starting to shift towards world-wide collaborations and much bigger problems. For this trend to materialize, the many resources and services that already exist need to be shared across the VDs that own them, securely, efficiently, and at the lowest possible administrative cost. In this paper we identify the requirements for this problem of VD sharing and propose a specific solution based on usage service level agreements (uSLAs). Further, we propose an integrated architecture that provides uSLA-based access to resources, supports the recurrent delegation of usage rights, and provides fault-tolerant resource co-allocation.
