Catalin L. Dumitrescu

Publication

Featured research published by Catalin L. Dumitrescu.

conference on high performance computing (supercomputing) | 2005

The Globus Striped GridFTP Framework and Server

William E. Allcock; John Bresnahan; Rajkumar Kettimuthu; Michael Link; Catalin L. Dumitrescu; Ioan Raicu; Ian T. Foster

The GridFTP extensions to the File Transfer Protocol define a general-purpose mechanism for secure, reliable, high-performance data movement. We report here on the Globus striped GridFTP framework, a set of client and server libraries designed to support the construction of data-intensive tools and applications. We describe the design of both this framework and a striped GridFTP server constructed within the framework. We show that this server is faster than other FTP servers in both single-process and striped configurations, achieving, for example, speeds of 27.3 Gbit/s memory-to-memory and 17 Gbit/s disk-to-disk over a 30 Gbit/s network with a 60 millisecond round-trip time. In another experiment, we show that the server can support 1800 concurrent clients without excessive load. We argue that this combination of performance and modular structure makes the Globus GridFTP framework both a good foundation on which to build tools and applications, and a unique testbed for the study of innovative data management techniques and network protocols.

conference on high performance computing (supercomputing) | 2007

Falkon: a Fast and Light-weight tasK executiON framework

Ioan Raicu; Yong Zhao; Catalin L. Dumitrescu; Ian T. Foster; Michael Wilde

To enable the rapid execution of many tasks on compute clusters, we have developed Falkon, a Fast and Light-weight tasK executiON framework. Falkon integrates (1) multi-level scheduling to separate resource acquisition (via, e.g., requests to batch schedulers) from task dispatch, and (2) a streamlined dispatcher. Falkon's integration of multi-level scheduling and streamlined dispatchers delivers performance not provided by any other system. We describe Falkon architecture and implementation, and present performance results for both microbenchmarks and applications. Microbenchmarks show that Falkon throughput (487 tasks/sec) and scalability (to 54,000 executors and 2,000,000 tasks processed in just 112 minutes) are one to two orders of magnitude better than other systems used in production Grids. Large-scale astronomy and medical applications executed under Falkon by the Swift parallel programming system achieve up to 90% reduction in end-to-end run time, relative to versions that execute tasks via separate scheduler submissions.

grid computing | 2006

How are Real Grids Used? The Analysis of Four Grid Traces and Its Implications

Alexandru Iosup; Catalin L. Dumitrescu; Dick H. J. Epema; Hui Li; Lex Wolters

The grid computing vision promises to provide the needed platform for a new and more demanding range of applications. For this promise to become true, a number of hurdles, including the design and deployment of adequate resource management and information services, need to be overcome. In this context, understanding the characteristics of real grid workloads is a crucial step for improving the quality of existing grid services, and in guiding the design of new solutions. Towards this goal, in this work we present the characteristics of traces of four real grid environments, namely LCG, Grid3, and TeraGrid, which are among the largest production grids currently deployed, and the DAS, which is a research grid. We focus our analysis on virtual organizations, on users, and on individual job characteristics. We further attempt to quantify the evolution and the performance of the grid systems from which our traces originate. Finally, given the scarcity of the information available for analysis purposes, we discuss the requirements of a new format for grid traces, and we propose the establishment of a virtual center for workload-based grid benchmarking data: the grid workloads archive.

cluster computing and the grid | 2005

GangSim: a simulator for grid scheduling studies

Catalin L. Dumitrescu; Ian T. Foster

Large distributed grid systems pose new challenges in job scheduling due to complex workload characteristics and system characteristics. Due to the numerous parameters that must be considered and the complex interactions that can occur between different resource allocation policies, analytical modeling of system behavior appears impractical. Thus, we have developed the GangSim simulator to support studies of scheduling strategies in grid environments, with a particular focus on investigations of the interactions between local and community resource allocation policies. The GangSim implementation is derived in part from the Ganglia distributed monitoring framework, an implementation approach that permits mixing of simulated and real grid components. We present examples of the studies that GangSim permits, showing in particular how we can use GangSim to study the behavior of VO schedulers as a function of scheduling policy, resource usage policies, and workloads. We also present the results of experiments conducted on an operational Grid, Grid3, to evaluate GangSim's accuracy. These latter studies point to the need for more accurate modeling of various aspects of local site behavior.

conference on high performance computing (supercomputing) | 2005

DI-GRUBER: A Distributed Approach to Grid Resource Brokering

Catalin L. Dumitrescu; Ioan Raicu; Ian T. Foster

Managing usage service level agreements (USLAs) within environments that integrate participants and resources spanning multiple physical institutions is a challenging problem. Maintaining a single unified USLA management decision point over hundreds to thousands of jobs and sites can become a bottleneck in terms of reliability as well as performance. DI-GRUBER, an extension to our GRUBER brokering framework, was developed as a distributed grid USLA-based resource broker that allows multiple decision points to coexist and cooperate in real-time. DI-GRUBER addresses issues regarding how USLAs can be stored, retrieved, and disseminated efficiently in a large distributed environment. The key question this paper addresses is the scalability and performance of DI-GRUBER in large Grid environments. We conclude that as few as three to five decision points can be sufficient in an environment with 300 sites and 60 VOs, an environment ten times larger than today’s Open Science Grid.

european conference on parallel processing | 2005

GRUBER: a grid resource usage SLA broker

Catalin L. Dumitrescu; Ian T. Foster

Resource sharing within grid collaborations usually implies specific sharing mechanisms at participating sites. Challenging policy issues can arise in such scenarios that integrate participants and resources spanning multiple physical institutions. Resource owners may wish to grant to one or more virtual organizations (VOs) the right to use certain resources subject to local usage policies and service level agreements, and each VO may then wish to use those resources subject to its usage policies. This paper describes GRUBER, an architecture and toolkit for resource usage service level agreement (SLA) specification and enforcement in a grid environment, and a series of experiments on a real grid, Grid3. The proposed mechanism allows resources at individual sites to be shared among multiple user communities.

grid computing | 2004

Usage policy-based CPU sharing in virtual organizations

Catalin L. Dumitrescu; Ian T. Foster

Resource sharing within grid collaborations usually implies specific sharing mechanisms at participating sites. Challenging policy issues can arise within virtual organizations (VOs) that integrate participants and resources spanning multiple physical institutions. Resource owners may wish to grant to one or more VOs the right to use certain resources subject to local policy and service level agreements, and each VO may then wish to use those resources subject to VO policy. Thus, we must address the question of what usage policies (UPs) should be considered for resource sharing in VOs. As a first step in addressing this question, we develop and evaluate different UP scenarios within a specialized context that mimics scientific grids within which the resources to be shared are computers. We also present a UP architecture and define roles and functions for scheduling resources in such grid environments while satisfying resource owner policies.

international conference on service oriented computing | 2004

Connecting client objectives with resource capabilities: an essential component for grid service management infrastructures

Asit Dan; Catalin L. Dumitrescu; Matei Ripeanu

In large-scale, distributed systems such as Grids, an agreement between a client and a service provider specifies service level objectives both as expressions of client requirements and as provider assurances. Ideally, these objectives are expressed in a high-level, service- or application-specific manner rather than requiring clients to detail the necessary resources. Resource providers, on the other hand, expect low-level, resource-specific performance criteria that are uniform across applications and can easily be interpreted and provisioned. This paper presents a framework for Grid service management that addresses this gap between high-level specification of client performance objectives and existing resource management infrastructures. It identifies three levels of abstraction for resource requirements that a service provider needs to manage, namely: detailed specification of raw resources, virtualization of heterogeneous resources as abstract resources, and performance objectives at an application level. The paper also identifies three key functions for managing service level agreements, namely: translation of resource requirements across abstraction layers, arbitration in allocating resources to client requests, and aggregation and allocation of resources from multiple lower level resource managers. One or more of these key functions may be present at each abstraction layer of a service level manager. Thus, the composition of these functions across resource abstraction layers enables modeling of a wide array of management scenarios. We present a framework that supports these functions: it uses the service metadata and/or service performance models to map client requirements to resource capabilities, it uses business value associated with objectives in allocation decisions to arbitrate between competing requests, and it allocates resources based on previously negotiated agreements.

ieee international workshop on policies for distributed systems and networks | 2005

A model for usage policy-based resource allocation in grids

Catalin L. Dumitrescu; Michael Wilde; Ian T. Foster

Challenging usage policy issues can arise within virtual organizations (VOs) that integrate participants and resources spanning multiple physical institutions. Participants may wish to delegate to one or more VOs the right to use certain resources subject to local policy and service level agreements; each VO then wishes to use those resources subject to VO policy. How are such local and VO policies to be expressed, discovered, interpreted, and enforced? As a first step to addressing these questions, we develop and evaluate policy management solutions within a specialized context, namely scientific data grids within which the resources to be shared are computers and storage. We propose an architecture and recursive policy model, and define roles and functions, for scheduling resources in grid environments while satisfying resource owner and VO policies.

grid and cooperative computing | 2005

Experiences in running workloads over Grid3

Catalin L. Dumitrescu; Ioan Raicu; Ian T. Foster

Running workloads in a grid environment is often a challenging problem due to the scale of the environment and to the resource partitioning based on various sharing strategies. A resource may be taken down during a job execution, be improperly set up, or simply fail to execute a job. Such elements have to be taken into account whenever targeting a grid environment for execution. In this paper we explore these issues on a real grid, Grid3, by means of a specific workload, the BLAST workload, and a specific scheduling framework, GRUBER, an architecture and toolkit for resource usage service level agreement (SLA) specification and enforcement. The paper provides extensive experimental results. We examine in detail the performance of different site selection strategies of GRUBER and the overall performance in scheduling workloads in Grid3 with workload sizes ranging from 10 to 10,000 jobs.
