Karl Czajkowski

University of Southern California

Publication

Featured research published by Karl Czajkowski.

High Performance Distributed Computing | 2001

Grid information services for distributed resource sharing

Karl Czajkowski; Steven Fitzgerald; Ian T. Foster; Carl Kesselman

Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or institutions: what are sometimes called virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations are challenging problems due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Consequently, information services are a vital part of any Grid software infrastructure, providing fundamental mechanisms for discovery and monitoring, and hence for planning and adapting application behavior. We present an information services architecture that addresses performance, security, scalability, and robustness requirements. Our architecture defines simple low-level enquiry and registration protocols that make it easy to incorporate individual entities into various information structures, such as aggregate directories that support a variety of different query languages and discovery strategies. These protocols can also be combined with other Grid protocols to construct additional higher-level services and capabilities such as brokering, monitoring, fault detection, and troubleshooting. Our architecture has been implemented as MDS-2, which forms part of the Globus Grid toolkit and has been widely deployed and applied.
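The interplay of registration and enquiry can be illustrated with a small aggregate directory. This is a minimal sketch, assuming soft-state registration: each entity registers with a time-to-live and silently disappears unless it re-registers. The class and method names (`AggregateDirectory`, `register`, `query`) are hypothetical and are not the MDS-2 interface; an arbitrary Python predicate stands in for the query languages mentioned above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Registration:
    """One entity's advertisement: a name, descriptive attributes, and an expiry time."""
    name: str
    attributes: Dict[str, object]
    expires_at: float


class AggregateDirectory:
    """Hypothetical soft-state directory: entries vanish unless refreshed before their TTL elapses."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Registration] = {}

    def register(self, name: str, attributes: Dict[str, object], ttl: float) -> None:
        # Registration doubles as refresh: re-registering resets the expiry time.
        self._entries[name] = Registration(name, dict(attributes), self._clock() + ttl)

    def _purge_expired(self) -> None:
        now = self._clock()
        for name in [n for n, r in self._entries.items() if r.expires_at <= now]:
            del self._entries[name]

    def query(self, predicate: Callable[[Dict[str, object]], bool]) -> List[str]:
        # Enquiry: names of live entries whose attributes satisfy the predicate.
        self._purge_expired()
        return sorted(r.name for r in self._entries.values() if predicate(r.attributes))


# Usage with a controllable clock so expiry is deterministic.
now = [0.0]
directory = AggregateDirectory(clock=lambda: now[0])
directory.register("hostA", {"cpus": 64}, ttl=30)
directory.register("hostB", {"cpus": 8}, ttl=30)
print(directory.query(lambda a: a["cpus"] >= 16))  # ['hostA']
now[0] = 31.0  # neither host refreshed its registration
print(directory.query(lambda a: True))  # []
```

Soft state is what lets such a directory cope with the dynamic behavior described above: a crashed or departed entity needs no explicit deregistration, because its entry simply ages out.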

Job Scheduling Strategies for Parallel Processing | 1998

A Resource Management Architecture for Metacomputing Systems

Karl Czajkowski; Ian T. Foster; Nicholas T. Karonis; Carl Kesselman; Stuart Martin; Warren Smith; Steven Tuecke

Metacomputing systems are intended to support remote and/or concurrent use of geographically distributed computational resources. Resource management in such systems is complicated by five concerns that do not typically arise in other situations: site autonomy and heterogeneous substrates at the resources, and application requirements for policy extensibility, co-allocation, and online control. We describe a resource management architecture that addresses these concerns. This architecture distributes the resource management problem among distinct local manager, resource broker, and resource co-allocator components and defines an extensible resource specification language to exchange information about requirements. We describe how these techniques have been implemented in the context of the Globus metacomputing toolkit and used to implement a variety of different resource management strategies. We report on our experiences applying our techniques in a large testbed, GUSTO, incorporating 15 sites, 330 computers, and 3600 processors.
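Co-allocation, one of the five concerns listed above, can be illustrated with one simple strategy: reserve every requested piece or none of them. The sketch below assumes atomic, all-or-nothing semantics and processor counts as the only resource; `LocalManager` and `co_allocate` are hypothetical stand-ins, and the Globus co-allocator described in the paper is not reproduced here.

```python
from typing import Dict, List, Tuple


class LocalManager:
    """Hypothetical site-local manager that owns a fixed number of processors."""

    def __init__(self, site: str, processors: int):
        self.site = site
        self.free = processors

    def reserve(self, count: int) -> bool:
        # Site autonomy: only the local manager decides whether to grant a request.
        if count <= self.free:
            self.free -= count
            return True
        return False

    def release(self, count: int) -> None:
        self.free += count


def co_allocate(managers: Dict[str, LocalManager], request: Dict[str, int]) -> bool:
    """Reserve request[site] processors at every site, or nothing at all."""
    acquired: List[Tuple[str, int]] = []
    for site, count in request.items():
        if not managers[site].reserve(count):
            # Roll back partial allocations so no site is left holding idle processors.
            for held_site, held_count in acquired:
                managers[held_site].release(held_count)
            return False
        acquired.append((site, count))
    return True


sites = {"A": LocalManager("A", 32), "B": LocalManager("B", 16)}
print(co_allocate(sites, {"A": 20, "B": 10}))  # True
print(co_allocate(sites, {"A": 10, "B": 10}))  # False: B has only 6 processors left
print(sites["A"].free, sites["B"].free)        # 12 6
```

The rollback step is why co-allocation is hard across autonomous sites: the co-allocator can request but never force a grant, so it must be prepared to undo partial success.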

Job Scheduling Strategies for Parallel Processing | 2002

SNAP: A Protocol for Negotiating Service Level Agreements and Coordinating Resource Management in Distributed Systems

Karl Czajkowski; Ian T. Foster; Carl Kesselman; Volker Sander; Steven Tuecke

A fundamental problem in distributed computing is to map activities such as computation or data transfer onto resources that meet requirements for performance, cost, security, or other quality of service metrics. The creation of such mappings requires negotiation among applications and resources to discover, reserve, acquire, configure, and monitor resources. Current resource management approaches tend to specialize in particular resource classes and address coordination across resources only in a limited fashion. We present a new approach that overcomes these difficulties. We define a resource management model that distinguishes three kinds of resource-independent service level agreements (SLAs), formalizing agreements to deliver capability, perform activities, and bind activities to capabilities, respectively. We also define a Service Negotiation and Acquisition Protocol (SNAP) that supports reliable management of remote SLAs. Finally, we explain how SNAP can be deployed within the context of the Globus Toolkit.
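The three kinds of agreement can be sketched as plain data types. The SNAP paper calls them the resource SLA (RSLA), task SLA (TSLA), and binding SLA (BSLA); the fields below and the `is_satisfiable` check are illustrative assumptions, not the SNAP message format or negotiation logic.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RSLA:
    """Resource SLA: a promise to deliver a capability, without naming the work it will serve."""
    id: str
    capability: Dict[str, int]


@dataclass
class TSLA:
    """Task SLA: a promise to perform an activity, together with its resource needs."""
    id: str
    activity: str
    needs: Dict[str, int]


@dataclass
class BSLA:
    """Binding SLA: associates a task with a previously promised capability."""
    id: str
    task: TSLA
    resource: RSLA

    def is_satisfiable(self) -> bool:
        # Hypothetical check: every quantity the task needs is covered by the reserved capability.
        return all(self.resource.capability.get(k, 0) >= v for k, v in self.task.needs.items())


reservation = RSLA("r1", {"cpus": 64, "memory_gb": 128})
job = TSLA("t1", "simulation run", {"cpus": 32})
binding = BSLA("b1", job, reservation)
print(binding.is_satisfiable())  # True
```

Keeping the capability promise separate from the task lets a client reserve resources first and decide later which activity to bind to them, which is the decoupling the three-agreement model is meant to provide.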

Proceedings of the IEEE | 2005

Modeling and Managing State in Distributed Systems: The Role of OGSI and WSRF

Ian T. Foster; Karl Czajkowski; Donald F. Ferguson; Jeffrey A. Frey; Steve Graham; Tom Maguire; David Snelling; Steven Tuecke

In distributed systems we often need to model, access, and manage state. This state may be, for example, data in a purchase order, service level agreements representing resource availability, or the current load on a computer. We introduce two closely related approaches to modeling and manipulating state within a Web services (WS) framework: the Open Grid Services Infrastructure (OGSI) and the WS-Resource Framework (WSRF). Both approaches define conventions on the use of the Web Services Description Language (WSDL) and XML Schema that enable the modeling and management of state. OGSI introduces the idea of a stateful Web service and defines approaches for creating, naming, and managing the lifetime of instances of services; for declaring and inspecting service state data; for asynchronous notification of service state change; for representing and managing collections of service instances; and for common handling of service invocation faults. WSRF refactors and evolves OGSI to exploit new Web services standards, specifically WS-Addressing, and to respond to early implementation and application experiences. WSRF retains essentially all of the functional capabilities present in OGSI, while changing some syntax (e.g., to exploit WS-Addressing) and adopting a different terminology in its presentation. In addition, WSRF partitions OGSI functionality into five distinct composable specifications. We explain the relationship between OGSI and WSRF and the related WS-Notification specifications, explain the common requirements that both address, and compare and contrast the approaches taken to realize those requirements.
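The creation, naming, state-inspection, and lifetime capabilities listed above can be sketched in a few lines. `ResourceHome` and its methods are hypothetical Python stand-ins: they loosely mirror the roles played in WSRF by WS-ResourceProperties (querying state) and WS-ResourceLifetime (explicit destruction and scheduled termination), but they are not those specifications' operations or message formats.

```python
import itertools
from typing import Dict, List, Optional


class ResourceHome:
    """Hypothetical container of stateful resources, each addressed by a generated key."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._resources: Dict[str, dict] = {}

    def create(self, properties: dict, termination_time: Optional[float] = None) -> str:
        # Creation and naming: the returned key plays the role of a resource reference.
        key = f"resource-{next(self._ids)}"
        self._resources[key] = {"properties": dict(properties), "termination_time": termination_time}
        return key

    def get_property(self, key: str, name: str):
        # Inspection of the resource's state data.
        return self._resources[key]["properties"][name]

    def set_termination_time(self, key: str, when: float) -> None:
        # Scheduled lifetime management: the client moves the expiry time.
        self._resources[key]["termination_time"] = when

    def destroy(self, key: str) -> None:
        # Immediate, explicit lifetime management.
        del self._resources[key]

    def sweep(self, now: float) -> List[str]:
        # Remove resources whose termination time has passed; return their keys.
        expired = [k for k, r in self._resources.items()
                   if r["termination_time"] is not None and r["termination_time"] <= now]
        for k in expired:
            del self._resources[k]
        return expired


home = ResourceHome()
order = home.create({"item": "widget", "quantity": 3}, termination_time=100.0)
print(home.get_property(order, "quantity"))  # 3
home.set_termination_time(order, 50.0)
print(home.sweep(now=60.0))  # ['resource-1']
```

The purchase order mirrors the abstract's own example of state; the point of the sketch is that the state outlives any single request and has a lifetime the client can manage.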

Adaptive Agents and Multi-Agent Systems | 2004

Resource Allocation in the Grid Using Reinforcement Learning

Aram Galstyan; Karl Czajkowski; Kristina Lerman

In this paper we study a minimalist decentralized algorithm for resource allocation in a simplified Grid-like environment. We consider a system consisting of a large number of heterogeneous reinforcement learning agents that share common resources for their computational needs. There is no communication between the agents: the only information an agent receives is the (expected) completion time of a job it submitted to a particular resource, which serves as a reinforcement signal for the agent. The results of our experiments suggest that reinforcement learning can be used to improve the quality of resource allocation in large-scale heterogeneous systems.

Proceedings of the IEEE | 2005

Agreement-Based Resource Management

Karl Czajkowski; Ian T. Foster; Carl Kesselman

One of the criteria for the Grid infrastructure is the ability to share resources with nontrivial qualities of service. However, sharing resources in Grids is complicated in that it requires the ability to bridge the differing policy requirements of the resource owners to create a consistent cross-organizational policy domain that delivers the necessary capability to the end user while respecting the policy requirements of the resource owners. Further complicating the management of Grid resources are the need to coordinate resource usage, the diversity of resource types, and the variety of different management modes that may be used. We present a unifying resource management framework in which we can address these issues. The fundamental underlying concept in this framework is the representation of various resource management activities in terms of an agreement. Agreements abstract local management policy by representing an underlying resource strictly in terms of the policy terms it is willing to assert, and in doing so provide the basis for building a variety of alternative Grid resource management strategies. We introduce the concepts of agreement-based resource management. We present a general agreement model and examine current resource management systems in the context of this model. We then discuss how agreement-based resource management is being used as the basis for standards activities and next-generation resource management services.

Journal of Grid Computing | 2005

Resource Allocation in the Grid with Learning Agents

Aram Galstyan; Karl Czajkowski; Kristina Lerman

One of the main challenges in Grid computing is efficient allocation of resources (CPU-hours, network bandwidth, etc.) to the tasks submitted by users. Due to the lack of centralized control and the dynamic/stochastic nature of resource availability, any successful allocation mechanism should be highly distributed and robust to changes in the Grid environment. Moreover, it is desirable to have an allocation mechanism that does not rely on the availability of coherent global information. In this paper we examine a simple algorithm for distributed resource allocation in a simplified Grid-like environment that meets the above requirements. Our system consists of a large number of heterogeneous reinforcement learning agents that share common resources for their computational needs. There is no explicit communication or interaction between the agents: the only information an agent receives is the expected response time of a job it submitted to a particular resource, which serves as a reinforcement signal for the agent. The results of our experiments suggest that even simple reinforcement learning can indeed be used to achieve load-balanced resource allocation in large-scale heterogeneous systems.
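A minimal simulation in the spirit of this setup: each agent keeps a running estimate of the completion time it has observed at each resource, usually picks the resource with the lowest estimate, and occasionally explores at random. The epsilon-greedy choice, the learning rate `alpha`, and the equal-sharing model of completion time are assumptions made for illustration; the papers' exact update rule and resource model are not reproduced, and all agents here share the same parameters, so the agent heterogeneity mentioned above is not modeled.

```python
import random
from typing import List


def simulate(capacities: List[float], n_agents: int = 100, rounds: int = 200,
             epsilon: float = 0.1, alpha: float = 0.1, seed: int = 0) -> List[int]:
    """Return how many agents chose each resource in the final round."""
    rng = random.Random(seed)
    n = len(capacities)
    # q[a][r]: agent a's estimate of completion time on resource r (starts at zero).
    q = [[0.0] * n for _ in range(n_agents)]
    counts = [0] * n
    for _ in range(rounds):
        choices = []
        for a in range(n_agents):
            if rng.random() < epsilon:
                choices.append(rng.randrange(n))  # explore a random resource
            else:
                # Exploit: lowest estimated completion time, ties broken at random.
                best = min(q[a])
                choices.append(rng.choice([r for r in range(n) if q[a][r] == best]))
        counts = [0] * n
        for r in choices:
            counts[r] += 1
        for a, r in enumerate(choices):
            # Assumed model: jobs on resource r share its capacity equally,
            # so completion time grows with the number of jobs sent there.
            completion = counts[r] / capacities[r]
            # The agent learns only from its own observed completion time.
            q[a][r] += alpha * (completion - q[a][r])
    return counts


print(simulate([1.0, 2.0, 3.0]))
```

No agent ever sees the others' choices; any balancing of load emerges only because crowded resources report longer completion times, which pushes individual estimates up and drives agents elsewhere.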

The Grid 2: Blueprint for a New Computing Infrastructure | 2004

Chapter 18 – Resource and Service Management

Karl Czajkowski; Ian T. Foster; Carl Kesselman

This chapter focuses on the management of Grid resources and services. It introduces a generalized resource management framework and uses it as a basis for characterizing existing approaches and for defining a direction for resource management development, particularly as framed within the Open Grid Services Architecture. Although there are many facets to acquiring capabilities for a Grid application, the term “resource management” is used to describe all aspects of the process: locating a capability, arranging for its use, utilizing it, and monitoring its state. Resource management in traditional computing systems is a well-studied problem. Resource managers exist for many computing environments and include batch schedulers, workflow engines, and operating systems. These systems are local and have complete control of a resource, and thus can implement the mechanisms and policies needed for effective use of that resource in isolation. The core goal of resource management is to establish a mutual agreement between a resource provider and a resource consumer by which the provider agrees to supply a capability that can be used to perform some task on behalf of the consumer.

High Performance Distributed Computing | 2001

Practical resource management for grid-based visual exploration

Karl Czajkowski; Alper K. Demir; Carl Kesselman; Marcus Thiebaux

Computational grids are enabling collaboration between scientists and organizations to generate and archive extremely large datasets across shared, distributed resources. There is a need to visually explore such data throughout the life cycle of projects. Practical exploration of large datasets requires visualization tools that can function in the same grid environment in which the data is created and stored. Resource management interfaces are an important structural component of grid computing environments because they enable uniform access to the wide variety of resources necessary for scientific work. We describe a new advance-reservation system for graphics resources and an application of existing grid technology to create general-purpose active storage systems. We report our experience with prototype infrastructure and application components, involving experiments coupling end-to-end resources for interactive visual exploration of large data in representative distributed environments.
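At its core, advance reservation means admitting a request only if it does not overlap existing bookings on the same resource. The sketch below assumes exclusive, whole-resource reservations over half-open time intervals; the paper's system may support finer-grained or shared reservations, and `ReservationCalendar` is a hypothetical name.

```python
import bisect
from typing import Dict, List, Tuple


class ReservationCalendar:
    """Hypothetical advance-reservation table with exclusive bookings per resource."""

    def __init__(self):
        # resource -> non-overlapping [start, end) intervals, kept sorted
        self._bookings: Dict[str, List[Tuple[float, float]]] = {}

    def reserve(self, resource: str, start: float, end: float) -> bool:
        if end <= start:
            raise ValueError("reservation must have positive length")
        slots = self._bookings.setdefault(resource, [])
        i = bisect.bisect_left(slots, (start, end))
        # Because existing bookings never overlap, only the two sorted neighbours can conflict.
        if i > 0 and slots[i - 1][1] > start:
            return False
        if i < len(slots) and slots[i][0] < end:
            return False
        slots.insert(i, (start, end))
        return True


calendar = ReservationCalendar()
print(calendar.reserve("render-node-1", 9, 11))   # True
print(calendar.reserve("render-node-1", 10, 12))  # False: overlaps 9-11
print(calendar.reserve("render-node-1", 11, 13))  # True: touches 9-11 but does not overlap
```

Keeping each resource's bookings sorted makes the admission check a binary search plus two comparisons, rather than a scan of every existing reservation.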

Archive | 2004

Grid Service Level Agreements

Karl Czajkowski; Ian T. Foster; Carl Kesselman; Steven Tuecke

We present a reformulation of the well-known GRAM architecture based on the Service Level Agreement (SLA) negotiation protocols defined within the Service Negotiation and Acquisition Protocol (SNAP) framework. We illustrate how a range of local, distributed, and workflow scheduling mechanisms can be viewed as part of a cohesive yet open system, in which new scheduling strategies and management policies can evolve without disrupting the infrastructure. This architecture remains neutral to, and in fact strives to mediate, the potentially conflicting resource, community, and user policies.
