Publication


Featured research published by George Kola.


grid computing | 2004

Phoenix: making data-intensive grid applications fault-tolerant

George Kola; Tevfik Kosar; Miron Livny

A major hurdle facing data-intensive grid applications is the appropriate handling of failures that occur in the grid environment. Implementing fault tolerance transparently at the grid-middleware level would make different data-intensive applications fault-tolerant without each having to pay a separate cost, and would reduce the time to a grid-based solution for many scientific problems. We analyzed the failures encountered by four real-life production data-intensive applications: the NCSA image processing pipeline, the WCER video processing pipeline, the US-CMS pipeline, and the BMRB BLAST pipeline. Taking the results of this analysis into account, we designed and implemented Phoenix, a transparent middleware-level fault-tolerance layer that detects failures early, classifies them into transient and permanent, and appropriately handles the transient failures. We applied our fault-tolerance layer to a prototype of the NCSA image processing pipeline, considerably improving its failure handling, and report on the insights gained in the process.
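
The transient-versus-permanent classification at the heart of Phoenix lends itself to a small illustration. The Python sketch below is not Phoenix itself: the error categories, retry policy, and function names are assumptions, and it only shows the general pattern of classifying a failure and retrying the transient ones with backoff.

    import time

    # Hypothetical error categories; a real taxonomy would be derived from the
    # failure analysis described in the paper.
    TRANSIENT = {"network_timeout", "server_busy", "storage_temporarily_full"}
    PERMANENT = {"file_not_found", "permission_denied", "corrupted_input"}

    def classify(error_code):
        """Classify a failure as 'transient', 'permanent', or 'unknown'."""
        if error_code in TRANSIENT:
            return "transient"
        if error_code in PERMANENT:
            return "permanent"
        return "unknown"

    def run_with_fault_tolerance(job, max_retries=5, base_delay_s=30):
        """Run a job, retrying transient (and unknown) failures with exponential backoff."""
        for attempt in range(max_retries):
            ok, error_code = job()          # job returns (success, error_code)
            if ok:
                return True
            if classify(error_code) == "permanent":
                raise RuntimeError(f"permanent failure: {error_code}")
            time.sleep(base_delay_s * (2 ** attempt))
        return False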


Performance Evaluation | 2007

Target bandwidth sharing using endhost measures

George Kola; Mary K. Vernon

Recent congestion control protocols such as XCP and RCP achieve fair bandwidth sharing, high utilization, small queue sizes and nearly zero packet loss by implementing an explicit bandwidth share mechanism in the network routers. This paper develops new quantitative techniques for achieving the same results using only end-host measures. We develop new methods of computing bottleneck link characteristics, a new technique for sharing bandwidth fairly with Reno flows, and a new approach for rapidly converging to the target bandwidth share. A new transport protocol, TCP-Madison, that employs the new bandwidth sharing techniques is also defined in the paper. Experiments comparing TCP-Madison with FAST TCP, BIC-TCP and TCP-Reno over hundreds of PlanetLab and other live Internet paths show that the new protocol achieves the stated bandwidth sharing properties, is easily configured for near-optimal performance over all paths, and significantly outperforms the previous protocols.
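
As a rough illustration of steering a flow toward a target share using only end-host observations, the sketch below nudges the congestion window toward the bandwidth-delay product implied by a desired rate. This is a generic rate controller under assumed parameter names and gain, not TCP-Madison's actual control law.

    def update_cwnd(cwnd_packets, target_rate_bps, rtt_s, mss_bytes=1460, gain=0.5):
        """Move the congestion window part of the way toward the window size
        (in packets) that corresponds to the target sending rate."""
        target_window = target_rate_bps * rtt_s / (8 * mss_bytes)
        return cwnd_packets + gain * (target_window - cwnd_packets)

    # Example: steer toward a 10 Mb/s share on a 100 ms path.
    cwnd = 20.0
    for _ in range(5):
        cwnd = update_cwnd(cwnd, target_rate_bps=10e6, rtt_s=0.1)
        print(round(cwnd, 1))               # converges toward ~85.6 packets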


international conference on cluster computing | 2004

A client-centric grid knowledgebase

George Kola; Tevfik Kosar; Miron Livny

Grid computing brings with it additional complexities and unexpected failures. Just keeping track of jobs traversing different grid resources before completion can at times become tricky. We introduce a client-centric grid knowledgebase that keeps track of the job performance and failure characteristics observed by the client on different grid resources. We present the design and implementation of our prototype grid knowledgebase and evaluate its effectiveness on two real-life grid data processing pipelines: the NCSA image processing pipeline and the WCER video processing pipeline. It enabled us to easily extract useful job and resource information and interpret it to make better scheduling decisions. Using it, we were able to understand failures better, devise innovative methods to automatically avoid and recover from failures, and dynamically adapt to the grid environment, improving fault tolerance and performance.
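
A client-side knowledgebase of this kind can be pictured as a small store of per-resource job outcomes that scheduling decisions can query. The sketch below is only an illustration; the record fields and the ranking heuristic are assumptions, not the paper's actual schema.

    from collections import defaultdict

    class GridKnowledgebase:
        """Client-side record of job outcomes per grid resource (illustrative only)."""

        def __init__(self):
            self.history = defaultdict(list)    # resource -> list of (succeeded, runtime_s)

        def record(self, resource, succeeded, runtime_s):
            self.history[resource].append((succeeded, runtime_s))

        def stats(self, resource):
            runs = self.history[resource]
            good = [t for ok, t in runs if ok]
            return {
                "success_rate": len(good) / len(runs) if runs else 0.0,
                "mean_runtime": sum(good) / len(good) if good else None,
            }

        def best_resource(self):
            """Rank resources by success rate, breaking ties by mean runtime."""
            def key(res):
                s = self.stats(res)
                return (-s["success_rate"], s["mean_runtime"] or float("inf"))
            return min(self.history, key=key) if self.history else None

    kb = GridKnowledgebase()
    kb.record("cluster-a", True, 1200)
    kb.record("cluster-a", False, 300)
    kb.record("cluster-b", True, 900)
    print(kb.best_resource())               # -> cluster-b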


measurement and modeling of computer systems | 2006

QuickProbe: available bandwidth estimation in two roundtrips

George Kola; Mary K. Vernon

An accurate estimate of the available (or unused) bandwidth in a network path can be useful in many applications, including route selection in an overlay or multi-homed network, initial bit-rate selection for video streams, or improving the slow start phase of existing TCP protocols. Previously proposed methods for estimating available bandwidth include Pathload [2], PTR [1], Spruce [4], and references therein. The Pathload technique has been shown to be reasonably accurate under a wide range of conditions by independent researchers [3,5]. PTR is more efficient and has been shown to have accuracy similar to Pathload in a more limited set of experiments [1]. With an accurate measure of the bottleneck link capacity, Spruce has been found to be more accurate than Pathload when a new cross traffic stream with known rate is injected. The principal drawback of these techniques is that they require on the order of hundreds of probe packets, and from several seconds to several tens of seconds, to obtain the available bandwidth estimate. This paper develops a new bandwidth estimation technique, QuickProbe, that uses 19 probe packets to obtain a conservative estimate of the available bandwidth within a single round trip, and then uses 9-17 further probe packets in each subsequent round trip to refine the estimate. We have compared the QuickProbe and Pathload estimates for hundreds of Internet paths between PlanetLab and other nodes. The paths have measured round-trip times in the range of 20-800 milliseconds, capacities in the range of 0.1-600 Mb/s, and ratios of available bandwidth to capacity in the range of 5-95%. Over such paths, QuickProbe obtains conservative available bandwidth estimates after two round trips that are within a factor of 0.7-1.0 of the Pathload estimate.
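
The abstract does not spell out the probing algorithm, so the sketch below only illustrates the broad family QuickProbe belongs to: sending short probe trains at candidate rates and lowering the estimate whenever a train self-induces queuing delay. The packet counts, number of rounds, and the helper send_probe_train are assumptions, not QuickProbe's actual design.

    def estimate_available_bandwidth(send_probe_train, capacity_bps,
                                     rounds=3, packets_per_train=10):
        """Bisect on the probe rate: if a train sent at `rate` shows increasing
        one-way delays, the rate exceeds the available bandwidth; otherwise the
        path still has headroom.

        send_probe_train(rate_bps, n_packets) is a hypothetical helper that sends
        n packets at the given rate and returns True if delays trended upward."""
        low, high = 0.0, capacity_bps
        for _ in range(rounds):
            rate = (low + high) / 2
            if send_probe_train(rate, packets_per_train):
                high = rate                  # congested at this rate
            else:
                low = rate                   # still headroom at this rate
        return low                           # conservative (lower-bound) estimate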


parallel computing | 2005

Data placement in widely distributed environments

Tevfik Kosar; Se-Chang Son; George Kola; Miron Livny

The increasing computation and data requirements of scientific applications, especially in the areas of bioinformatics, astronomy, high energy physics, and earth sciences, have necessitated the use of distributed resources owned by collaborating parties. While existing distributed systems work well for compute-intensive applications that require limited data movement, they fail in unexpected ways when the application accesses, creates, and moves large amounts of data over wide-area networks. Existing systems closely couple data movement and computation, and consider data movement as a side effect of computation. In this chapter, we propose a framework that de-couples data movement from computation, allows queuing and scheduling of data movement apart from computation, and acts as an I/O subsystem for distributed systems. This system provides a uniform interface to heterogeneous storage systems and data transfer protocols; permits policy support and higher-level optimization; and enables reliable, efficient scheduling of compute and data resources.
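
Treating data movement as a first-class, schedulable job can be pictured as a queue of declarative transfer requests dispatched to protocol-specific handlers. This is a sketch of the general design only; the job fields, handler names, and protocol keys are assumptions, not the framework's actual interface.

    from collections import deque

    # Hypothetical protocol handlers; a real system would wrap GridFTP, SRB, etc.
    def gridftp_transfer(src, dst): ...
    def srb_transfer(src, dst): ...

    HANDLERS = {"gsiftp": gridftp_transfer, "srb": srb_transfer}

    # A data placement job is declarative: it says what to move, not how or when.
    jobs = deque([
        {"src": "srb://host-a/dataset/f1", "dst": "gsiftp://host-b/staging/f1"},
        {"src": "gsiftp://host-b/staging/f1", "dst": "gsiftp://host-c/archive/f1"},
    ])

    def run_scheduler(jobs):
        """Dequeue placement jobs and hand each to the handler for its source protocol;
        a real scheduler would also queue, throttle, and retry these transfers."""
        while jobs:
            job = jobs.popleft()
            protocol = job["src"].split("://", 1)[0]
            HANDLERS[protocol](job["src"], job["dst"])

    run_scheduler(jobs)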


middleware for grid computing | 2004

Data pipelines: enabling large scale multi-protocol data transfers

Tevfik Kosar; George Kola; Miron Livny

Collaborating users need to move terabytes of data among their sites, often involving multiple protocols. This process is very fragile and requires considerable human involvement to deal with failures. In this work, we propose data pipelines, an automated system for transferring data among collaborating sites. It speaks multiple protocols, has sophisticated flow control, and recovers automatically from network, storage system, software, and hardware failures. We successfully used data pipelines to transfer three terabytes of DPOSS data from the SRB mass storage server at the San Diego Supercomputer Center to the UniTree mass storage system at NCSA. The whole process did not require any human intervention, and the data pipeline recovered automatically from various network, storage system, software, and hardware failures.
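
The multi-hop, multi-protocol nature of such a pipeline can be sketched as a chain of transfer stages, each retried independently until it succeeds. The stage names and transfer helpers in the commented example are hypothetical; only the overall retry-per-stage pattern is being illustrated.

    import time

    def run_pipeline(stages, retry_delay_s=60):
        """Run transfer stages in order; a stage that raises is retried until it
        succeeds, so a transient network or storage failure never aborts the
        whole pipeline."""
        for name, stage in stages:
            while True:
                try:
                    stage()
                    break
                except Exception as err:
                    print(f"stage '{name}' failed ({err}); retrying")
                    time.sleep(retry_delay_s)

    # Hypothetical three-hop pipeline: mass storage -> staging disk -> mass storage.
    # pipeline = [
    #     ("srb-to-staging",   lambda: srb_get("srb://sdsc/dposs/f1", "/staging/f1")),
    #     ("staging-to-cache", lambda: gridftp_put("/staging/f1", "gsiftp://ncsa/cache/f1")),
    #     ("cache-to-unitree", lambda: unitree_put("gsiftp://ncsa/cache/f1", "unitree://ncsa/dposs/f1")),
    # ]
    # run_pipeline(pipeline)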


international symposium on parallel and distributed computing | 2003

A framework for self-optimizing, fault-tolerant, high performance bulk data transfers in a heterogeneous grid environment

Tevfik Kosar; George Kola; Miron Livny

The drastic increase in the data requirements of scientific applications combined with an increasing trend towards collaborative research has resulted in the need to transfer large amounts of data among the participating sites. The general approach to transferring such large amounts of data has been to either dump data to tapes and mail them or employ scripts with an operator at each site to babysit the transfers to deal with failures. We introduce a framework which automates the whole process of data movement between different sites. The framework does not require any human intervention and it can recover automatically from various kinds of storage system, network, and software failures, guaranteeing completion of the transfers. The framework has sophisticated monitoring and tuning capability that increases the performance of the data transfers on the fly. The framework also generates on-the-fly visualization of the transfers making identification of problems and bottlenecks in the system simple.
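
The on-the-fly tuning idea can be illustrated with a simple feedback loop that adjusts one transfer parameter, here the number of parallel streams, based on observed throughput. This is a generic hill-climbing sketch under assumed names and thresholds, not the framework's actual tuning algorithm.

    def tune_parallel_streams(measure_throughput, start=2, max_streams=32):
        """Greedy search: keep doubling the number of parallel streams while
        throughput keeps improving noticeably.

        measure_throughput(n) is a hypothetical helper that runs a short transfer
        with n parallel streams and returns the achieved throughput in Mb/s."""
        best_n = start
        best_tput = measure_throughput(start)
        n = start
        while n < max_streams:
            n *= 2
            tput = measure_throughput(n)
            if tput <= best_tput * 1.05:     # less than ~5% gain: stop before overloading the server
                break
            best_n, best_tput = n, tput
        return best_n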


Concurrency and Computation: Practice and Experience | 2006

Building reliable and efficient data transfer and processing pipelines

Tevfik Kosar; George Kola; Miron Livny

Scientific distributed applications have an increasing need to process and move large amounts of data across wide area networks. Existing systems either closely couple computation and data movement, or they require substantial human involvement during the end-to-end process. We propose a framework that enables scientists to build reliable and efficient data transfer and processing pipelines. Our framework provides a universal interface to different data transfer protocols and storage systems. It has sophisticated flow control and recovers automatically from network, storage system, software and hardware failures. We successfully used data pipelines to replicate and process three terabytes of the DPOSS astronomy image dataset and several terabytes of the WCER educational video dataset. In both cases, the entire process was performed without any human intervention and the data pipeline recovered automatically from various failures.


european conference on parallel processing | 2004

Profiling Grid Data Transfer Protocols and Servers

George Kola; Tevfik Kosar; Miron Livny

The trend of data intensive grid applications has brought grid storage protocols and servers into focus. The objective of this study is to gain an understanding of how time is spent in the storage protocols and servers. The storage protocols have a variety of tuning parameters. Some parameters improve single client performance at the expense of increased server load, thereby limiting the number of served clients. What ultimately matters is the throughput of the whole system. Some parameters increase the flexibility or security of the system at some expense. The objective of this study is to make such trade-offs clear and enable easy full system optimization.


Scalable Computing: Practice and Experience | 2005

Run-time Adaptation of Grid Data Placement Jobs

George Kola; Tevfik Kosar; Miron Livny

Collaboration


Dive into George Kola's collaborations.

Top Co-Authors

Miron Livny
University of Wisconsin-Madison

Mary K. Vernon
University of Wisconsin-Madison

Se-Chang Son
University of Wisconsin-Madison

Jaime Frey
University of Wisconsin-Madison