George Kola
University of Wisconsin-Madison
Publications
Featured research published by George Kola.
grid computing | 2004
George Kola; Tevfik Kosar; Miron Livny
A major hurdle facing data-intensive grid applications is the appropriate handling of failures that occur in the grid environment. Implementing fault tolerance transparently at the grid-middleware level would make different data-intensive applications fault-tolerant without each having to pay a separate cost, and would reduce the time to a grid-based solution for many scientific problems. We analyzed the failures encountered by four real-life production data-intensive applications: the NCSA image processing pipeline, the WCER video processing pipeline, the US-CMS pipeline and the BMRB BLAST pipeline. Taking the results of the analysis into account, we designed and implemented Phoenix, a transparent middleware-level fault-tolerance layer that detects failures early, classifies them into transient and permanent, and appropriately handles the transient failures. We applied our fault-tolerance layer to a prototype of the NCSA image processing pipeline, considerably improving its failure handling, and report on the insights gained in the process.
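The core idea of classifying failures and retrying only the transient ones can be sketched as follows. This is an illustrative simplification, not the Phoenix implementation; the error categories and retry policy are invented for the example.

```python
# Hypothetical sketch of middleware-level failure handling in the spirit
# of Phoenix: classify each failure as transient or permanent, and retry
# only the transient ones. Categories and policy are illustrative.
import time

TRANSIENT = {"network_timeout", "server_busy", "temporary_storage_full"}
PERMANENT = {"file_not_found", "permission_denied", "corrupt_input"}

def classify(error: str) -> str:
    """Map an observed error to 'transient' or 'permanent'."""
    if error in PERMANENT:
        return "permanent"
    # Unknown errors are treated conservatively as transient and retried.
    return "transient"

def run_with_fault_tolerance(job, max_retries=3, backoff=0.0):
    """Run a job, retrying transient failures with optional linear backoff."""
    for attempt in range(max_retries + 1):
        ok, error = job()
        if ok:
            return "succeeded"
        if classify(error) == "permanent":
            return f"failed permanently: {error}"
        time.sleep(backoff * attempt)
    return "failed after retries"

# Usage: a job that fails twice with a transient error, then succeeds.
attempts = {"n": 0}
def flaky_job():
    attempts["n"] += 1
    if attempts["n"] < 3:
        return False, "network_timeout"
    return True, None

print(run_with_fault_tolerance(flaky_job))  # → succeeded
```

The benefit of doing this at the middleware level, as the abstract argues, is that every pipeline built on it inherits the retry behavior without application changes.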
Performance Evaluation | 2007
George Kola; Mary K. Vernon
Recent congestion control protocols such as XCP and RCP achieve fair bandwidth sharing, high utilization, small queue sizes and nearly zero packet loss by implementing an explicit bandwidth share mechanism in the network routers. This paper develops new quantitative techniques for achieving the same results using only end-host measures. We develop new methods of computing bottleneck link characteristics, a new technique for sharing bandwidth fairly with Reno flows, and a new approach for rapidly converging to a fair bandwidth share. A new transport protocol, TCP-Madison, that employs the new bandwidth sharing techniques is also defined in the paper. Experiments comparing TCP-Madison with FAST TCP, BIC-TCP and TCP-Reno over hundreds of PlanetLab and other live Internet paths show that the new protocol achieves the stated bandwidth sharing properties, is easily configured for near-optimal performance over all paths, and significantly outperforms the previous protocols.
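One classic end-host measure of the kind the abstract mentions is estimating bottleneck link capacity from packet-pair dispersion: two back-to-back packets leave the bottleneck separated by the time it takes to serialize one packet, so capacity ≈ packet size / dispersion. This is a well-known technique offered as background, not necessarily the paper's own method; the numbers are made up for illustration.

```python
# Packet-pair capacity estimation (a classic end-host measurement):
# capacity ≈ bits per packet / inter-arrival spacing at the receiver.
def capacity_estimate(packet_size_bytes: int, dispersion_s: float) -> float:
    """Bottleneck capacity in Mb/s from packet-pair arrival spacing."""
    return (packet_size_bytes * 8) / dispersion_s / 1e6

# A 1500-byte packet pair arriving 120 microseconds apart implies a
# 100 Mb/s bottleneck: 12,000 bits / 120e-6 s = 1e8 b/s.
print(capacity_estimate(1500, 120e-6))  # → 100.0
```

In practice many pairs are sent and the estimates filtered, since cross traffic can compress or stretch individual dispersions.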
international conference on cluster computing | 2004
George Kola; Tevfik Kosar; Miron Livny
Grid computing brings with it additional complexities and unexpected failures. Simply keeping track of jobs as they traverse different grid resources before completion can become tricky. We introduce a client-centric grid knowledgebase that keeps track of the job performance and failure characteristics on different grid resources as observed by the client. We present the design and implementation of our prototype grid knowledgebase and evaluate its effectiveness on two real-life grid data processing pipelines: the NCSA image processing pipeline and the WCER video processing pipeline. It enabled us to easily extract useful job and resource information and interpret it to make better scheduling decisions. Using it, we understood failures better, devised innovative methods to automatically avoid and recover from failures, and dynamically adapted to the grid environment, improving fault tolerance and performance.
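A client-centric knowledgebase of this kind can be sketched as a per-resource tally of job outcomes that feeds a scheduling preference. The class, resource names, and the "pick the highest success rate" rule below are hypothetical simplifications of the idea, not the prototype's actual design.

```python
# Illustrative sketch of a client-centric grid knowledgebase: record how
# each job fared on each resource, then prefer the resource with the best
# observed success rate.
from collections import defaultdict

class GridKnowledgebase:
    def __init__(self):
        self.stats = defaultdict(lambda: {"ok": 0, "fail": 0})

    def record(self, resource: str, succeeded: bool):
        self.stats[resource]["ok" if succeeded else "fail"] += 1

    def success_rate(self, resource: str) -> float:
        s = self.stats[resource]
        total = s["ok"] + s["fail"]
        return s["ok"] / total if total else 0.0

    def best_resource(self) -> str:
        return max(self.stats, key=self.success_rate)

# Usage: siteB succeeds on both recorded jobs, siteA on only one of two.
kb = GridKnowledgebase()
for resource, ok in [("siteA", True), ("siteA", False),
                     ("siteB", True), ("siteB", True)]:
    kb.record(resource, ok)
print(kb.best_resource())  # → siteB
```

The same tallies can also drive the failure-avoidance behavior the abstract describes, e.g. by steering jobs away from resources whose failure rate crosses a threshold.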
measurement and modeling of computer systems | 2006
George Kola; Mary K. Vernon
An accurate estimate of the available (or unused) bandwidth in a network path can be useful in many applications, including route selection in an overlay or multi-homed network, initial bit-rate selection for video streams, or improving the slow start phase of existing TCP protocols. Previously proposed methods for estimating available bandwidth include Pathload [2], PTR [1], Spruce [4], and references therein. The Pathload technique has been shown to be reasonably accurate under a wide range of conditions by independent researchers [3,5]. PTR is more efficient and has been shown to have accuracy similar to Pathload in a more limited set of experiments [1]. With an accurate measure of the bottleneck link capacity, Spruce has been found to be more accurate than Pathload when a new cross traffic stream with known rate is injected. The principal drawback of these techniques is that they require on the order of hundreds of probe packets, and several to several tens of seconds, to obtain the available bandwidth estimate. This paper develops a new bandwidth estimation technique, QuickProbe, that uses 19 probe packets to obtain a conservative estimate of the available bandwidth within a single round trip, and then uses 9-17 further probe packets in each subsequent round trip to refine the estimate. We have compared the QuickProbe and Pathload estimates for hundreds of Internet paths between PlanetLab and other nodes. The paths have measured round-trip times in the range of 20-800 milliseconds, capacities in the range of 0.1-600 Mb/s, and ratios of available bandwidth to capacity in the range of 5-95%. Over such paths, QuickProbe obtains conservative available bandwidth estimates after two round trips that are within a factor of 0.7-1.0 of the Pathload estimate.
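The self-loading principle behind tools like Pathload (and, in compressed form, QuickProbe) is: probe the path at increasing rates, and when probes are sent faster than the available bandwidth they queue behind themselves, so their one-way delays trend upward. The estimate is the highest rate that leaves the delay trend flat. The sketch below is a toy illustration of that principle with a simulated path, not any tool's actual algorithm.

```python
# Simplified illustration of self-loading available-bandwidth estimation:
# the highest probe rate that does not induce an increasing delay trend.
def delays_increasing(delays, slope_threshold=0.0):
    """Crude trend test: does the later half exceed the earlier half on average?"""
    mid = len(delays) // 2
    early = sum(delays[:mid]) / mid
    late = sum(delays[mid:]) / (len(delays) - mid)
    return late - early > slope_threshold

def estimate_avail_bw(probe, rates_mbps):
    """Return the highest probe rate (Mb/s) whose delay trend stays flat."""
    estimate = 0.0
    for rate in sorted(rates_mbps):
        if not delays_increasing(probe(rate)):
            estimate = rate
    return estimate

# A toy path with 40 Mb/s available: above that, queueing grows the delays.
def toy_probe(rate):
    if rate <= 40:
        return [10.0] * 8                      # flat one-way delays (ms)
    return [10.0 + 0.5 * i for i in range(8)]  # growing queueing delay

print(estimate_avail_bw(toy_probe, [10, 20, 40, 80, 160]))  # → 40
```

The efficiency question the paper addresses is how few such probes, and how few round trips, are needed before the estimate is usable.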
parallel computing | 2005
Tevfik Kosar; Se-Chang Son; George Kola; Miron Livny
The increasing computation and data requirements of scientific applications, especially in the areas of bioinformatics, astronomy, high energy physics, and earth sciences, have necessitated the use of distributed resources owned by collaborating parties. While existing distributed systems work well for compute-intensive applications that require limited data movement, they fail in unexpected ways when the application accesses, creates, and moves large amounts of data over wide-area networks. Existing systems closely couple data movement and computation, and consider data movement as a side effect of computation. In this chapter, we propose a framework that decouples data movement from computation, allows queuing and scheduling of data movement apart from computation, and acts as an I/O subsystem for distributed systems. This system provides a uniform interface to heterogeneous storage systems and data transfer protocols; permits policy support and higher-level optimization; and enables reliable, efficient scheduling of compute and data resources.
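The decoupling idea can be sketched as two separate queues: transfers are first-class jobs scheduled on their own, and compute jobs are released only once their inputs are staged. The job fields, URLs, and scheduler loop below are invented for illustration and are not the chapter's actual interface.

```python
# Minimal sketch of decoupling data movement from computation: a data
# queue drained independently of the compute queue, with compute jobs
# released only when their staged inputs are in place.
from collections import deque

def schedule(data_queue, compute_queue, run_transfer, run_compute):
    """Drain the data queue, then run compute jobs whose inputs are staged."""
    staged = set()
    log = []
    while data_queue:
        job = data_queue.popleft()
        run_transfer(job)
        staged.add(job["dest"])
        log.append(("transfer", job["src"], job["dest"]))
    while compute_queue:
        job = compute_queue.popleft()
        if job["input"] in staged:
            run_compute(job)
            log.append(("compute", job["name"]))
    return log

# Usage with placeholder transfer/compute actions.
data_q = deque([{"src": "srb://sdsc/img1", "dest": "/scratch/img1"}])
comp_q = deque([{"name": "process-img1", "input": "/scratch/img1"}])
print(schedule(data_q, comp_q, lambda j: None, lambda j: None))
```

Because transfers are jobs in their own right, they can be queued, scheduled, retried, and governed by policy exactly as compute jobs are, which is the chapter's central point.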
middleware for grid computing | 2004
Tevfik Kosar; George Kola; Miron Livny
Collaborating users need to move terabytes of data among their sites, often involving multiple protocols. This process is very fragile and requires considerable human involvement to deal with failures. In this work, we propose data pipelines, an automated system for transferring data among collaborating sites. It speaks multiple protocols, has sophisticated flow control and recovers automatically from network, storage system, software and hardware failures. We successfully used data pipelines to transfer three terabytes of DPOSS data from the SRB mass storage server at the San Diego Supercomputing Center to the UniTree mass storage at NCSA. The whole process did not require any human intervention and the data pipeline recovered automatically from various network, storage system, software and hardware failures.
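A multi-protocol pipeline of this shape can be pictured as a chain of hops (for example SRB to local disk, a wide-area GridFTP hop, then disk to UniTree), each retried on failure until the chain completes. This is a hypothetical sketch of that structure; the hop names and retry policy are illustrative, not the system's actual design.

```python
# Hypothetical sketch of a multi-protocol data pipeline: a chain of hops,
# each retried on failure, so the whole transfer completes without human
# intervention when failures are transient.
def run_pipeline(hops, max_retries=3):
    """Execute each (name, hop) in order, retrying failed hops."""
    completed = []
    for name, hop in hops:
        for _ in range(max_retries + 1):
            if hop():
                completed.append(name)
                break
        else:
            raise RuntimeError(f"hop {name!r} failed after {max_retries + 1} tries")
    return completed

# Usage: the middle (wide-area) hop fails once before succeeding.
state = {"tries": 0}
def flaky_gridftp():
    state["tries"] += 1
    return state["tries"] > 1

hops = [("srb-to-disk", lambda: True),
        ("gridftp-wan", flaky_gridftp),
        ("disk-to-unitree", lambda: True)]
print(run_pipeline(hops))  # → ['srb-to-disk', 'gridftp-wan', 'disk-to-unitree']
```

Intermediate disk staging is what lets each hop speak its own protocol and recover independently of the others.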
international symposium on parallel and distributed computing | 2003
Tevfik Kosar; George Kola; Miron Livny
The drastic increase in the data requirements of scientific applications combined with an increasing trend towards collaborative research has resulted in the need to transfer large amounts of data among the participating sites. The general approach to transferring such large amounts of data has been to either dump data to tapes and mail them or employ scripts with an operator at each site to babysit the transfers to deal with failures. We introduce a framework which automates the whole process of data movement between different sites. The framework does not require any human intervention and it can recover automatically from various kinds of storage system, network, and software failures, guaranteeing completion of the transfers. The framework has sophisticated monitoring and tuning capability that increases the performance of the data transfers on the fly. The framework also generates on-the-fly visualization of the transfers making identification of problems and bottlenecks in the system simple.
Concurrency and Computation: Practice and Experience | 2006
Tevfik Kosar; George Kola; Miron Livny
Scientific distributed applications have an increasing need to process and move large amounts of data across wide area networks. Existing systems either closely couple computation and data movement, or they require substantial human involvement during the end-to-end process. We propose a framework that enables scientists to build reliable and efficient data transfer and processing pipelines. Our framework provides a universal interface to different data transfer protocols and storage systems. It has sophisticated flow control and recovers automatically from network, storage system, software and hardware failures. We successfully used data pipelines to replicate and process three terabytes of the DPOSS astronomy image dataset and several terabytes of the WCER educational video dataset. In both cases, the entire process was performed without any human intervention and the data pipeline recovered automatically from various failures.
european conference on parallel processing | 2004
George Kola; Tevfik Kosar; Miron Livny
The trend of data-intensive grid applications has brought grid storage protocols and servers into focus. The objective of this study is to gain an understanding of how time is spent in the storage protocols and servers. The storage protocols have a variety of tuning parameters. Some parameters improve single-client performance at the expense of increased server load, thereby limiting the number of served clients; what ultimately matters is the throughput of the whole system. Other parameters increase the flexibility or security of the system at some performance cost. This study aims to make such trade-offs clear and enable easy full-system optimization.
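The client-versus-system trade-off described above can be made concrete with a toy model: more parallel streams raise one client's throughput sublinearly, but each stream also consumes server capacity, capping how many clients can be served at once. All constants below are invented; only the shape of the trade-off matters, not the numbers.

```python
# Toy model of a storage-protocol tuning trade-off: per-client throughput
# rises with parallel streams, but whole-system throughput can fall as
# the server runs out of capacity for additional clients.
def client_throughput(streams, per_stream_mbps=10, diminishing=0.8):
    """Single-client throughput (Mb/s), sublinear in parallel streams."""
    return per_stream_mbps * streams ** diminishing

def max_clients(streams, server_capacity=100, load_per_stream=1.0):
    """How many clients the server can serve at this streams-per-client setting."""
    return int(server_capacity // (streams * load_per_stream))

def system_throughput(streams):
    return max_clients(streams) * client_throughput(streams)

# One stream per client serves 100 clients; 8 streams per client serve
# only 12, so the system total drops even as each client speeds up.
for s in (1, 2, 4, 8):
    print(s, round(client_throughput(s), 1), max_clients(s),
          round(system_throughput(s), 1))
```

This is exactly the kind of full-system view the study argues for: tuning for the single-client number alone picks the wrong operating point.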
Scalable Computing: Practice and Experience | 2001
George Kola; Tevfik Kosar; Miron Livny