Publication


Featured research published by Yunhong Gu.


Computer Networks | 2007

UDT: UDP-based data transfer for high-speed wide area networks

Yunhong Gu; Robert L. Grossman

In this paper, we summarize our work on the UDT high-performance data transport protocol over the past four years. UDT was designed to effectively utilize the rapidly emerging high-speed wide area optical networks. It is built on top of UDP with reliability control and congestion control, which makes it easy to install. The congestion control algorithm is the major internal mechanism that enables UDT to effectively utilize high bandwidth. We also implemented a set of APIs to support easy application development, including both reliable data streaming and partially reliable messaging. The original UDT library has been extended to Composable UDT, which can support various congestion control algorithms. We describe in detail the design and implementation of UDT, the UDT congestion control algorithm, Composable UDT, and the performance evaluation.
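As a rough illustration of the socket-style API mentioned above, the sketch below shows a minimal client written against the open-source UDT library's C++ interface (udt.h). The server address and port are placeholders, and the exact signatures should be verified against the headers of the UDT version in use.

```cpp
// Minimal sketch of a UDT client; names follow the open-source UDT library.
#include <udt.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cstring>
#include <iostream>

int main() {
    UDT::startup();                                    // initialize the library

    // SOCK_STREAM selects reliable data streaming; SOCK_DGRAM selects the
    // (partially) reliable messaging mode mentioned in the abstract.
    UDTSOCKET sock = UDT::socket(AF_INET, SOCK_STREAM, 0);

    sockaddr_in serv{};
    serv.sin_family = AF_INET;
    serv.sin_port = htons(9000);                       // placeholder port
    inet_pton(AF_INET, "10.0.0.1", &serv.sin_addr);    // placeholder server

    if (UDT::ERROR == UDT::connect(sock, (sockaddr*)&serv, sizeof(serv))) {
        std::cerr << "connect: " << UDT::getlasterror().getErrorMessage() << "\n";
        return 1;
    }

    const char* msg = "hello over UDT";
    if (UDT::ERROR == UDT::send(sock, msg, static_cast<int>(std::strlen(msg)), 0))
        std::cerr << "send: " << UDT::getlasterror().getErrorMessage() << "\n";

    UDT::close(sock);
    UDT::cleanup();
    return 0;
}
```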


Knowledge Discovery and Data Mining | 2008

Data mining using high performance data clouds: experimental studies using Sector and Sphere

Robert L. Grossman; Yunhong Gu

We describe the design and implementation of a high performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the Internet. A storage cloud provides storage services, while a compute cloud provides compute services. We describe the design of the Sector storage cloud and how it provides the storage services required by the Sphere compute cloud. We also describe the programming paradigm supported by the Sphere compute cloud. Sector and Sphere are designed for analyzing large data sets using computer clusters connected with wide area high performance networks (for example, 10+ Gb/s). We describe a distributed data mining application that we have developed using Sector and Sphere. Finally, we describe some experimental studies comparing Sector/Sphere to Hadoop.


Future Generation Computer Systems | 2009

Compute and storage clouds using wide area high performance networks

Robert L. Grossman; Yunhong Gu; Michael Sabala; Wanzhi Zhang

We describe a cloud-based infrastructure that we have developed that is optimized for wide area, high performance networks and designed to support data mining applications. The infrastructure consists of a storage cloud called Sector and a compute cloud called Sphere. We describe two applications that we have built using the cloud and some experimental studies.


Philosophical Transactions of the Royal Society A | 2009

Sector and Sphere: the design and implementation of a high-performance data cloud

Yunhong Gu; Robert L. Grossman

Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source.
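The "Map UDF followed by Reduce UDF" pattern described above can be sketched as follows. The types and function names are purely illustrative and are not the actual Sector/Sphere API; hashing into buckets stands in for the shuffling between the two phases.

```cpp
// Hypothetical sketch of a Map UDF followed by a Reduce UDF.
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Record { std::string key; std::int64_t value; };

// "Map" UDF: applied independently to each record of a data segment; the
// returned bucket index decides which reducer receives the record.
int mapUDF(const Record& in, std::vector<std::pair<int, Record>>& out, int numBuckets) {
    int bucket = static_cast<int>(std::hash<std::string>{}(in.key) % numBuckets);
    out.push_back({bucket, in});
    return 0;
}

// "Reduce" UDF: applied to every record that landed in one bucket.
int reduceUDF(const std::vector<Record>& bucket, std::map<std::string, std::int64_t>& out) {
    for (const auto& r : bucket) out[r.key] += r.value;
    return 0;
}

int main() {
    const int numBuckets = 4;
    std::vector<Record> segment = {{"a", 1}, {"b", 2}, {"a", 3}};

    // Map phase over one segment; the compute cloud would run this over all
    // segments in parallel, close to where the data is stored.
    std::vector<std::pair<int, Record>> mapped;
    for (const auto& r : segment) mapUDF(r, mapped, numBuckets);

    std::vector<std::vector<Record>> buckets(numBuckets);
    for (const auto& p : mapped) buckets[p.first].push_back(p.second);

    // Reduce phase over each bucket.
    std::map<std::string, std::int64_t> result;
    for (const auto& b : buckets) reduceUDF(b, result);
    return result["a"] == 4 ? 0 : 1;
}
```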


Journal of Grid Computing | 2003

SABUL: A Transport Protocol for Grid Computing

Yunhong Gu; Robert L. Grossman

This paper describes SABUL, an application-level data transfer protocol for data-intensive applications over high bandwidth-delay product networks. SABUL is designed for reliability, high performance, fairness and stability. It uses UDP to transfer data and TCP to return control messages. A rate-based congestion control that tunes the inter-packet transmission time helps achieve both efficiency and fairness. In order to remove the fairness bias between flows with different network delays, SABUL adjusts its sending rate at uniform intervals, instead of at intervals determined by round trip time. This protocol has demonstrated its efficiency and fairness in both experimental and practical applications. SABUL has been implemented as an open source C++ library, which has been successfully used in several Grid computing applications.
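The two control ideas in this abstract, pacing packets by an inter-packet gap and re-tuning the rate at uniform intervals rather than once per RTT, can be sketched roughly as below. This is an illustration rather than SABUL's actual code, and the constants are placeholders.

```cpp
// Illustrative sketch of rate-based pacing with a fixed-interval rate update,
// so flows with different delays adjust at the same cadence.
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

struct RateController {
    std::chrono::microseconds gap{10};                    // inter-packet transmission time
    const std::chrono::milliseconds syncInterval{10};     // uniform update interval
    Clock::time_point lastUpdate = Clock::now();

    // Re-tune the gap once per syncInterval, independent of the RTT.
    void maybeUpdate(bool lossSeen) {
        auto now = Clock::now();
        if (now - lastUpdate < syncInterval) return;
        lastUpdate = now;
        if (lossSeen)
            gap = gap * 9 / 8;                            // slow down after loss
        else if (gap > std::chrono::microseconds(1))
            gap -= std::chrono::microseconds(1);          // probe for more bandwidth
    }
};

int main() {
    RateController rc;
    for (int i = 0; i < 100; ++i) {
        // sendPacketOverUDP();                           // hypothetical UDP data path
        // Loss/ACK feedback would arrive on the separate TCP control channel.
        rc.maybeUpdate(/*lossSeen=*/false);
        std::this_thread::sleep_for(rc.gap);              // pacing: one gap per packet
    }
    return 0;
}
```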


Conference on High Performance Computing (Supercomputing) | 2004

Experiences in Design and Implementation of a High Performance Transport Protocol

Yunhong Gu; Xinwei Hong; Robert L. Grossman

This paper describes our experiences in the development of the UDP-based Data Transport (UDT) protocol, an application-level transport protocol used in distributed data-intensive applications. The new protocol is motivated by the emergence of wide area high-speed optical networks, in which TCP often fails to utilize the abundant bandwidth. UDT demonstrates good efficiency and fairness characteristics (including RTT fairness and TCP friendliness) in high performance computing applications where a small number of bulk sources share the abundant bandwidth. It combines both rate and window control and uses bandwidth estimation to determine the control parameters automatically. This paper presents the rationale behind UDT: how UDT integrates these schemes to support high performance data transfer, why these schemes are used, and what the main issues are in the design and implementation of this high performance transport protocol.
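The combination of rate control (an inter-packet period) with window control (a cap on unacknowledged packets), driven by a bandwidth estimate, can be sketched hypothetically as below. The update rules and constants are placeholders, not UDT's actual congestion control algorithm.

```cpp
// Hypothetical sketch of combined rate and window control with a bandwidth
// estimate steering the rate.
#include <algorithm>

struct HybridController {
    double pktSendPeriodUs = 100.0;   // rate control: microseconds between packets
    double cwndPkts = 16.0;           // window control: max unacknowledged packets
    double inFlightPkts = 0.0;

    // The sender may transmit only if the window allows it; the pacing timer
    // (pktSendPeriodUs) additionally spaces transmissions out in time.
    bool canSend() const { return inFlightPkts < cwndPkts; }

    // On an ACK, use an externally supplied bandwidth estimate (e.g. from
    // packet-pair probing) to decide whether to shorten the inter-packet period.
    void onAck(double estimatedBandwidthPktsPerSec, double ackedPkts) {
        inFlightPkts = std::max(0.0, inFlightPkts - ackedPkts);
        double currentRatePktsPerSec = 1e6 / pktSendPeriodUs;
        if (estimatedBandwidthPktsPerSec > currentRatePktsPerSec)
            pktSendPeriodUs *= 0.99;                      // speed up toward the estimate
        cwndPkts = std::min(cwndPkts + 1.0, 1000.0);      // let the window grow slowly
    }

    // On loss, back off the rate; the window continues to bound bursts.
    void onLoss() { pktSendPeriodUs *= 1.125; }
};

int main() {
    HybridController cc;
    cc.onAck(/*estimatedBandwidthPktsPerSec=*/20000.0, /*ackedPkts=*/8.0);
    cc.onLoss();
    return cc.canSend() ? 0 : 1;
}
```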


Conference on High Performance Computing (Supercomputing) | 2005

Supporting Configurable Congestion Control in Data Transport Services

Yunhong Gu; Robert L. Grossman

As wide area high-speed networks rapidly proliferate, new applications emerge that require new control mechanisms in data transport services. In this paper, we present UDT/CCC, a data transport library that allows users to plug in a new control algorithm through simple configuration. We aim to provide a tool for fast implementation and deployment, as well as easy evaluation, of new congestion control algorithms. UDT/CCC uses an object-oriented design. We show that the UDT/CCC library can be used to easily implement a large variety of control algorithms and can simulate the behavior of their native implementations as well. The library operates at the application level and does not require root privileges to install. It was also designed to require very few changes to existing applications. This paper describes its design, implementation, and evaluation.
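The configurable, object-oriented design described above can be sketched as a base class of control-event hooks that a user-supplied algorithm overrides. The class and member names below are illustrative, not the actual UDT/CCC interface.

```cpp
// Hypothetical sketch of pluggable, object-oriented congestion control.
#include <cstdint>
#include <memory>

class CongestionControl {
public:
    virtual ~CongestionControl() = default;
    virtual void onAck(std::int32_t ackSeq) = 0;     // called when an ACK arrives
    virtual void onLoss(std::int32_t lossSeq) = 0;   // called when loss is detected
    virtual void onTimeout() = 0;                    // called on retransmission timeout

    double windowPkts = 2.0;        // congestion window, in packets
    double pktSendPeriodUs = 0.0;   // inter-packet interval; 0 = window control only
};

// Example plug-in: a TCP-like AIMD algorithm expressed purely through the hooks.
class AimdControl : public CongestionControl {
public:
    void onAck(std::int32_t) override  { windowPkts += 1.0 / windowPkts; }
    void onLoss(std::int32_t) override { windowPkts *= 0.5; }
    void onTimeout() override          { windowPkts = 2.0; }
};

int main() {
    // The transport would be handed the algorithm once (e.g. when the socket
    // is configured) and would then drive it through the event hooks.
    std::unique_ptr<CongestionControl> cc = std::make_unique<AimdControl>();
    cc->onAck(1);
    cc->onLoss(2);
    return 0;
}
```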


Future Generation Computer Systems | 2003

Experimental studies using photonic data services at iGrid 2002

Robert L. Grossman; Yunhong Gu; Don Hamelburg; David Hanley; Xinwei Hong; Jorge Levera; Dave Lillethun; Marco Mazzucco; Joe Mambretti; Jeremy Weinberger

We describe an architecture for remote and distributed data intensive applications that integrates optical path services, network protocol services for high performance data transport, and data services for remote data analysis and distributed data mining. We also present experimental evidence using geoscience data that this architecture scales to long haul, high performance networks.


IEEE Transactions on Parallel and Distributed Systems | 2011

Toward Efficient and Simplified Distributed Data Intensive Computing

Yunhong Gu; Robert L. Grossman

While the capability of computing systems has been increasing according to Moore's Law, the amount of digital data has been increasing even faster. There is a growing need for systems that can manage and analyze very large data sets, preferably on shared-nothing commodity systems due to their low cost. In this paper, we describe the design and implementation of a distributed file system called Sector and an associated programming framework called Sphere that processes the data managed by Sector in parallel. Sphere is designed so that data can be processed in place whenever possible, a property sometimes called data locality. We describe the directives Sphere supports to improve data locality. In our experimental studies, the Sector/Sphere system has consistently performed about 2-4 times faster than Hadoop, the most popular system for processing very large data sets.
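The data-locality idea can be sketched as a scheduler that prefers a node already holding a replica of the segment to be processed, falling back to a remote read otherwise. This is an illustration, not Sector/Sphere code, and all names are hypothetical.

```cpp
// Illustrative sketch of locality-aware task placement.
#include <optional>
#include <set>
#include <string>
#include <vector>

struct Segment {
    std::string id;
    std::set<std::string> replicaNodes;   // nodes holding a copy of this segment
};

std::optional<std::string> pickNode(const Segment& seg,
                                    const std::vector<std::string>& idleNodes) {
    // First choice: an idle node with a local replica (in-place processing).
    for (const auto& n : idleNodes)
        if (seg.replicaNodes.count(n)) return n;
    // Fallback: any idle node; the segment is then read over the network.
    if (!idleNodes.empty()) return idleNodes.front();
    return std::nullopt;                   // no capacity right now
}

int main() {
    Segment s{"seg-42", {"nodeA", "nodeC"}};
    auto chosen = pickNode(s, {"nodeB", "nodeC"});
    return (chosen && *chosen == "nodeC") ? 0 : 1;
}
```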


High Performance Distributed Computing | 2010

An overview of the Open Science Data Cloud

Robert L. Grossman; Yunhong Gu; Joe Mambretti; Michal Sabala; Alexander S. Szalay; Kevin P. White

The Open Science Data Cloud is a distributed, cloud-based infrastructure for managing, analyzing, archiving, and sharing scientific datasets. We introduce the Open Science Data Cloud, give an overview of its architecture, provide an update on its current status, and briefly describe some relevant research areas.

Collaboration


Top co-authors of Yunhong Gu:

Xinwei Hong, University of Illinois at Chicago
David Hanley, University of Illinois at Chicago
Michal Sabala, University of Illinois at Chicago
Jorge Levera, University of Illinois at Chicago
Li Lu, University of Illinois at Chicago
Marco Mazzucco, University of Illinois at Chicago