Ling Tony Chen
Lawrence Berkeley National Laboratory
Publication
Featured research published by Ling Tony Chen.
acm multimedia | 1994
Brian Tierney; Jason Lee; Ling Tony Chen; Hanan Herzog; Gary Hoo; Guojun Jin; William E. Johnston
We have designed, built, and analyzed a distributed parallel storage system that will supply image streams fast enough to permit multi-user, “real-time”, video-like applications in a wide-area ATM network-based Internet environment. We have based the implementation on user-level code in order to secure portability; we have characterized the performance bottlenecks arising from operating system and hardware issues, and based on this have optimized our design to make the best use of the available performance. Although at this time we have only operated with a few classes of data, the approach appears to be capable of providing a scalable, high-performance, and economical mechanism to provide a data storage system for several classes of data (including mixed multimedia streams), and for applications (clients) that operate in a high-speed network environment.
Information Systems | 1995
Ling Tony Chen; Robert S. Drach; M. Keating; Steven S. Louis; Doron Rotem; Arie Shoshani
This paper addresses the problem of urgently needed data management techniques for efficiently retrieving requested subsets of large datasets from mass storage devices. This problem is especially critical for scientific investigators who need ready access to the large volume of data generated by large-scale supercomputer simulations and physical experiments as well as the automated collection of observations by monitoring devices and satellites. This problem also negates the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a fast network. This paper focuses on very large spatial and temporal datasets generated by simulation of climate models, but the techniques described here are applicable to any large multidimensional grid data. The main requirement is to efficiently access relevant information contained within much larger datasets for analysis and interactive visualization. Although these problems are now becoming more widely recognized, the problem persists because the access speed of robotic storage devices continues to be the bottleneck. To address this problem, we have developed algorithms for partitioning the original datasets into “clusters” based on analysis of data access patterns and storage device characteristics. Further, we have designed enhancements to current storage server protocols to permit control over physical placement of data on storage devices. We describe in this paper the approach we have taken, the partitioning algorithms, and simulation and experimental results that show 1 to 2 orders of magnitude in access improvements for predicted query types. We further describe the design and implementation of improvements to a specific storage management system, UniTree, which are necessary to support the enhanced protocols. In addition, we describe the development of a partitioning workbench to help scientists select the preferred solutions.
conference on high performance computing (supercomputing) | 1994
Brian Tierney; William E. Johnston; Hanan Herzog; Gary Hoo; Guojun Jin; Jason Lee; Ling Tony Chen; Doron Rotem
We describe the design and implementation of a distributed parallel storage system that uses high-speed ATM networks as a key element of the architecture. Other elements include a collection of network-based disk block servers, and an associated name server that provides some file system functionality. The implementation is based on user level software that runs on UNIX workstations. Both the architecture and the implementation are intended to provide for easy and economical scalability. This approach has yielded a data source that scales economically to very high speed. Target applications include online storage for both very large images and video sequences. This paper describes the architecture, and explores the performance issues of the current implementation.
symposium on principles of database systems | 1994
Ling Tony Chen; Doron Rotem
This work deals with the problem of finding efficient access plans for retrieving a set of pages from a multi-disk system with replicated data. This paper contains two results related to this problem: (a) We solve the problem of finding an optimal access path by transforming it into a network flow problem. We also indicate how our method may be employed in dynamic environments where some (or all) of the disks have a preexisting load, are heterogeneous, and reside on different servers. (b) We present a lower bound for the worst case response time of a request under all replication schemes, and also discuss the replication scheme that results in this lower bound. We then use simulation to show how this replication scheme can also greatly reduce the average case response time.
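As an illustration of result (a), the following is a minimal sketch of how a page-to-disk access plan can be cast as a flow problem. It is not the authors' formulation: it assumes every page read costs one time unit, so response time equals the largest number of pages read from any single disk, and it ignores preexisting load and heterogeneous disks. The names `assign_with_cap` and `min_response_plan` are hypothetical. For a candidate cap, feasibility is a unit-capacity flow (source to page, page to replica disk, disk to sink with capacity `cap`), and a binary search finds the smallest feasible cap.

```python
from typing import Dict, List, Optional, Set, Tuple


def assign_with_cap(replicas: Dict[str, List[int]], n_disks: int,
                    cap: int) -> Optional[Dict[str, int]]:
    """Assign each page to one replica disk with at most `cap` pages per disk.

    Equivalent to a max-flow feasibility check, solved with augmenting paths.
    Returns None if no such assignment exists.
    """
    load: List[List[str]] = [[] for _ in range(n_disks)]  # pages routed to each disk

    def augment(page: str, seen: Set[int]) -> bool:
        for d in replicas[page]:
            if d in seen:
                continue
            seen.add(d)
            if len(load[d]) < cap:
                load[d].append(page)
                return True
            # Disk d is full: try to reroute one of its pages to another replica.
            for i, other in enumerate(load[d]):
                if augment(other, seen):
                    load[d][i] = page
                    return True
        return False

    for page in replicas:
        if not augment(page, set()):
            return None
    return {page: d for d, pages in enumerate(load) for page in pages}


def min_response_plan(replicas: Dict[str, List[int]],
                      n_disks: int) -> Tuple[Dict[str, int], int]:
    """Binary-search the smallest per-disk cap that admits a feasible plan."""
    if not replicas:
        return {}, 0
    lo, hi = 1, len(replicas)
    best: Optional[Tuple[Dict[str, int], int]] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        plan = assign_with_cap(replicas, n_disks, mid)
        if plan is not None:
            best = (plan, mid)
            hi = mid - 1
        else:
            lo = mid + 1
    if best is None:
        raise ValueError("some page has no replica on any disk")
    return best


# Hypothetical request: four pages, each listed with the disks holding a copy.
request = {"p0": [0, 1], "p1": [0], "p2": [0, 1], "p3": [1, 2]}
plan, response_time = min_response_plan(request, n_disks=3)
```

In this example three of the pages have copies only on disks 0 and 1, so no plan can read fewer than two pages from some disk. Extending the sketch to preloaded or heterogeneous disks would mean giving each disk its own sink capacity, which is beyond what is shown here.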
Goddard conference on mass storage and technologies, College Park, MD (United States), 28-30 Mar 1995 | 1995
Ling Tony Chen; Doron Rotem; Arie Shoshani; B. Drach; M. Keating; Steven S. Louis
We address in this paper data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. This paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement we have addressed is the efficient access of subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into “clusters” based on analysis of data access patterns and storage device characteristics. The goal is to minimize the number of clusters read from mass storage systems when subsets are requested. We emphasize in this paper proposed enhancements to current storage server protocols to permit control over physical placement of data on storage devices. We also discuss in some detail the aspects of the interface between the application programs and the mass storage system, as well as a workbench to help scientists design the best reorganization of a dataset for anticipated access patterns.
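To make the stated goal concrete, here is a small sketch of why cluster shape matters for a given access pattern. It assumes the simplest possible partitioning, equal-size rectangular tiles over a regular grid, which is only an illustration and not the partitioning algorithms the paper describes. The function name `clusters_touched`, the grid dimensions, and the query are all hypothetical.

```python
from typing import Sequence, Tuple


def clusters_touched(query: Sequence[Tuple[int, int]],
                     cluster_shape: Sequence[int]) -> int:
    """Count the tiles intersected by a box query.

    `query` gives a half-open index range [lo, hi) per dimension;
    `cluster_shape` gives the tile edge length per dimension.
    """
    count = 1
    for (lo, hi), size in zip(query, cluster_shape):
        first_tile = lo // size
        last_tile = (hi - 1) // size
        count *= last_tile - first_tile + 1
    return count


# Hypothetical (time, latitude, longitude) grid of 1000 x 64 x 128 cells.
# The query asks for the full time series over a 2 x 2 spatial region.
time_series_query = [(0, 1000), (10, 12), (20, 22)]

# Both tilings hold 8192 cells per cluster.
per_timestep = (1, 64, 128)   # one cluster per time step
time_oriented = (128, 8, 8)   # long in time, small in space

reads_per_timestep = clusters_touched(time_series_query, per_timestep)    # 1000
reads_time_oriented = clusters_touched(time_series_query, time_oriented)  # 8
```

With the same cluster size, the time-oriented tiling serves this query from 8 clusters instead of 1000. A query for a full spatial map at one time step would favor the other tiling, which is why the choice depends on the anticipated access patterns.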
extending database technology | 1994
Ling Tony Chen; Doron Rotem
Automated robotic devices that mount and dismount tape cartridges and optical disks are an important component of a mass storage system. Optimizing database performance in such environments poses additional challenges as the response time to a query may involve several costly volume mounts and dismounts in addition to seek distances within a volume. In this paper we analyze some optimization problems concerning placement of data on such devices. We present a dynamic programming algorithm for optimal loading of data on a robotic device to minimize expected query response time. The method is general in the sense that it can be tailored to work for different hardware characteristics such as seek and mounting times. A variant of the method is also presented which achieves optimal response times subject to storage utilization constraints.
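The abstract does not give the recurrence, so the following is only a toy sketch of the kind of dynamic program such a placement problem admits, under assumptions that are not from the paper: data blocks keep a fixed order, each volume holds a consecutive run of blocks, every query reads exactly one block, and the cost of a query is a constant mount time plus a seek proportional to the block's offset within its volume. The cap on the number of volumes stands in for a storage utilization constraint. All names and parameters (`best_loading`, `seek_per_unit`, `mount_time`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def best_loading(sizes: Sequence[float], probs: Sequence[float],
                 capacity: float, max_volumes: int,
                 seek_per_unit: float = 1.0,
                 mount_time: float = 0.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Split blocks 0..n-1 into consecutive volumes, minimizing expected query cost.

    Returns (expected cost, list of half-open block ranges, one per volume).
    """
    n = len(sizes)
    INF = float("inf")

    def volume_cost(i: int, j: int) -> float:
        """Expected cost of blocks i..j-1 on one volume; INF if they do not fit."""
        offset, cost = 0.0, 0.0
        for b in range(i, j):
            cost += probs[b] * (mount_time + seek_per_unit * offset)
            offset += sizes[b]
        return cost if offset <= capacity else INF

    # best[k][j]: minimal expected cost of the first j blocks on exactly k volumes.
    best = [[INF] * (n + 1) for _ in range(max_volumes + 1)]
    cut = [[0] * (n + 1) for _ in range(max_volumes + 1)]
    best[0][0] = 0.0
    for k in range(1, max_volumes + 1):
        for j in range(1, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + volume_cost(i, j)
                if c < best[k][j]:
                    best[k][j], cut[k][j] = c, i

    k = min(range(1, max_volumes + 1), key=lambda v: best[v][n])
    if best[k][n] == INF:
        raise ValueError("blocks do not fit in the allowed number of volumes")

    total = best[k][n]
    bounds: List[Tuple[int, int]] = []
    j = n
    while k > 0:  # walk the recorded cut points backwards
        i = cut[k][j]
        bounds.append((i, j))
        j, k = i, k - 1
    bounds.reverse()
    return total, bounds


# Hypothetical input: four unit-size blocks, equally likely, two volumes of capacity 2.
cost, layout = best_loading([1, 1, 1, 1], [0.25] * 4, capacity=2, max_volumes=2)
```

Different hardware characteristics would enter through `seek_per_unit` and `mount_time`. A model in which one query spans several volumes, as the abstract suggests, would need a richer cost function than this sketch uses.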
very large data bases | 1993
Ling Tony Chen; Doron Rotem
symposium on principles of database systems | 1994
Ling Tony Chen; Doron Rotem
very large data bases | 1995
Ling Tony Chen; Doron Rotem; Sridhar Seshadri
Archive | 1995
Ling Tony Chen; Robert S. Drach; M. Keating; Steven S. Louis; Doron Rotem; Arie Shoshani