
Publication


Featured research published by Xuechen Zhang.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination

Xuechen Zhang; Kei Davis; Song Jiang

A cluster of data servers and a parallel file system are often used to provide high-throughput I/O service to parallel programs running on a compute cluster. To exploit I/O parallelism, parallel file systems stripe file data across the data servers. While this practice is effective in serving asynchronous requests, it may break individual programs' spatial locality, which can seriously degrade I/O performance when the data servers concurrently serve synchronous requests from multiple I/O-intensive programs. In this paper we propose a scheme, IOrchestrator, to improve the I/O performance of multi-node storage systems by orchestrating I/O services among programs when such inter-data-server coordination is dynamically determined to be cost-effective. We have implemented IOrchestrator in the PVFS2 parallel file system. Our experiments with representative parallel benchmarks show that IOrchestrator can significantly improve the I/O performance, by up to a factor of 2.5, delivered by a cluster of data servers servicing concurrently running parallel programs. Notably, we have not observed any scenarios in which the use of IOrchestrator causes substantial performance degradation.
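
Under the stated assumption that seek cost scales with head movement between consecutive request offsets, the decision IOrchestrator makes can be illustrated with a minimal sketch; the function names, cost model, and overhead parameter below are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch: decide whether dedicating all data servers to one
# program at a time is cost-effective, approximating seek cost by total
# head movement over a sequence of request offsets.

def estimated_seek_cost(offsets):
    """Total head movement across consecutive requests (crude locality measure)."""
    return sum(abs(b - a) for a, b in zip(offsets, offsets[1:]))

def should_coordinate(arrival_order, per_program, coordination_overhead):
    """Serve one program at a time only when preserving each program's
    spatial locality saves more seek time than coordination costs."""
    interleaved = estimated_seek_cost(arrival_order)
    dedicated = sum(estimated_seek_cost(sorted(offs))
                    for offs in per_program.values()) + coordination_overhead
    return dedicated < interleaved

# Two programs reading distant file regions, interleaved at the server:
mixed = [0, 900, 10, 910, 20, 920]
programs = {"A": [0, 10, 20], "B": [900, 910, 920]}
print(should_coordinate(mixed, programs, coordination_overhead=100))  # True
```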


International Parallel and Distributed Processing Symposium | 2012

iTransformer: Using SSD to Improve Disk Scheduling for High-performance I/O

Xuechen Zhang; Kei Davis; Song Jiang

The parallel data accesses inherent to large-scale data-intensive scientific computing require that data servers handle very high I/O concurrency. Concurrent requests from different processes or programs to a hard disk can cause disk-head thrashing between different disk regions, resulting in unacceptably low I/O performance. Current storage systems either rely on the disk scheduler at each data server, or use SSDs as storage, to minimize this negative performance effect. However, the scheduler's ability to alleviate this problem by scheduling requests in memory is limited by concerns such as long disk access times and the potential loss of dirty data on system failure. Meanwhile, SSDs are too expensive to be widely used as the major storage device in the HPC environment. We propose iTransformer, a scheme that employs a small SSD to schedule requests for the data on disk. Being less space-constrained than more expensive DRAM, iTransformer can buffer larger amounts of dirty data before writing it back to the disk, or prefetch a larger volume of data in a batch into the SSD. In both cases high disk efficiency can be maintained even for concurrent requests. Furthermore, the scheme allows requests to be scheduled in the background, hiding the cost of random disk access while process requests are being served. Finally, because the SSD is non-volatile, concerns about the quantity of buffered dirty data are obviated. We have implemented iTransformer in the Linux kernel and tested it on a large cluster running PVFS2. Our experiments show that iTransformer can improve the I/O throughput of the cluster by 35% on average for MPI-IO benchmarks with various data access patterns.
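
The write path described above can be sketched as follows, assuming dirty blocks are staged on the SSD and flushed in offset order; the class name, flush threshold, and disk_write stub are illustrative stand-ins for the in-kernel mechanism.

```python
# Illustrative sketch of staging dirty blocks on a small SSD and flushing
# them to the hard disk in offset-sorted batches. SSDStagingScheduler,
# the threshold, and disk_write are stand-ins, not the in-kernel code.

def disk_write(offset, data):
    pass  # placeholder for the real hard-disk write

class SSDStagingScheduler:
    def __init__(self, flush_threshold=1024):
        self.staged = {}                  # disk offset -> block held on SSD
        self.flush_threshold = flush_threshold

    def write(self, offset, data):
        self.staged[offset] = data        # the SSD absorbs the random write
        if len(self.staged) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write back in ascending offset order so the disk head sweeps once,
        # elevator-style, instead of thrashing between regions.
        for offset in sorted(self.staged):
            disk_write(offset, self.staged[offset])
        self.staged.clear()
```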


International Parallel and Distributed Processing Symposium | 2009

Making resonance a common case: A high-performance implementation of collective I/O on parallel file systems

Xuechen Zhang; Song Jiang; Kei Davis

Collective I/O is a widely used technique to improve I/O performance in parallel computing. It can be implemented as a client-based or a server-based scheme. The client-based implementation is more widely adopted in MPI-IO software such as ROMIO because of its independence from the storage system configuration and its greater portability. However, existing implementations of client-side collective I/O do not consider the actual pattern of file striping over the multiple I/O nodes in the storage system. This can cause a large number of requests for non-sequential data at the I/O nodes, substantially degrading I/O performance. Investigating a surprisingly high I/O throughput achieved when there is an accidental match between a particular request pattern and the data striping pattern on the I/O nodes, we reveal the resonance phenomenon as the cause. Exploiting readily available information on data striping from the metadata server in popular file systems such as PVFS2 and Lustre, we design a new collective I/O implementation technique, named resonant I/O, that makes resonance the common case. Resonant I/O rearranges requests from multiple MPI processes according to the presumed data layout on the disks of the I/O nodes, so that non-sequential access of disk data can be turned into sequential access, significantly improving I/O performance without compromising the independence of a client-based implementation. We have implemented our design in ROMIO. Our experimental results on small- and medium-scale clusters show that the scheme can increase I/O throughput for commonly used parallel I/O benchmarks, such as mpi-io-test and ior-mpi-io, by up to 157% over the existing implementation of ROMIO, with no scenario demonstrating significantly decreased performance.
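
The rearrangement at the heart of resonant I/O can be sketched as a grouping-and-sorting pass, assuming simple round-robin striping; server_of and rearrange are hypothetical names, and ROMIO's actual implementation additionally handles aggregation and data exchange among processes.

```python
# Hypothetical sketch: group a collective read's (offset, length) requests
# by the I/O server that holds them, assuming round-robin striping, then
# sort each group so every server sees sequential access.

def server_of(offset, stripe_size, num_servers):
    """Index of the server holding the stripe that contains this offset."""
    return (offset // stripe_size) % num_servers

def rearrange(requests, stripe_size, num_servers):
    """requests: (offset, length) pairs gathered from all MPI processes."""
    by_server = {s: [] for s in range(num_servers)}
    for off, length in requests:
        by_server[server_of(off, stripe_size, num_servers)].append((off, length))
    for reqs in by_server.values():
        reqs.sort()                       # sequential from the server's view
    return by_server

# Three requests striped over two servers with 64-byte stripes:
print(rearrange([(0, 64), (128, 64), (64, 64)], stripe_size=64, num_servers=2))
# {0: [(0, 64), (128, 64)], 1: [(64, 64)]}
```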


Cluster Computing and the Grid | 2014

Flexpath: Type-Based Publish/Subscribe System for Large-Scale Science Analytics

Jai Dayal; Drew Bratcher; Greg Eisenhauer; Karsten Schwan; Matthew Wolf; Xuechen Zhang; Hasan Abbasi; Scott Klasky; Norbert Podhorszki

As high-end systems move toward exascale sizes, a new model of scientific inquiry is being developed in which online data analytics run concurrently with the high-end simulations producing their data outputs. The goals are to gain rapid insights into the ongoing scientific processes, assess their scientific validity, and/or initiate corrective or supplementary actions by launching additional computations when needed. The Flexpath system presented in this paper addresses the fundamental problem of how to structure and efficiently implement the communications between high-end simulations and concurrently running online data analytics, the latter composed of componentized dynamic services and service pipelines. Using a type-based publish/subscribe approach, Flexpath encourages diversity by permitting analytics services to differ in their computational and scaling characteristics and even in their internal execution models. Flexpath uses direct MxN connections between interacting services to reduce data movements, to allow for runtime connectivity changes that accommodate component arrivals/departures, and to support the multiple underlying communication protocols used for analytics workflows in which simulation outputs are processed by analytics services residing on the same nodes where they are generated, on the same machine, and/or on attached or remote analytics engines. This paper describes the design and implementation of Flexpath, and evaluates it with two widely used scientific applications and their associated data analytics methods.
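
As a rough, in-process illustration of type-based publish/subscribe, the sketch below lets publishers and subscribers rendezvous on a declared record type; ParticlePacket and TypedChannel are invented for this example, and the real Flexpath performs MxN inter-process transport over multiple protocols.

```python
# Toy in-process model of a type-based publish/subscribe channel:
# publishers and subscribers rendezvous on a declared record type, and
# delivery goes directly from writer to every matching reader.

from dataclasses import dataclass

@dataclass
class ParticlePacket:            # an invented record type for the channel
    step: int
    values: list

class TypedChannel:
    def __init__(self, record_type):
        self.record_type = record_type
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, record):
        if not isinstance(record, self.record_type):
            raise TypeError("record does not match the channel's type")
        for deliver in self.subscribers:  # direct writer-to-reader delivery
            deliver(record)

channel = TypedChannel(ParticlePacket)
channel.subscribe(lambda rec: print("analytics saw step", rec.step))
channel.publish(ParticlePacket(step=0, values=[1.0, 2.0]))
```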


International Parallel and Distributed Processing Symposium | 2012

Opportunistic Data-driven Execution of Parallel Programs for Efficient I/O Services

Xuechen Zhang; Kei Davis; Song Jiang

A parallel system relies on both process scheduling and I/O scheduling for efficient use of resources, and a program's performance hinges on the resource on which it is bottlenecked. Existing process schedulers and I/O schedulers are independent. However, when the bottleneck is I/O, there is an opportunity to alleviate it via cooperation between the I/O and process schedulers: the service efficiency of I/O requests can be highly dependent on their issuance order, which in turn is heavily influenced by process scheduling. We propose a data-driven program execution mode in which process scheduling and request issuance are coordinated to facilitate effective I/O scheduling for high disk efficiency. Our implementation, DualPar, uses process suspension and resumption, as well as pre-execution and prefetching techniques, to provide a pool of pre-sorted requests to the I/O scheduler. This data-driven execution mode is enabled when I/O is detected to be the bottleneck; otherwise the program runs in the normal computation-driven mode. DualPar is implemented in the MPICH2 MPI-IO library so that MPI programs can coordinate I/O service and process execution. Our experiments on a 120-node cluster using the PVFS2 file system show that DualPar can increase system I/O throughput by 31% on average, compared to the existing MPI-IO implementation with or without collective I/O.
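
A compact sketch of the mode switch and the pre-sorted request pool might look like the following; the threshold, request format, and serve callback are assumptions for illustration, while the real mechanism relies on process suspension, pre-execution, and prefetching inside the MPI-IO library.

```python
# Illustrative sketch: switch to data-driven execution when I/O dominates,
# then hand the I/O scheduler a merged, offset-sorted pool of requests.

def choose_mode(io_wait_fraction, threshold=0.5):
    """Enable the data-driven mode only when I/O is the bottleneck."""
    return "data-driven" if io_wait_fraction > threshold else "computation-driven"

def issue_sorted(pending_per_process, serve):
    """Merge each process's pending (offset, length) requests and issue
    them in offset order so the disk services a pre-sorted stream."""
    for request in sorted(r for reqs in pending_per_process for r in reqs):
        serve(request)

# Requests collected from two suspended processes:
issue_sorted([[(400, 8), (0, 8)], [(200, 8)]], serve=print)
# (0, 8)  (200, 8)  (400, 8)
```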


IEEE Conference on Mass Storage Systems and Technologies | 2011

YouChoose: A performance interface enabling convenient and efficient QoS support for consolidated storage systems

Xuechen Zhang; Yuehai Xu; Song Jiang

Currently, the QoS requirements for disk-based storage systems are usually presented in the form of a service-level agreement (SLA) that bounds I/O measures such as the latency and throughput of I/O requests. However, the SLA is not an effective performance interface for users to specify their required I/O service quality, for two major reasons. First, for users, it is difficult to determine appropriate latency and throughput bounds that ensure their application performance without resource over-provisioning. Second, for storage system administrators, it is a challenge to estimate a user's real resource demand because the specified SLA measures are not consistently correlated with it. This makes resource provisioning and scheduling less informed and can greatly reduce system efficiency. We propose the concept of a reference storage system (RSS), which can be a storage system chosen by users and whose performance can be measured off-line and mimicked on-line, as a performance interface between applications and storage servers. By designating an RSS to represent an I/O performance requirement, a user can expect that the performance received from a shared storage server servicing their I/O workload is no worse than the performance the RSS would deliver for the same workload. The storage system is responsible for implementing the RSS interface. The key enabling techniques are a machine learning model that derives request-specific performance requirements and an RSS-centric scheduler that efficiently allocates resources among requests from different users. The proposed scheme, named YouChoose, supports the user-chosen performance interface by efficiently implementing and migrating virtual storage devices in a host storage system. Our evaluation based on trace-driven simulations shows that YouChoose can precisely implement the RSS performance interface, achieve strong performance assurance and isolation, and improve the efficiency of a consolidated storage system consisting of different types of storage devices.
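
One way to picture RSS-centric scheduling is to treat the model's per-request latency prediction as a deadline and dispatch earliest-deadline-first; the linear latency model and its coefficients below are purely illustrative, not the paper's learned model.

```python
# Illustrative sketch: treat the predicted latency of the user's reference
# storage system (RSS) as each request's deadline, then dispatch earliest-
# deadline-first. The linear model and coefficients are invented.

from dataclasses import dataclass

@dataclass
class Request:
    arrival_time: float   # seconds
    size: int             # bytes
    seek_distance: int    # blocks moved from the previous request

def rss_predicted_latency(req, base=1e-4, per_byte=2e-9, per_block=5e-7):
    """Stand-in for the machine-learned model of the chosen RSS."""
    return base + per_byte * req.size + per_block * req.seek_distance

def deadline(req):
    # Finish each request no later than the RSS would have.
    return req.arrival_time + rss_predicted_latency(req)

def dispatch(queue):
    """Earliest-deadline-first order approximates 'no worse than the RSS'."""
    return sorted(queue, key=deadline)
```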


Symposium on Cloud Computing | 2015

Understanding issue correlations: a case study of the Hadoop system

Jian Huang; Xuechen Zhang; Karsten Schwan

Over the last decade, Hadoop has evolved into a widely used platform for Big Data applications. Acknowledging its widespread use, we present a comprehensive analysis of the solved issues with applied patches in the Hadoop ecosystem. The analysis focuses on Hadoop's two essential components, HDFS (storage) and MapReduce (computation), and covers a total of 4,218 issues solved over the last six years: 2,180 from HDFS and 2,038 from MapReduce. Insights derived from the study concern system design and development, particularly with respect to correlated issues and correlations between the root causes of issues and the characteristics of the Hadoop subsystems. These findings shed light on the future development of Big Data systems, on their testing, and on bug-finding tools.


International Symposium on Performance Analysis of Systems and Software | 2013

Synergistic coupling of SSD and hard disk for QoS-aware virtual memory

Ke Liu; Xuechen Zhang; Kei Davis; Song Jiang

With significant advantages in capacity, power consumption, and price, the solid-state disk (SSD) has good potential to be employed as an extension of DRAM (memory), such that applications with large working sets can run efficiently on a modestly configured system. While initial results reported in recent work show promising prospects for this use of SSD by incorporating it into the management of virtual memory, frequent writes from write-intensive programs could quickly wear out the SSD, making the idea less practical. We propose a scheme, HybridSwap, that integrates a hard disk with an SSD for virtual memory management, synergistically achieving the advantages of both. In addition, HybridSwap can constrain the performance loss caused by swapping according to user-specified QoS requirements. To minimize writes to the SSD without undue performance loss, HybridSwap sequentially swaps a set of virtual memory pages to the hard disk if they are expected to be read together. Using a history of page access patterns, HybridSwap dynamically creates an out-of-memory virtual memory page layout on the swap space spanning the SSD and hard disk, such that random reads are served by the SSD and sequential reads are asynchronously served by the hard disk with high efficiency. In practice, HybridSwap can effectively exploit the aggregate bandwidth of the two devices to accelerate page swapping. We have implemented HybridSwap in a recent Linux kernel, version 2.6.35.7. Our evaluation with representative benchmarks, such as Memcached for key-value store and scientific programs from the ALGLIB cross-platform numerical analysis and data processing library, shows that the number of writes to the SSD can be reduced by 40% with the system's performance comparable to that with pure SSD swapping, and can satisfy a swapping-related QoS requirement as long as
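
The placement policy can be sketched under the simplifying assumption that pages with adjacent virtual page numbers are the ones expected to be read back together; the run-length threshold and function names are illustrative, not the kernel implementation.

```python
# Illustrative placement policy: evicted pages that form long runs of
# adjacent page numbers (expected to be read back together) are written
# sequentially to the hard disk; isolated pages go to the SSD, cutting
# SSD write wear. The run-length threshold is an invented parameter.

def _place_run(run, hdd_runs, ssd_pages, min_run):
    if len(run) >= min_run:
        hdd_runs.append(run[:])      # long sequential run: hard disk
    else:
        ssd_pages.extend(run)        # short/random: SSD

def place_pages(victims, min_run=8):
    """Split evicted page numbers into hard-disk runs and SSD pages."""
    hdd_runs, ssd_pages, run = [], [], []
    for page in sorted(victims):
        if run and page == run[-1] + 1:
            run.append(page)
        else:
            if run:
                _place_run(run, hdd_runs, ssd_pages, min_run)
            run = [page]
    if run:
        _place_run(run, hdd_runs, ssd_pages, min_run)
    return hdd_runs, ssd_pages
```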


International Parallel and Distributed Processing Symposium | 2014

Scibox: Online Sharing of Scientific Data via the Cloud

Jian Huang; Xuechen Zhang; Greg Eisenhauer; Karsten Schwan; Matthew Wolf; Stephane Ethier; Scott Klasky

Collaborative science demands global sharing of scientific data, but it cannot leverage universally accessible cloud-based infrastructures like Dropbox, as those offer limited interfaces and inadequate levels of access bandwidth. We present Scibox, a cloud facility for the online sharing of scientific data. It uses standard cloud storage solutions, but offers a usage model in which high-end codes can write/read data to/from the cloud via the APIs they already use for their I/O actions. With Scibox, data upload/download volumes are controlled via data reduction functions (DR-functions) stated by end users and applied at the data source, before data is moved, with further gains in efficiency obtained by combining DR-functions to move exactly what is needed by current data consumers. We evaluate Scibox with science applications and their representative data analytics, GTS fusion and combustion image processing, demonstrating the potential for ubiquitous data access with substantial reductions in network traffic.
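
A toy sketch of applying user-stated DR-functions at the data source before upload follows; subsample_every and cloud_put are made-up placeholders, since Scibox itself plugs into the I/O APIs the codes already use.

```python
# Toy sketch: compose user-stated DR-functions at the data source so only
# the reduced payload crosses the network. subsample_every and cloud_put
# are made-up placeholders, not Scibox APIs.

def cloud_put(payload):
    pass  # stand-in for the cloud-storage upload call

def subsample_every(n):
    """Example DR-function: keep every n-th value."""
    return lambda values: values[::n]

def upload(values, dr_functions):
    payload = values
    for dr in dr_functions:          # reduce before any data moves
        payload = dr(payload)
    cloud_put(payload)               # only the reduced payload is uploaded

upload(list(range(1_000_000)), [subsample_every(100)])
```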


IEEE Transactions on Computers | 2010

Improving Networked File System Performance Using a Locality-Aware Cooperative Cache Protocol

Song Jiang; Xuechen Zhang; Shuang Liang; Kei Davis

In a distributed environment, the utilization of file buffer caches in different clients may greatly vary. Cooperative caching has been proposed to increase cache utilization by coordinating the shared usage of distributed caches. It allows clients that would more greatly benefit from larger caches to forward data objects to peer clients with relatively underutilized caches. To support such coordination, global cache utilization must be dynamically evaluated. This, in turn, requires an effective analysis of application data access patterns. Existing coordination protocols are demonstrably suboptimal in this respect, exhibiting inefficient memory utilization and undue interference among clients. We propose a locality-aware cooperative caching protocol, called LAC, that is based on analysis and manipulation of data block reuse distance to effectively predict cache utilization and the probability of data reuse at each client. Using a dynamically adaptive synchronization technique, we keep local information up to date and consistently comparable across clients. The system is highly scalable in the sense that global coordination is achieved without centralized control. We have conducted thorough trace-driven simulation experiments to assess the performance differences between LAC and various existing protocols representative of the general class. Using a realistic and representative cost model, we show that the LAC protocol significantly and consistently outperforms existing cooperative caching protocols, demonstrating high and balanced utilization of caches across all clients. In our experiments, LAC reduces block access time by up to 36 percent, with an average of 31 percent, over the system without peer cache coordination, and reduces block access time by up to 22 percent, with an average of 13 percent, over the best performer of the existing protocols.
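
The reuse-distance bookkeeping that LAC's prediction builds on can be sketched with an LRU stack: the reuse distance of an access is the number of distinct blocks touched since the same block was last accessed. The class below is a standard textbook tracker, not LAC's code.

```python
# Standard LRU-stack reuse-distance tracker: the distance of an access is
# the number of distinct blocks touched since the same block's last access.
# Small average distances mean the local cache is well utilized; large ones
# suggest evicted blocks are better forwarded to an underutilized peer.

from collections import OrderedDict

class ReuseDistanceTracker:
    def __init__(self):
        self.stack = OrderedDict()    # blocks in least-to-most-recent order

    def access(self, block):
        """Record an access; return its reuse distance (None on first touch)."""
        if block in self.stack:
            order = list(self.stack)
            distance = len(order) - 1 - order.index(block)
            del self.stack[block]
        else:
            distance = None
        self.stack[block] = None      # move to most-recent position
        return distance

tracker = ReuseDistanceTracker()
for b in ["a", "b", "c", "a"]:
    print(b, tracker.access(b))       # final "a" has distance 2 (b, c intervened)
```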

Collaboration


Dive into Xuechen Zhang's collaborations.

Top Co-Authors

Song Jiang, Wayne State University
Kei Davis, Los Alamos National Laboratory
Bao Nguyen, Washington State University Vancouver
Karsten Schwan, Georgia Institute of Technology
Fang Zheng, Georgia Institute of Technology
Jian Huang, Georgia Institute of Technology
Matthew Wolf, Georgia Institute of Technology
Greg Eisenhauer, Georgia Institute of Technology
Hasan Abbasi, Oak Ridge National Laboratory
Hua Tan, Washington State University Vancouver